Implemented GPT from scratch
python cuda pytorch lora masking peft cross-entropy-loss multi-head-attention gelu adamw-optimizer temperature-scaling bpe-tokenizer standford-alpaca
Updated Oct 4, 2025 - Jupyter Notebook