A minimal introduction to building and training a small-scale Large Language Model (LLM) from scratch.
This project demonstrates the foundational concepts behind LLMs by training a character-level language model on a Shakespeare text dataset. The model learns to generate new text in the style of Shakespearean English.
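"Character-level" means the vocabulary is simply the set of distinct characters in the corpus, so tokenization needs no external tooling. A minimal sketch of the idea (variable names are illustrative, not necessarily the repo's):

```python
# Character-level tokenization: every distinct character in the corpus
# becomes one integer token id.
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                      # vocabulary: all unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> character

encode = lambda s: [stoi[c] for c in s]             # string -> list of ids
decode = lambda ids: "".join(itos[i] for i in ids)  # list of ids -> string

assert decode(encode("To be, or not to be")) == "To be, or not to be"
```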
Based on the work of Andrej Karpathy, this implementation serves as a hands-on tutorial for understanding:
- How neural networks can learn patterns in sequential data
- The basic architecture of a transformer decoder (see the attention sketch after this list)
- The training process for generative text models
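To make the decoder point concrete, here is a sketch of a single masked self-attention head, the building block of a transformer decoder. The dimensions below are illustrative, not this repo's hyperparameters:

```python
import torch
import torch.nn.functional as F

# One masked ("causal") self-attention head.
torch.manual_seed(0)
B, T, C = 4, 8, 32            # batch size, sequence length, embedding size
head_size = 16

x = torch.randn(B, T, C)      # stand-in for a batch of token embeddings
key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)               # each (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size**-0.5    # scaled scores, (B, T, T)

# Causal mask: position t may only attend to positions <= t, so the
# model cannot look at future characters while predicting the next one.
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))
wei = F.softmax(wei, dim=-1)

out = wei @ v                 # (B, T, head_size): weighted mix of past values
```

The causal mask is what makes this a decoder: each position attends only to what came before it, which is why the trained model can generate text left to right.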
This was created as a brief educational introduction to LLMs—covering core concepts like tokenization, attention mechanisms, and text generation—without the complexity of production-scale models.
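The text-generation part follows the usual autoregressive sampling loop. A sketch, assuming the model's forward pass returns a `(logits, loss)` pair as in the bigram sketch further below:

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens):
    # idx: (B, T) tensor of token ids forming the current context.
    for _ in range(max_new_tokens):
        logits, _ = model(idx)                 # assumed to return (logits, loss)
        logits = logits[:, -1, :]              # only the last position predicts next
        probs = torch.softmax(logits, dim=-1)  # logits -> probability distribution
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token id
        idx = torch.cat((idx, next_id), dim=1)              # append and continue
    return idx
```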
- `input.txt`: Shakespeare text dataset for training
- `gpt_dev.ipynb`: Development notebook for experimentation
- `gpt.py`: Transformer-based model implementation
- `bigram.py`: Basic bigram language model implementation
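For reference, the core of a bigram language model like the one in `bigram.py` fits in a few lines. This is a sketch of the idea, not the file's exact contents: the current character's embedding row is read directly as the logits over the next character, with no further context.

```python
import torch.nn as nn
import torch.nn.functional as F

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # one row of next-character logits per token in the vocabulary
        self.token_embedding = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding(idx)     # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, V = logits.shape
            loss = F.cross_entropy(logits.view(B * T, V), targets.view(B * T))
        return logits, loss
```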
Install the required package:

```
pip install torch
```

This is a simplified, educational implementation meant to illustrate LLM fundamentals. Real-world LLMs are significantly larger, more complex, and trained on vastly more data.
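Even at this small scale, the training recipe is the standard next-token-prediction loop. As a final sketch under assumed names (`data` is the encoded corpus as a 1-D tensor of token ids, `model` is e.g. the bigram model above; the batch sizes and learning rate are illustrative):

```python
import torch

def get_batch(data, block_size=8, batch_size=32):
    # Sample random windows of the corpus; targets are inputs shifted by one.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])          # inputs
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # next-char targets
    return x, y

def train(model, data, steps=1000, lr=1e-3):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        xb, yb = get_batch(data)
        _, loss = model(xb, yb)       # cross-entropy on next-token prediction
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
    return loss.item()
```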