GPT-2 From Scratch

This repository offers a PyTorch-based implementation of the GPT-2 language model, built entirely from scratch. It includes custom modules for attention, transformer blocks, loss computation, optimization, and training. The model was trained on 40B tokens from the FineWebEdu-10B dataset on Hugging Face, i.e., roughly four passes over its approximately 10B tokens.
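The sketch below shows one way a stream of token IDs can become next-token training pairs, under stated assumptions. Each input window is paired with the same window shifted left by one token. dataset.py is described only as handling dataset loading and preprocessing. The function name `next_token_batches` and its per-rank striding scheme are therefore hypothetical, and are not taken from the repository.

```python
from typing import Iterator, List, Sequence, Tuple


def next_token_batches(tokens: Sequence[int], batch: int, seq_len: int,
                       rank: int = 0, world_size: int = 1
                       ) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Yield (inputs, targets) where targets are inputs shifted left by one token.

    Hypothetical helper: each rank reads a disjoint chunk of batch * seq_len tokens
    per step, striding by world_size chunks, so DDP processes never overlap.
    """
    chunk = batch * seq_len
    pos = rank * chunk
    # One extra token is needed so the last target position has a "next" token.
    while pos + chunk + 1 <= len(tokens):
        window = tokens[pos: pos + chunk + 1]
        inputs = [list(window[i * seq_len: (i + 1) * seq_len]) for i in range(batch)]
        targets = [list(window[i * seq_len + 1: (i + 1) * seq_len + 1]) for i in range(batch)]
        yield inputs, targets
        pos += chunk * world_size


if __name__ == "__main__":
    for x, y in next_token_batches(list(range(20)), batch=2, seq_len=3):
        print(x, "->", y)
```

The model is trained to predict `targets[b][t]` from `inputs[b][: t + 1]`. The loss is a cross-entropy over every position of every window.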

For the model weights and how to load them, please refer here.

Summary

The project is based on the GPT-2 and GPT-3 papers, from which the training methodology was adopted, including the cosine learning-rate schedule and key hyperparameters. The model follows the GPT-2 Small configuration shown below.

Setting	Value
Model Architecture	GPT-2 Small
Number of Layers	12
Hidden Size	768
Attention Heads	12
Sequence Length (tokens)	1024
Batch Size (tokens per step)	524,288 (2^19)
Total Training Time	~80 hours
GPUs	2 × NVIDIA A6000
Final Validation Loss	2.99
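The table does not list a parameter count, so the sketch below derives one from the configuration as a quick sanity check. It assumes the standard GPT-2 layout:

- a 50,257-token vocabulary
- learned positional embeddings
- biases on every linear layer
- a 4× MLP expansion
- an output head tied to the token embedding

These are assumptions. `GPTConfig` and `count_parameters` are illustrative names, not code from model.py. The script also converts the 40B-token budget into optimizer steps at 524,288 tokens per step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    # Values from the table above; vocab_size is an assumption (standard GPT-2 BPE vocabulary).
    n_layer: int = 12
    n_embd: int = 768
    n_head: int = 12
    block_size: int = 1024
    vocab_size: int = 50257


def count_parameters(cfg: GPTConfig) -> int:
    """Count parameters for an assumed GPT-2 layout with a tied LM head."""
    d = cfg.n_embd
    layer_norm = 2 * d                   # LayerNorm weight + bias
    attn_qkv = d * 3 * d + 3 * d         # fused Q, K, V projection
    attn_proj = d * d + d                # attention output projection
    mlp_fc = d * 4 * d + 4 * d           # expand to 4d
    mlp_proj = 4 * d * d + d             # project back to d
    per_block = 2 * layer_norm + attn_qkv + attn_proj + mlp_fc + mlp_proj

    token_emb = cfg.vocab_size * d
    pos_emb = cfg.block_size * d
    final_ln = layer_norm
    # The LM head shares its weight matrix with token_emb, so it adds nothing here.
    return token_emb + pos_emb + cfg.n_layer * per_block + final_ln


TOKENS_PER_STEP = 524_288   # batch size from the table
TOTAL_TOKENS = 40 * 10**9   # "40B tokens", taken literally

if __name__ == "__main__":
    cfg = GPTConfig()
    print(f"parameters: {count_parameters(cfg):,}")
    print(f"sequences per step: {TOKENS_PER_STEP // cfg.block_size}")
    print(f"optimizer steps for 40B tokens: {TOTAL_TOKENS // TOKENS_PER_STEP:,}")
```

Under those assumptions it prints the following:

- 124,439,808 parameters (GPT-2 Small's familiar "124M")
- 512 sequences per step
- 76,293 optimizer steps

The head count does not affect the total. The 12 heads split the same 768-dimensional projections rather than adding weights.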

Example text generation:

Hello, I'm a language model, I'm an author, I'm just starting a new language, I'm a computer scientist, I've been doing things over the past few years, now I'm beginning a new area of research, I'm doing research, I'm taking a field trip. My goal is to discover the answer to the questions in the context of what happens when you write computers. So that I understand what the problem is has done. Now it's so different from doing the computer science. So the language, we use, is to think, think how?
A computer scientist is someone whose job is to design, model, and implement computer systems. They are also known for their creativity, creativity, critical thinking capabilities. They are excellent in creative problem solving, they are able to see things before they can be solved, their ability to think in a natural, natural way, and their ability to make changes to the world, and their ability to have a lot of imagination

Setup

Prerequisites

Python 3.10 or higher
CUDA-compatible GPU (highly recommended)

Installation

Clone the repository:

```
git clone https://github.com/BenBenyamin/GPT2.git
cd GPT2
```

Create a virtual environment:

```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Project Structure

```
GPT2/
├── dataset.py                # Dataset loading and preprocessing
├── extra/                    # Supplementary scripts and experiment logs
│   ├── load_gpt2_weights.py  # Load Hugging Face weights into the custom model
│   ├── tokenize_dataset.py   # Tokenization pipeline for the raw dataset
│   └── tensorboard/
│       └── runs              # TensorBoard run logs
├── generation_log.txt        # Sample generated outputs across training
├── loss.py                   # Custom GPT-2 loss function
├── model.py                  # Model architecture definitions
├── optimizer.py              # Cosine scheduler and AdamW setup
├── README.md                 # Project documentation
├── requirements.txt          # Python package dependencies
├── resume.py                 # Checkpoint loading
├── train.py                  # Training loop and validation evaluation
└── utils.py                  # General-purpose helper functions
```
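optimizer.py is described as holding the cosine scheduler and AdamW setup. The sketch below shows the GPT-3-style schedule in its usual form: linear warmup to a peak learning rate, then cosine decay to a floor of 10% of the peak. The defaults are illustrative assumptions, not values read from optimizer.py:

- **Peak rate of 6e-4:** the GPT-3 paper's value for its Small model.
- **Warmup of 715 steps:** the paper's 375M warmup tokens at 524,288 tokens per step.
- **Decay length of 76,293 steps:** the number of steps implied by 40B tokens at that batch size.

```python
import math


def get_lr(step: int, max_lr: float = 6e-4, min_lr_ratio: float = 0.1,
           warmup_steps: int = 715, max_steps: int = 76_293) -> float:
    """Linear warmup followed by cosine decay to min_lr_ratio * max_lr.

    Defaults are illustrative assumptions, not values read from optimizer.py.
    """
    min_lr = max_lr * min_lr_ratio
    if step < warmup_steps:
        # Ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        # After the schedule ends, hold the floor value.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    # Cosine coefficient falls from 1 to 0 over the decay phase.
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)


if __name__ == "__main__":
    for s in (0, 714, 715, 38_504, 76_292, 80_000):
        print(s, f"{get_lr(s):.3e}")
```

To drive a PyTorch optimizer, the returned value would typically be assigned to every `param_group["lr"]` of a `torch.optim.AdamW` instance before each `optimizer.step()`. The GPT-3 paper pairs this schedule with the following Adam settings:

- betas of (0.9, 0.95)
- eps of 1e-8
- weight decay of 0.1

Whether optimizer.py uses these exact values is not stated here.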

Training

DDP (multi-GPU)

```
torchrun --nproc_per_node=<number of GPUs> --standalone train.py
```

To resume from a checkpoint (review resume.py first):

```
torchrun --nproc_per_node=<number of GPUs> --standalone resume.py
```

Single GPU (no DDP)

To train:

```
python3 train.py
```

To resume from a checkpoint:

```
python3 resume.py
```
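`torchrun` starts one process per GPU and passes each process its identity through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. Plain `python3 train.py` sets none of them. The sketch below shows one common way a script can detect which launch mode it is in. It also computes how many gradient-accumulation micro-steps each rank needs to reach the 524,288-token batch. The helper names and the micro-batch of 16 sequences are hypothetical; train.py may handle this differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistInfo:
    ddp: bool
    rank: int
    local_rank: int
    world_size: int


def read_dist_env(env=None) -> DistInfo:
    """Read the variables torchrun exports; fall back to a single process otherwise."""
    env = os.environ if env is None else env
    if "RANK" not in env:
        # Launched with plain `python3 train.py`: one process, no DDP.
        return DistInfo(ddp=False, rank=0, local_rank=0, world_size=1)
    return DistInfo(
        ddp=True,
        rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
    )


def grad_accum_steps(total_batch_tokens: int, micro_batch: int,
                     seq_len: int, world_size: int) -> int:
    """Micro-steps per rank so that one optimizer step covers total_batch_tokens."""
    tokens_per_micro_step = micro_batch * seq_len * world_size
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch must be divisible by micro_batch * seq_len * world_size")
    return total_batch_tokens // tokens_per_micro_step


def init_ddp(info: DistInfo) -> None:
    """Join the NCCL process group and pin this rank to its GPU (torchrun launches only)."""
    if not info.ddp:
        return
    import torch
    import torch.distributed as dist
    torch.cuda.set_device(info.local_rank)
    dist.init_process_group(backend="nccl")  # reads MASTER_ADDR/PORT set by torchrun


if __name__ == "__main__":
    info = read_dist_env()
    init_ddp(info)
    print(info, grad_accum_steps(524_288, 16, 1024, info.world_size))
```

With a hypothetical micro-batch of 16 sequences of 1024 tokens, the two A6000s would each accumulate 16 micro-steps per optimizer step. A single GPU would need 32.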

References

License

This project is licensed under the MIT License. See the LICENSE file for details.
