
GPT-2 From Scratch

This repository provides a PyTorch implementation of the GPT-2 language model, built entirely from scratch. It includes custom modules for the attention mechanism, transformer blocks, loss computation, optimization, and the training loop. The model was trained on 40B tokens (roughly four passes) of the FineWebEdu-10B dataset from Hugging Face.

For the model weights and how to load them, please refer here.


Summary

The project is based on the GPT-2 and GPT-3 papers, from which the training methodology (including the cosine learning-rate schedule and key hyperparameters) was adopted. The implemented model follows the GPT-2 Small configuration, summarized below.
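
The cosine schedule with linear warmup described above can be sketched as follows. The specific learning rates and step counts here are illustrative placeholders, not the values this repository uses; the authoritative settings live in `optimizer.py`:

```python
import math

def get_lr(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=715, max_steps=19073):
    """GPT-3-style schedule: linear warmup, then cosine decay to a floor."""
    # Linear warmup from ~0 up to max_lr
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Past the decay horizon, hold the floor
    if step > max_steps:
        return min_lr
    # Cosine decay from max_lr down to min_lr
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

The floor (here 10% of the peak rate, as in the GPT-3 paper) prevents the learning rate from decaying all the way to zero late in training.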

| Hyperparameter | Value |
| --- | --- |
| Model Architecture | GPT-2 Small |
| Number of Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Sequence Length | 1024 |
| Batch Size (tokens) | 524,288 |
| Total Training Time | ~80 hours |
| GPU | 2 x NVIDIA A6000 |
| Final Validation Loss | 2.99 |
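
As a sanity check on the configuration above, the parameter count of a GPT-2 Small with tied input/output embeddings can be computed by hand. This mirrors the standard GPT-2 architecture; `model.py` is the authoritative definition:

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab=50257, ctx=1024):
    """Count parameters of a GPT-2-style model (LM head tied to token embeddings)."""
    wte = vocab * n_embd   # token embeddings (also serve as the output head)
    wpe = ctx * n_embd     # learned positional embeddings
    ln = 2 * n_embd        # LayerNorm: scale + bias
    # Attention: fused QKV projection (weights + bias) plus output projection
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x hidden size, then project back
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = ln + attn + ln + mlp
    return wte + wpe + n_layer * block + ln  # trailing ln = final LayerNorm

print(gpt2_param_count())  # 124439808 -- the familiar "124M" GPT-2 Small
```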

Example text generation:

> Hello, I'm a language model, I'm an author, I'm just starting a new language, I'm a computer scientist, I've been doing things over the past few years, now I'm beginning a new area of research, I'm doing research, I'm taking a field trip. My goal is to discover the answer to the questions in the context of what happens when you write computers. So that I understand what the problem is has done. Now it's so different from doing the computer science. So the language, we use, is to think, think how?
>
> A computer scientist is someone whose job is to design, model, and implement computer systems. They are also known for their creativity, creativity, critical thinking capabilities. They are excellent in creative problem solving, they are able to see things before they can be solved, their ability to think in a natural, natural way, and their ability to make changes to the world, and their ability to have a lot of imagination
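
Samples like the ones above are typically produced with temperature plus top-k sampling. Below is a minimal, framework-free sketch of a single decoding step; the `k` and `temperature` values used by this repository's generation code may differ:

```python
import math
import random

def sample_top_k(logits, k=50, temperature=1.0, rng=None):
    """One decoding step: keep the k highest logits, softmax them, sample an index."""
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    # Indices of the k largest logits
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Numerically stable softmax over the top-k logits only
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    # Draw one index in proportion to its softmax weight
    r = rng.random() * sum(weights)
    for idx, w in zip(top, weights):
        r -= w
        if r <= 0:
            return idx
    return top[-1]
```

At generation time this function is called once per new token, with the model's logits for the last position, and the sampled token id is appended to the context.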

Setup

Prerequisites

  • Python 3.10 or higher
  • CUDA-compatible GPU (highly recommended)

Installation

  1. Clone the repository:

     ```bash
     git clone https://github.com/BenBenyamin/GPT2.git
     cd GPT2
     ```

  2. Create a virtual environment:

     ```bash
     python3 -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
     ```

  3. Install dependencies:

     ```bash
     pip install -r requirements.txt
     ```

Project Structure

GPT2/
├── dataset.py                # Dataset loading and preprocessing
├── extra/                    # Supplementary scripts and experiment logs
│   ├── load_gpt2_weights.py  # Load Hugging Face weights into the custom model
│   ├── tokenize_dataset.py   # Tokenization pipeline for raw dataset
│   └── tensorboard/
│       └── runs              # TensorBoard run
├── generation_log.txt        # Sample generated outputs across training
├── loss.py                   # Custom GPT-2 loss function
├── model.py                  # Model architecture definitions
├── optimizer.py              # Cosine scheduler and AdamW setup
├── README.md                 # Project documentation
├── requirements.txt          # Python package dependencies
├── resume.py                 # Checkpoint loading
├── train.py                  # Training loop and validation evaluation
└── utils.py                  # General-purpose helper functions

Training

DDP

```bash
torchrun --nproc_per_node=<number of GPUs> --standalone train.py
```

To resume training (read through resume.py first):

```bash
torchrun --nproc_per_node=<number of GPUs> --standalone resume.py
```

Single GPU

```bash
python3 train.py
```

To resume training:

```bash
python3 resume.py
```
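
With a 1024-token context and a 524,288-token batch, a single optimizer step almost certainly spans many forward/backward passes combined via gradient accumulation. The arithmetic below shows how the numbers fit together; the micro-batch size is a hypothetical example, not necessarily what `train.py` uses:

```python
# Illustrative arithmetic: reaching a 524,288-token batch with gradient accumulation.
total_batch_tokens = 524_288   # 2**19, per the hyperparameter table above
seq_len = 1024                 # sequence length, per the table
micro_batch = 16               # hypothetical sequences per GPU per forward pass
num_gpus = 2                   # 2 x A6000, per the table

tokens_per_micro_step = micro_batch * seq_len * num_gpus
grad_accum_steps = total_batch_tokens // tokens_per_micro_step
print(grad_accum_steps)  # 16
```

Gradients from each micro-step are summed (and scaled) before a single `optimizer.step()`, so the effective batch matches the GPT-3-style 0.5M-token batch regardless of GPU memory.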

References

  • Radford et al., "Language Models are Unsupervised Multitask Learners" (GPT-2), OpenAI, 2019.
  • Brown et al., "Language Models are Few-Shot Learners" (GPT-3), NeurIPS 2020.

License

This project is licensed under the MIT License. See the LICENSE file for details.
