danieljohnmorris/tiny-poe-llm
Tiny Poe LLM

A small decoder-only transformer built from scratch in PyTorch, trained on Edgar Allan Poe's complete works.

4.8 million parameters. Runs on a laptop. Produces text that sounds like gothic horror written by someone having a stroke.

What's in here

  • scrape.py - Scrapes 131 works (stories + poems) from poemuseum.org
  • tokenizer.py - Character-level and BPE tokenizers, no libraries
  • model.py - GPT-style transformer (multi-head attention, pre-norm, weight tying)
  • train.py - Training loop with AdamW, cosine schedule, checkpointing
  • generate.py - Load a model and generate text with temperature/top-k sampling
  • data.py - Corpus loader
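For a flavor of what tokenizer.py does, here is a minimal character-level BPE trainer (an illustrative sketch, not the repo's implementation; the function name and the greedy left-to-right merge pass are assumptions):

```python
from collections import Counter

def train_bpe(text, vocab_size):
    # Start from single characters; repeatedly merge the most frequent
    # adjacent pair until the vocabulary reaches vocab_size.
    tokens = list(text)
    merges = []
    vocab = set(tokens)
    while len(vocab) < vocab_size:
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + b
        merges.append((a, b))
        vocab.add(merged)
        # Rewrite the token stream, replacing each (a, b) pair left to right.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return merges
```

Encoding then replays the learned merges in order; decoding is just string concatenation, which is why BPE needs no external libraries here.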

Quick start

python3 -m venv .venv && source .venv/bin/activate
pip install torch numpy requests beautifulsoup4

# Scrape the corpus
python scrape.py

# Train (char-level)
python train.py --tokenizer char --epochs 5

# Train (BPE)
python train.py --tokenizer bpe --bpe-vocab-size 512 --epochs 5

# Generate
python generate.py --prompt "Once upon a midnight dreary"
python generate.py --checkpoint checkpoints/bpe_best.pt \
  --tokenizer-path checkpoints/bpe_tokenizer.json \
  --tokenizer-type bpe \
  --prompt "I found myself in a dark chamber"
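The temperature/top-k sampling that generate.py exposes works roughly like this (a minimal sketch; `sample_next` is a hypothetical helper, not necessarily the repo's API):

```python
import torch

def sample_next(logits, temperature=1.0, top_k=None):
    # Scale logits by temperature: <1.0 sharpens the distribution,
    # >1.0 flattens it.
    logits = logits / temperature
    if top_k is not None:
        # Zero out (set to -inf) everything below the k-th largest logit.
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[-1]] = float("-inf")
    probs = torch.softmax(logits, dim=-1)
    # Sample one token id from the remaining distribution.
    return torch.multinomial(probs, num_samples=1).item()
```

With `top_k=1` this degenerates to greedy decoding; larger `top_k` and higher temperature trade coherence for variety, which matters a lot at 4.8M parameters.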

Sample output

Char-level, epoch 5:

The death of a beautiful woman, through the crowd, through seventy
thousand ways to shall examine the blood-room in the pitcher bere
intended to transmiss of it the very first tenef o'clock, through a
shadow and hard, rumors and his parcel, into the innermost regions
of impetuous apparatus

BPE, epoch 4:

In this stage of my beeting I became aware of a dull, sullen
glow-satisfied with a fashion of great genius, pursues of the kingdom

BPE, epoch 5:

Once upon a midnight dreary upon the lips of the axis. It was then,
fully the musical inclined atmosphere — a small portion of the main
drift to the glittering of the night.

Model config

                               Char        BPE
Vocab size                     146         512
d_model                        256         256
Heads                          8           8
Layers                         6           6
Params                         4,809,216   4,902,912
Training time (5 epochs, MPS)  ~7.5h       ~3.5h

Both models overfit after the first epoch: the best validation loss always comes at epoch 1. With 4.8M parameters and only 1.9M characters of text, the model has more capacity than the corpus can constrain.
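The parameter totals in the table are consistent with a context length of 128, learned positional embeddings, biased linear layers, and weight tying; here is a back-of-the-envelope count (a sketch under those assumptions — model.py may differ in detail):

```python
def gpt_params(vocab, d_model=256, n_layers=6, block_size=128):
    d_ff = 4 * d_model
    tok_emb = vocab * d_model                 # shared with the output head (weight tying)
    pos_emb = block_size * d_model            # learned positional embeddings
    attn = 4 * (d_model * d_model + d_model)  # q, k, v, out projections with bias
    mlp = 2 * (d_model * d_ff) + d_ff + d_model
    ln = 2 * 2 * d_model                      # two pre-norm LayerNorms per block
    final_ln = 2 * d_model
    return tok_emb + pos_emb + n_layers * (attn + mlp + ln) + final_ln

print(gpt_params(146))  # 4,809,216 (char)
print(gpt_params(512))  # 4,902,912 (BPE)
```

Note that the two configs differ only in the token-embedding matrix (146 vs 512 rows), which is why their totals are within ~2% of each other.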

Blog post

Written up at danieljohnmorris.com/writing/building-a-tiny-llm-from-scratch.
