A small decoder-only transformer built from scratch in PyTorch, trained on Edgar Allan Poe's complete works.
4.8 million parameters. Runs on a laptop. Produces text that sounds like gothic horror written by someone having a stroke.
- scrape.py - Scrapes 131 works (stories + poems) from poemuseum.org
- tokenizer.py - Character-level and BPE tokenizers, no libraries
- model.py - GPT-style transformer (multi-head attention, pre-norm, weight tying)
- train.py - Training loop with AdamW, cosine schedule, checkpointing
- generate.py - Load a model and generate text with temperature/top-k sampling
- data.py - Corpus loader
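The "no libraries" BPE tokenizer works by repeatedly merging the most frequent adjacent pair of symbols until the vocabulary reaches its target size. A minimal sketch of that training loop (names and structure assumed, not taken from `tokenizer.py`):

```python
# Hedged sketch of a byte-pair-encoding training loop, stdlib only.
# This is an illustration of the technique, not the repo's actual code.
from collections import Counter

def train_bpe(text, vocab_size):
    """Learn merge rules until the vocab reaches vocab_size symbols."""
    tokens = list(text)                      # start from single characters
    vocab = set(tokens)
    merges = []
    while len(vocab) < vocab_size:
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break                            # nothing left to merge
        (a, b), _ = pairs.most_common(1)[0]  # most frequent adjacent pair
        merged = a + b
        merges.append((a, b))
        vocab.add(merged)
        out, i = [], 0                       # apply the merge everywhere
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return merges, vocab

merges, vocab = train_bpe("the raven never flitting still is sitting", 40)
```

Encoding a new string then just replays `merges` in order; with `--bpe-vocab-size 512` the same loop runs over the full Poe corpus instead of one sentence.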
```sh
python3 -m venv .venv && source .venv/bin/activate
pip install torch numpy requests beautifulsoup4

# Scrape the corpus
python scrape.py

# Train (char-level)
python train.py --tokenizer char --epochs 5

# Train (BPE)
python train.py --tokenizer bpe --bpe-vocab-size 512 --epochs 5

# Generate
python generate.py --prompt "Once upon a midnight dreary"
python generate.py --checkpoint checkpoints/bpe_best.pt \
    --tokenizer-path checkpoints/bpe_tokenizer.json \
    --tokenizer-type bpe \
    --prompt "I found myself in a dark chamber"
```

Char-level, epoch 5:
The death of a beautiful woman, through the crowd, through seventy
thousand ways to shall examine the blood-room in the pitcher bere
intended to transmiss of it the very first tenef o'clock, through a
shadow and hard, rumors and his parcel, into the innermost regions
of impetuous apparatus
BPE, epoch 4:
In this stage of my beeting I became aware of a dull, sullen
glow-satisfied with a fashion of great genius, pursues of the kingdom
BPE, epoch 5:
Once upon a midnight dreary upon the lips of the axis. It was then,
fully the musical inclined atmosphere — a small portion of the main
drift to the glittering of the night.
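The temperature/top-k sampling that produced these samples can be sketched in a few lines: divide the logits by the temperature, optionally zero out everything outside the k most likely tokens, then draw from the resulting softmax. This is a stdlib-only illustration with assumed names, not the code in `generate.py`:

```python
# Hedged sketch of temperature + top-k sampling over raw logits.
import math
import random

def sample(logits, temperature=1.0, top_k=None):
    """Pick a token index from raw logits."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the top_k largest logits; mask the rest out.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= cutoff else float("-inf") for l in scaled]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()                        # inverse-CDF draw
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

idx = sample([2.0, 1.0, 0.1, -3.0], temperature=0.8, top_k=2)
```

Lower temperatures sharpen the distribution toward the argmax; `top_k` caps how far down the tail the sampler can reach, which is what keeps the gothic word salad at least locally coherent.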
| | Char | BPE |
|---|---|---|
| Vocab size | 146 | 512 |
| d_model | 256 | 256 |
| Heads | 8 | 8 |
| Layers | 6 | 6 |
| Params | 4,809,216 | 4,902,912 |
| Training time (5 epochs, MPS) | ~7.5h | ~3.5h |
Both models overfit after the first epoch: validation loss bottoms out at epoch 1 and climbs from there. With 4.8M parameters and only ~1.9M characters of training data, the model has more capacity than data.
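One sanity check on the table above: with weight tying, the only parameters that depend on vocab size are the embedding rows, so the two models should differ by exactly `(512 - 146) * d_model` parameters. The arithmetic checks out:

```python
# Verify the param-count gap in the table is purely the extra
# embedding rows for the larger BPE vocab (tied embeddings).
d_model = 256
char_params, bpe_params = 4_809_216, 4_902_912
extra_rows = 512 - 146                      # extra vocab entries for BPE
assert bpe_params - char_params == extra_rows * d_model  # 93,696
```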
Written up at danieljohnmorris.com/writing/building-a-tiny-llm-from-scratch.