A from-scratch implementation of the original transformer from Vaswani et al., 2017 ("Attention Is All You Need"). Encoder-decoder architecture trained on English-to-French translation.
11.5 million parameters. Runs on a laptop. Translates simple sentences with surprising accuracy.
- model.py - Encoder-decoder transformer (multi-head attention, sinusoidal positional encoding, cross-attention); a positional-encoding sketch follows this list
- tokenizer.py - Word-level tokenizers for source and target languages
- data.py - Parallel corpus loader and batching
- train.py - Training loop with Noam LR schedule and label smoothing (schedule sketched after the settings table below)
- translate.py - Greedy decoding with interactive mode (decode loop sketched after the sample translations)
- download_data.py - Downloads Tatoeba English-French sentence pairs
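The positional encoding in model.py is the paper's fixed sinusoid: even dimensions get sin(pos / 10000^(2i/d_model)), odd dimensions get the matching cos. A minimal sketch of the idea (the function name and `max_len` are illustrative, not necessarily what model.py uses):

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same)."""
    position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    # 1 / 10000^(2i/d_model), computed in log space for numerical stability
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd indices
    return pe  # added to the token embeddings; contributes no learned parameters
```

Because the table is fixed, it adds nothing to the 11.5M parameter count; that budget is all embeddings and attention/FFN weights.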
```bash
python3 -m venv .venv && source .venv/bin/activate
pip install torch numpy

# Download the data (240k English-French sentence pairs from Tatoeba)
python download_data.py

# Train (uses 50k pairs, ~40 minutes on Apple MPS)
python train.py --epochs 20

# Translate
python translate.py --sentence "The cat is on the table."
python translate.py --interactive
```

Sample translations:

```text
EN: The cat is on the table. FR: le chat est sur la table.
EN: I love you. FR: je t'aime.
EN: She has a beautiful house. FR: elle a une belle maison.
EN: Where is the train station? FR: où est la gare?
EN: I don't understand. FR: je ne comprends pas.
EN: He likes to read books. FR: il aime lire des livres.
EN: It is raining. FR: il pleut.
EN: We are happy. FR: nous sommes heureux.
```
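translate.py's greedy decoder picks the argmax token at each step until end-of-sequence. A sketch under assumed interfaces: `model.encode`, `model.decode`, and the `bos_id`/`eos_id` arguments are hypothetical names, not necessarily the repo's API:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src_ids, bos_id, eos_id, max_len=50):
    """Grow the target sequence one argmax token at a time."""
    memory = model.encode(src_ids)                   # (1, src_len, d_model)
    ys = torch.tensor([[bos_id]], dtype=torch.long)  # start with BOS
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)            # (1, tgt_len, vocab)
        next_id = logits[0, -1].argmax().item()      # most likely next token
        ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:                        # stop at end-of-sequence
            break
    return ys.squeeze(0).tolist()
```

Greedy decoding is the simplest option and holds up well on short Tatoeba-style sentences like the ones above; beam search would be the usual next step.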
| Setting | Value |
|---|---|
| d_model | 256 |
| Heads | 8 |
| Encoder layers | 4 |
| Decoder layers | 4 |
| d_ff | 512 |
| Source vocab | 6,033 |
| Target vocab | 9,201 |
| Parameters | 11,536,113 |
| Training time (20 epochs, MPS) | ~40 min |
| Best val loss | 2.6820 (epoch 20) |
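The schedule behind the ~40-minute run is the Noam schedule from the paper: lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5), i.e. linear warmup followed by inverse-square-root decay. A sketch of how it plugs into PyTorch (warmup = 4000 and the Adam betas are the paper's values; whether train.py uses exactly these is an assumption):

```python
import torch

def noam_lr(step: int, d_model: int = 256, warmup: int = 4000) -> float:
    """lr = d_model**-0.5 * min(step**-0.5, step * warmup**-1.5)."""
    step = max(step, 1)  # LambdaLR calls with step 0 first; avoid 0**-0.5
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, noam_lr)
# (base lr is 1.0, so the lambda's return value is the actual learning rate)
```

For the label-smoothing half, PyTorch's built-in `nn.CrossEntropyLoss(label_smoothing=0.1)` covers the common case (0.1 is the paper's value); whether train.py uses that or a hand-rolled KL-divergence loss isn't shown here.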
Unlike decoder-only models trained on small corpora, this model doesn't overfit aggressively. With 50k sentence pairs and 11.5M parameters, validation loss keeps improving through all 20 epochs.
Written up at danieljohnmorris.com/writing/attention-is-all-you-need.