Minimal decoder-only seq2seq pipeline with proper causal masking, teacher forcing, Ignite training loop, and checkpointed inference
Updated Feb 23, 2026 · Python
From paper to code: a rigorous Transformer implementation in TensorFlow 2 — real WMT14 data, Moses tokenizer, and causal masking done right.
Fully vectorized Transformer decoder implemented from scratch in NumPy with causal masking, autoregressive training, and empirical O(n²) complexity analysis.
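The core of such an implementation — the causally masked scaled dot-product attention whose n × n score matrix gives the O(n²) cost in sequence length — can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not code from the repository above; all function names are illustrative.

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular boolean mask: position i may attend only to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def masked_attention(q, k, v):
    # Scaled dot-product attention with causal masking.
    # The (n, n) score matrix is the source of the O(n^2) complexity.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(causal_mask(n), scores, -1e9)  # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = masked_attention(q, k, v)
```

Because position 0 can only attend to itself, its output row equals `v[0]` exactly — a quick sanity check that the mask is applied correctly.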
Implement a decoder-only Transformer in PyTorch that reverses character sequences, using causal masking, cross-entropy loss, and an Ignite training loop.
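In a decoder-only setup, the reversal task is typically framed as next-token prediction over the concatenated source and reversed target, with teacher forcing supplying the gold prefix at each step and the loss restricted to the target span. A minimal sketch of the data preparation (token ids and the `SEP` symbol are illustrative, not taken from the repository above):

```python
SEP = 1  # hypothetical separator token id

def make_example(src):
    """Build (inputs, labels, loss_mask) for one reversal example.

    The model conditions on tokens[:-1] and is teacher-forced to predict
    tokens[1:]; the mask zeroes out loss on positions that predict the
    source or the separator, keeping only the reversed-target span.
    """
    tgt = list(reversed(src))
    tokens = src + [SEP] + tgt
    inputs = tokens[:-1]
    labels = tokens[1:]
    loss_mask = [0] * len(src) + [1] * len(tgt)
    return inputs, labels, loss_mask

inputs, labels, loss_mask = make_example([5, 6, 7, 8])
# inputs    = [5, 6, 7, 8, 1, 8, 7, 6]
# labels    = [6, 7, 8, 1, 8, 7, 6, 5]
# loss_mask = [0, 0, 0, 0, 1, 1, 1, 1]
```

Cross-entropy is then averaged over the masked positions only, so the model is never penalized for "predicting" the source it was given.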