🤖 Transformer Simplified: Master the Architecture of Modern Generative AI

AI is Good 🌟

Learn how the Attention mechanism works and how ChatGPT was built — step by step, from scratch.

This project by HERE AND NOW AI is a hands-on tutorial based on the foundational paper "Attention Is All You Need" (Vaswani et al., 2017). Every concept is explained in plain English with analogies, and every Python script is self-contained and runnable.


📋 Prerequisites

  • Python 3.8+
  • NumPy (the only dependency for the core chapters!)
  • No GPU required — everything runs on CPU in seconds
  • No prior deep learning knowledge needed

⚡ Quick Start

# 1. Clone or navigate to this folder
cd transformers-simplified

# 2. Install dependencies
pip install -r requirements.txt

πŸ” SEO Keywords

Transformers Explained, Self-Attention mechanism, LLM from scratch, Generative AI tutorial, GPT architecture, Neural Networks, AI Education, HERE AND NOW AI.


πŸ—ΊοΈ Learning Roadmap

Follow the chapters in order. Each chapter has:

  • πŸ“– concepts.md β€” Read this first (theory + analogies)
  • 🐍 *.py β€” Then run the code (hands-on demo)

🚀 Simplified Learning Pathway

All scripts are designed by HERE AND NOW AI for local execution and maximum educational clarity.

"AI is Good"

START HERE
    │
    ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 1: Introduction                                 │
│  📖 concepts.md    → Why Transformers replaced RNNs      │
│  🐍 why_transformers.py → Sequential vs parallel demo    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 2: Word Embeddings                              │
│  📖 concepts.md    → Turning words into numbers          │
│  🐍 word_embeddings.py → Embeddings & similarity demo    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 3: Positional Encoding                          │
│  📖 concepts.md    → How the model knows word order      │
│  🐍 positional_encoding.py → Sin/cos wave visualization  │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 4: Self-Attention  ⭐ (THE core concept)        │
│  📖 concepts.md    → Query, Key, Value explained         │
│  🐍 self_attention.py → Attention computed step-by-step  │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 5: Multi-Head Attention                         │
│  📖 concepts.md    → Multiple perspectives in parallel   │
│  🐍 multi_head_attention.py → Multi-head from scratch    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 6: Transformer Block                            │
│  📖 concepts.md    → LayerNorm, FFN, residual connections│
│  🐍 transformer_block.py → Complete encoder block        │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 7: Full Transformer                             │
│  📖 concepts.md    → Encoder-Decoder + masked attention  │
│  🐍 mini_transformer.py → End-to-end mini Transformer    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 8: GPT & ChatGPT  🔥                            │
│  📖 concepts.md    → GPT-1→GPT-4, RLHF, AI revolution    │
│  🐍 mini_gpt.py   → Decoder-only text generator          │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 9: Bonus — Real-World AI Demos  🎁              │
│  🐍 simple_chatbot.py      → Chat with a local LLM       │
│  🐍 get_word_vector.py     → Fetch word embeddings       │
│  🐍 visualize_embeddings.py→ 2D embedding visualization  │
│  🐍 word_distance.py       → Cosine similarity & analogy │
│  🐍 kokoro_tts.py          → Text-to-Speech with Kokoro  │
│  🐍 whisper_stt.py         → Speech-to-Text with Whisper │
│  🐍 text_to_image.py       → Image gen with SDXL Turbo   │
└──────────────────────────────────────────────────────────┘
                       ▼
                  🎉 DONE!
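As a taste of what the chapter scripts compute, here is a minimal NumPy sketch of Chapter 3's sin/cos positional encoding. This is an illustrative re-derivation of the formula from the paper, not the repository's own positional_encoding.py:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]       # (seq_len, 1) word positions
    i = np.arange(d_model // 2)[None, :]    # (1, d_model/2) dimension pairs
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)            # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)            # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16) — one wave pattern per position
```

Each position gets a unique fingerprint of sine/cosine values, which is simply added to the word embeddings so the model knows word order.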

πŸ“ Project Structure

transformer_simplified/
│
├── README.md                     ← You are here
├── requirements.txt              ← pip install -r requirements.txt
│
├── 01_introduction/
│   ├── concepts.md               ← Why Transformers changed everything
│   └── why_transformers.py       ← Sequential vs parallel processing
│
├── 02_word_embeddings/
│   ├── concepts.md               ← Words → vectors, cosine similarity
│   └── word_embeddings.py        ← Build embeddings, word arithmetic
│
├── 03_positional_encoding/
│   ├── concepts.md               ← Sin/cos position signals
│   └── positional_encoding.py    ← Compute & visualize encodings
│
├── 04_self_attention/
│   ├── concepts.md               ← Q, K, V — the core mechanism
│   └── self_attention.py         ← Attention from scratch
│
├── 05_multi_head_attention/
│   ├── concepts.md               ← Parallel attention heads
│   └── multi_head_attention.py   ← Multi-head demo
│
├── 06_transformer_block/
│   ├── concepts.md               ← Residuals, LayerNorm, FFN
│   └── transformer_block.py      ← Complete encoder block
│
├── 07_full_transformer/
│   ├── concepts.md               ← Encoder + Decoder architecture
│   └── mini_transformer.py       ← Masked & cross-attention
│
├── 08_gpt_and_chatgpt/
│   ├── concepts.md               ← GPT timeline, RLHF, ChatGPT
│   └── mini_gpt.py               ← Decoder-only text generation
│
├── 09_bonus/
│   ├── words.md                  ← Example words for embedding demos
│   ├── simple_chatbot.py         ← Interactive chatbot using Ollama
│   ├── get_word_vector.py        ← Fetch word embeddings from Ollama
│   ├── visualize_embeddings.py   ← PCA visualization of word vectors
│   ├── word_distance.py          ← Cosine similarity & analogy tool
│   ├── kokoro_tts.py             ← Text-to-Speech with Kokoro ONNX
│   ├── whisper_stt.py            ← Speech-to-Text with Whisper
│   └── text_to_image.py          ← Image generation with SDXL Turbo
│
└── attention_is_all_you_need.pdf ← The original paper (reference)
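The embedding demos (02_word_embeddings and 09_bonus/word_distance.py) revolve around cosine similarity. Here is a minimal sketch of the idea with made-up 3-dimensional vectors — real embeddings have hundreds of dimensions, and these toy values are purely illustrative, not taken from the repository:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| |b|): 1 = same direction, 0 = unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical toy "embeddings" for illustration only
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.2, 0.8])
apple = np.array([0.1, 0.9, 0.9])

sim_royal = cosine_similarity(king, queen)
sim_fruit = cosine_similarity(king, apple)
print(f"king~queen: {sim_royal:.2f}, king~apple: {sim_fruit:.2f}")
```

With these toy values, king~queen scores higher than king~apple — the same effect the bonus scripts demonstrate with real embeddings fetched from Ollama.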

πŸƒ Running All Scripts

# Run each chapter in order
python 01_introduction/why_transformers.py
python 02_word_embeddings/word_embeddings.py
python 03_positional_encoding/positional_encoding.py
python 04_self_attention/self_attention.py
python 05_multi_head_attention/multi_head_attention.py
python 06_transformer_block/transformer_block.py
python 07_full_transformer/mini_transformer.py
python 08_gpt_and_chatgpt/mini_gpt.py

# Bonus demos (require additional dependencies — see requirements.txt)
python 09_bonus/simple_chatbot.py
python 09_bonus/get_word_vector.py
python 09_bonus/visualize_embeddings.py
python 09_bonus/word_distance.py
python 09_bonus/kokoro_tts.py
python 09_bonus/whisper_stt.py
python 09_bonus/text_to_image.py

Each script prints a clean, annotated walkthrough explaining what it's computing at every step.


🔑 Key Concepts Quick Reference

| Concept | One-Line Explanation |
|---------|----------------------|
| Embedding | Convert words to vectors so computers can process them |
| Positional Encoding | Add position info using sin/cos waves |
| Self-Attention | Let every word look at every other word to understand context |
| Q, K, V | Query (what I want), Key (what I offer), Value (my content) |
| Multi-Head | Run several attentions in parallel for richer understanding |
| LayerNorm | Normalize values for stable training |
| Residual Connection | Add input to output so information isn't lost |
| Feed-Forward Network | Two linear layers with ReLU — "thinking" time for each word |
| Masked Attention | Prevent the model from seeing future words during generation |
| Cross-Attention | Decoder reads the encoder's output |
| GPT | Decoder-only Transformer trained to predict the next word |
| RLHF | Use human feedback to align the model with human preferences |
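The attention rows of this table all reduce to one formula from the paper, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. A minimal NumPy sketch of that computation (illustrative, not the repository's self_attention.py; the random weights stand in for learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # every word scores every other word
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))             # 4 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape, w.shape)               # (4, 8) (4, 4)
```

Each output row is a weighted mix of all value vectors, with the weights saying how much each word attends to every other word.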

📄 Reference

This tutorial is based on:

"Attention Is All You Need", by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. NeurIPS 2017 — arXiv:1706.03762

The included attention_is_all_you_need.pdf is the original paper for reference.


🤝 Connect with Us

Stay updated with the latest in AI and machine learning.


πŸ“ License

This project is for educational purposes. Feel free to use, modify, and share.


"AI is Good" — HERE AND NOW AI
