
🤖 Transformer Simplified: Master the Architecture of Modern Generative AI

AI is Good 🌟

Learn how the Attention mechanism works and how ChatGPT was built — step by step, from scratch.

This project by HERE AND NOW AI is a hands-on tutorial based on the foundational paper "Attention Is All You Need" (Vaswani et al., 2017). Every concept is explained in plain English with analogies, and every Python script is self-contained and runnable.


📋 Prerequisites

  • Python 3.8+
  • NumPy (the only dependency for Chapters 1–8; the bonus demos in 09_bonus/ need extras from requirements.txt)
  • No GPU required — everything runs on CPU in seconds
  • No prior deep learning knowledge needed

⚡ Quick Start

# 1. Clone or navigate to this folder
cd transformer_simplified

# 2. Install dependencies
pip install -r requirements.txt

🔍 SEO Keywords

Transformers Explained, Self-Attention mechanism, LLM from scratch, Generative AI tutorial, GPT architecture, Neural Networks, AI Education, HERE AND NOW AI.


🗺️ Learning Roadmap

Follow the chapters in order. Each chapter has:

  • 📖 concepts.md — Read this first (theory + analogies)
  • 🐍 *.py — Then run the code (hands-on demo)

🚀 Simplified Learning Pathway

All scripts are designed by HERE AND NOW AI for local execution and maximum educational clarity.

"AI is Good"

START HERE
    │
    ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 1: Introduction                                 │
│  📖 concepts.md    → Why Transformers replaced RNNs      │
│  🐍 why_transformers.py → Sequential vs parallel demo    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 2: Word Embeddings                              │
│  📖 concepts.md    → Turning words into numbers          │
│  🐍 word_embeddings.py → Embeddings & similarity demo    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 3: Positional Encoding                          │
│  📖 concepts.md    → How the model knows word order      │
│  🐍 positional_encoding.py → Sin/cos wave visualization  │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 4: Self-Attention  ⭐ (THE core concept)        │
│  📖 concepts.md    → Query, Key, Value explained         │
│  🐍 self_attention.py → Attention computed step-by-step  │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 5: Multi-Head Attention                         │
│  📖 concepts.md    → Multiple perspectives in parallel   │
│  🐍 multi_head_attention.py → Multi-head from scratch    │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 6: Transformer Block                            │
│  📖 concepts.md    → LayerNorm, FFN, residual connections│
│  🐍 transformer_block.py → Complete encoder block        │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 7: Full Transformer                             │
│  📖 concepts.md    → Encoder-Decoder + masked attention  │
│  🐍 mini_transformer.py → End-to-end mini Transformer   │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 8: GPT & ChatGPT  🔥                            │
│  📖 concepts.md    → GPT-1→GPT-4, RLHF, AI revolution  │
│  🐍 mini_gpt.py   → Decoder-only text generator         │
└──────────────────────┬───────────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────────┐
│  Chapter 9: Bonus — Real-World AI Demos  🎁              │
│  🐍 simple_chatbot.py      → Chat with a local LLM      │
│  🐍 get_word_vector.py     → Fetch word embeddings       │
│  🐍 visualize_embeddings.py → 2D embedding visualization │
│  🐍 word_distance.py       → Cosine similarity & analogy │
│  🐍 kokoro_tts.py          → Text-to-Speech with Kokoro  │
│  🐍 whisper_stt.py         → Speech-to-Text with Whisper │
│  🐍 text_to_image.py       → Image gen with SDXL Turbo   │
└──────────────────────────────────────────────────────────┘
                       ▼
                  🎉 DONE!
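As a taste of Chapter 3, the sin/cos encoding fits in a few lines of NumPy. This is a generic sketch of the formula from the paper, not necessarily the exact code in positional_encoding.py:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1) word positions
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2) dimension pairs
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions get sine waves
    pe[:, 1::2] = np.cos(angles)             # odd dimensions get cosine waves
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16)
```

Each dimension pair oscillates at a different frequency, so every position gets a unique, smoothly varying fingerprint that the model can learn to read.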

📁 Project Structure

transformer_simplified/
│
├── README.md                     ← You are here
├── requirements.txt              ← pip install -r requirements.txt
├── attention_is_all_you_need.pdf ← The original paper (reference)
│
├── 01_introduction/
│   ├── concepts.md               ← Why Transformers changed everything
│   └── why_transformers.py       ← Sequential vs parallel processing
│
├── 02_word_embeddings/
│   ├── concepts.md               ← Words → vectors, cosine similarity
│   └── word_embeddings.py        ← Build embeddings, word arithmetic
│
├── 03_positional_encoding/
│   ├── concepts.md               ← Sin/cos position signals
│   └── positional_encoding.py    ← Compute & visualize encodings
│
├── 04_self_attention/
│   ├── concepts.md               ← Q, K, V — the core mechanism
│   └── self_attention.py         ← Attention from scratch
│
├── 05_multi_head_attention/
│   ├── concepts.md               ← Parallel attention heads
│   └── multi_head_attention.py   ← Multi-head demo
│
├── 06_transformer_block/
│   ├── concepts.md               ← Residuals, LayerNorm, FFN
│   └── transformer_block.py      ← Complete encoder block
│
├── 07_full_transformer/
│   ├── concepts.md               ← Encoder + Decoder architecture
│   └── mini_transformer.py       ← Masked & cross-attention
│
├── 08_gpt_and_chatgpt/
│   ├── concepts.md               ← GPT timeline, RLHF, ChatGPT
│   └── mini_gpt.py               ← Decoder-only text generation
│
└── 09_bonus/
    ├── words.md                  ← Example words for embedding demos
    ├── simple_chatbot.py         ← Interactive chatbot using Ollama
    ├── get_word_vector.py        ← Fetch word embeddings from Ollama
    ├── visualize_embeddings.py   ← PCA visualization of word vectors
    ├── word_distance.py          ← Cosine similarity & analogy tool
    ├── kokoro_tts.py             ← Text-to-Speech with Kokoro ONNX
    ├── whisper_stt.py            ← Speech-to-Text with Whisper
    └── text_to_image.py          ← Image generation with SDXL Turbo
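The cosine-similarity idea behind 02_word_embeddings and 09_bonus/word_distance.py can be sketched with toy, made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, ~0 = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy "embeddings" invented for illustration only
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.2, 0.8])
apple = np.array([0.1, 0.9, 0.9])

sim_royal = cosine_similarity(king, queen)
sim_fruit = cosine_similarity(king, apple)
print(sim_royal, sim_fruit)  # related words should score higher than unrelated ones
```

With real trained embeddings, this same measure is what makes analogies like king − man + woman ≈ queen work.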

🏃 Running All Scripts

# Run each chapter in order
python 01_introduction/why_transformers.py
python 02_word_embeddings/word_embeddings.py
python 03_positional_encoding/positional_encoding.py
python 04_self_attention/self_attention.py
python 05_multi_head_attention/multi_head_attention.py
python 06_transformer_block/transformer_block.py
python 07_full_transformer/mini_transformer.py
python 08_gpt_and_chatgpt/mini_gpt.py

# Bonus demos (require additional dependencies — see requirements.txt)
python 09_bonus/simple_chatbot.py
python 09_bonus/get_word_vector.py
python 09_bonus/visualize_embeddings.py
python 09_bonus/word_distance.py
python 09_bonus/kokoro_tts.py
python 09_bonus/whisper_stt.py
python 09_bonus/text_to_image.py

Each script prints a clean, annotated walkthrough explaining what it's computing at every step.


🔑 Key Concepts Quick Reference

| Concept | One-Line Explanation |
|---------|----------------------|
| Embedding | Convert words to vectors so computers can process them |
| Positional Encoding | Add position info using sin/cos waves |
| Self-Attention | Let every word look at every other word to understand context |
| Q, K, V | Query (what I want), Key (what I offer), Value (my content) |
| Multi-Head | Run several attention heads in parallel for richer understanding |
| LayerNorm | Normalize values for stable training |
| Residual Connection | Add the input to the output so information isn't lost |
| Feed-Forward Network | Two linear layers with ReLU — "thinking" time for each word |
| Masked Attention | Prevent the model from seeing future words during generation |
| Cross-Attention | The decoder reads the encoder's output |
| GPT | Decoder-only Transformer trained to predict the next word |
| RLHF | Use human feedback to align the model with human preferences |
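Several of these rows (Self-Attention, Q/K/V, Masked Attention) come together in one formula: softmax(QKᵀ/√d_k)·V. Here is a minimal NumPy sketch with random inputs; the names and shapes are illustrative, not taken from self_attention.py:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V, masked=False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # how strongly each word attends to every other word
    if masked:                               # hide future positions (GPT-style generation)
        scores = np.where(np.tri(len(scores)) == 1, scores, -1e9)
    weights = softmax(scores)                # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))          # 4 words, 8-dim vectors (self-attention: Q = K = V)
out, w = attention(Q, K, V, masked=True)
print(out.shape)  # (4, 8)
print(w[0])       # first word can only attend to itself -> [1, 0, 0, 0]
```

With `masked=True`, the upper triangle of the score matrix is pushed to −1e9 before the softmax, so each position only mixes information from itself and earlier words.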

📄 Reference

This tutorial is based on:

"Attention Is All You Need" Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin NeurIPS 2017arXiv:1706.03762

The included attention_is_all_you_need.pdf is the original paper for reference.


🤝 Connect with Us

Stay updated with the latest in AI and machine learning.


📝 License

This project is for educational purposes. Feel free to use, modify, and share.


"AI is Good"HERE AND NOW AI