Learn how the attention mechanism works and how ChatGPT was built, step by step, from scratch.
This project by HERE AND NOW AI is a hands-on tutorial based on the foundational paper "Attention Is All You Need" (Vaswani et al., 2017). Every concept is explained in plain English with analogies, and every Python script is self-contained and runnable.
- Python 3.8+
- NumPy (the only dependency!)
- No GPU required: everything runs on CPU in seconds
- No prior deep learning knowledge needed
# 1. Clone or navigate to this folder
cd transformers-simplified
# 2. Install dependencies
pip install -r requirements.txt
Follow the chapters in order. Each chapter has:
- concepts.md → Read this first (theory + analogies)
- *.py → Then run the code (hands-on demo)
All scripts are designed by HERE AND NOW AI for local execution and maximum educational clarity.
"AI is Good"
START HERE
    │
    ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 1: Introduction                                    │
│ concepts.md → Why Transformers replaced RNNs               │
│ why_transformers.py → Sequential vs parallel demo          │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 2: Word Embeddings                                 │
│ concepts.md → Turning words into numbers                   │
│ word_embeddings.py → Embeddings & similarity demo          │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 3: Positional Encoding                             │
│ concepts.md → How the model knows word order               │
│ positional_encoding.py → Sin/cos wave visualization        │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 4: Self-Attention (THE core concept)               │
│ concepts.md → Query, Key, Value explained                  │
│ self_attention.py → Attention computed step-by-step        │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 5: Multi-Head Attention                            │
│ concepts.md → Multiple perspectives in parallel            │
│ multi_head_attention.py → Multi-head from scratch          │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 6: Transformer Block                               │
│ concepts.md → LayerNorm, FFN, residual connections         │
│ transformer_block.py → Complete encoder block              │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 7: Full Transformer                                │
│ concepts.md → Encoder-Decoder + masked attention           │
│ mini_transformer.py → End-to-end mini Transformer          │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 8: GPT & ChatGPT                                   │
│ concepts.md → GPT-1 to GPT-4, RLHF, AI revolution          │
│ mini_gpt.py → Decoder-only text generator                  │
└───────────────────────┬────────────────────────────────────┘
                        ▼
┌────────────────────────────────────────────────────────────┐
│ Chapter 9 (Bonus): Real-World AI Demos                     │
│ simple_chatbot.py → Chat with a local LLM                  │
│ get_word_vector.py → Fetch word embeddings                 │
│ visualize_embeddings.py → 2D embedding visualization       │
│ word_distance.py → Cosine similarity & analogy             │
│ kokoro_tts.py → Text-to-Speech with Kokoro                 │
│ whisper_stt.py → Speech-to-Text with Whisper               │
│ text_to_image.py → Image gen with SDXL Turbo               │
└────────────────────────────────────────────────────────────┘
                        ▼
                      DONE!
transformer_simplified/
│
├── README.md → You are here
├── requirements.txt → pip install -r requirements.txt
├── attention_is_all_you_need.pdf → The original paper (reference)
│
├── 01_introduction/
│   ├── concepts.md → Why Transformers changed everything
│   └── why_transformers.py → Sequential vs parallel processing
│
├── 02_word_embeddings/
│   ├── concepts.md → Words to vectors, cosine similarity
│   └── word_embeddings.py → Build embeddings, word arithmetic
│
├── 03_positional_encoding/
│   ├── concepts.md → Sin/cos position signals
│   └── positional_encoding.py → Compute & visualize encodings
│
├── 04_self_attention/
│   ├── concepts.md → Q, K, V: the core mechanism
│   └── self_attention.py → Attention from scratch
│
├── 05_multi_head_attention/
│   ├── concepts.md → Parallel attention heads
│   └── multi_head_attention.py → Multi-head demo
│
├── 06_transformer_block/
│   ├── concepts.md → Residuals, LayerNorm, FFN
│   └── transformer_block.py → Complete encoder block
│
├── 07_full_transformer/
│   ├── concepts.md → Encoder + Decoder architecture
│   └── mini_transformer.py → Masked & cross-attention
│
├── 08_gpt_and_chatgpt/
│   ├── concepts.md → GPT timeline, RLHF, ChatGPT
│   └── mini_gpt.py → Decoder-only text generation
│
└── 09_bonus/
    ├── words.md → Example words for embedding demos
    ├── simple_chatbot.py → Interactive chatbot using Ollama
    ├── get_word_vector.py → Fetch word embeddings from Ollama
    ├── visualize_embeddings.py → PCA visualization of word vectors
    ├── word_distance.py → Cosine similarity & analogy tool
    ├── kokoro_tts.py → Text-to-Speech with Kokoro ONNX
    ├── whisper_stt.py → Speech-to-Text with Whisper
    └── text_to_image.py → Image generation with SDXL Turbo
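Chapter 2 and the bonus word_distance.py demo both revolve around cosine similarity and "word arithmetic". Here is a minimal sketch of that idea using made-up toy vectors (the numbers are illustrative assumptions, not real trained embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    """Angle-based similarity: 1 = same direction, 0 = unrelated, -1 = opposite."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-d "embeddings" (hand-picked for illustration)
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.3, 0.8]),
}

# Word arithmetic: king - man + woman should land closest to queen
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(("queen", "man", "woman"),
           key=lambda w: cosine_similarity(target, vectors[w]))
print(best)  # queen
```

The real demos use embeddings fetched from a model; the arithmetic and the similarity measure are the same.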
# Run each chapter in order
python 01_introduction/why_transformers.py
python 02_word_embeddings/word_embeddings.py
python 03_positional_encoding/positional_encoding.py
python 04_self_attention/self_attention.py
python 05_multi_head_attention/multi_head_attention.py
python 06_transformer_block/transformer_block.py
python 07_full_transformer/mini_transformer.py
python 08_gpt_and_chatgpt/mini_gpt.py
# Bonus demos (require additional dependencies; see requirements.txt)
python 09_bonus/simple_chatbot.py
python 09_bonus/get_word_vector.py
python 09_bonus/visualize_embeddings.py
python 09_bonus/word_distance.py
python 09_bonus/kokoro_tts.py
python 09_bonus/whisper_stt.py
python 09_bonus/text_to_image.py

Each script prints a clean, annotated walkthrough explaining what it's computing at every step.
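As a taste of what those chapter scripts compute, here is a minimal sketch of the sinusoidal positional encoding from the paper (a NumPy re-derivation for illustration, not a copy of the chapter's own script):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sin/cos position signals: PE(pos, 2i) = sin(pos / 10000^(2i/d)),
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d))."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model // 2)[None, :]        # (1, d_model/2)
    angles = positions / (10000 ** (2 * dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16)
print(pe[0])     # position 0: alternating 0 (sin 0) and 1 (cos 0)
```

Every value stays in [-1, 1], so the signal can simply be added to the word embeddings without drowning them out.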
| Concept | One-Line Explanation |
|---|---|
| Embedding | Convert words to vectors so computers can process them |
| Positional Encoding | Add position info using sin/cos waves |
| Self-Attention | Let every word look at every other word to understand context |
| Q, K, V | Query (what I want), Key (what I offer), Value (my content) |
| Multi-Head | Run several attentions in parallel for richer understanding |
| LayerNorm | Normalize values for stable training |
| Residual Connection | Add input to output so information isn't lost |
| Feed-Forward Network | Two linear layers with ReLU: "thinking" time for each word |
| Masked Attention | Prevent the model from seeing future words during generation |
| Cross-Attention | Decoder reads the encoder's output |
| GPT | Decoder-only Transformer trained to predict the next word |
| RLHF | Use human feedback to align the model with human preferences |
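The Self-Attention, Q/K/V, and Masked Attention rows above condense into one short NumPy sketch (an illustrative re-implementation with toy shapes, not the tutorial's own script):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # how much each Query matches each Key
    if causal:                             # masked attention: hide future words
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -1e9)
    weights = softmax(scores)              # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = attention(X @ Wq, X @ Wk, X @ Wv, causal=True)
print(out.shape)       # (4, 8): one context-mixed vector per word
print(np.round(w, 2))  # upper triangle is 0: no peeking at future words
```

With `causal=False` this is the encoder-style self-attention of Chapter 4; with `causal=True` it is the masked variant a GPT-style decoder uses during generation.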
This tutorial is based on:
"Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. NeurIPS 2017, arXiv:1706.03762.
The included attention_is_all_you_need.pdf is the original paper for reference.
Stay updated with the latest in AI and machine learning.
- Website: hereandnowai.com
- Email: info@hereandnowai.com
- Phone: +91 996 296 1000
This project is for educational purposes. Feel free to use, modify, and share.
"AI is Good" β HERE AND NOW AI
