 # TinyGPU 🐉⚡

-[![PyPI version](https://img.shields.io/badge/version-1.0.0-blue.svg)](https://pypi.org/project/tinygpu)
+[![PyPI version](https://img.shields.io/badge/version-2.0.0-blue.svg)](https://pypi.org/project/tinygpu)
 [![Python 3.13](https://img.shields.io/badge/Python-3.13-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
 [![CI](https://github.com/deaneeth/tinygpu/actions/workflows/ci.yml/badge.svg)](https://github.com/deaneeth/tinygpu/actions)
+[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Tests](https://img.shields.io/github/actions/workflow/status/deaneeth/tinygpu/ci.yml?label=tests)](https://github.com/deaneeth/tinygpu/actions)

 TinyGPU is a **tiny educational GPU simulator** - inspired by [Tiny8](https://github.com/sql-hkr/tiny8), designed to demonstrate how GPUs execute code in parallel. It models a small **SIMT (Single Instruction, Multiple Threads)** system with per-thread registers, global memory, synchronization barriers, branching, and a minimal GPU-like instruction set.

 > 🎓 *Built for learning and visualization - see how threads, registers, and memory interact across cycles!*
-
+
 | Odd-Even Sort | Reduction |
 |---------------|------------|
-| ![Odd-Even Sort](outputs/run_odd_even_sort/run_odd_even_sort_20251025-205516.gif) | ![Reduction](outputs/run_reduce_sum/run_reduce_sum_20251025-210237.gif) |
+| ![Odd-Even Sort](src/outputs/run_odd_even_sort/run_odd_even_sort_20251026-212558.gif) | ![Reduction](src/outputs/run_reduce_sum/run_reduce_sum_20251026-212712.gif) |
+
+---
+
+## 🚀 What's New in v2.0.0
+
+- **Enhanced Instruction Set**:
+  - Added `SHLD` and `SHST` for robust shared memory operations.
+  - Improved `SYNC` semantics for better thread coordination.
+- **Visualizer Improvements**:
+  - Export execution as GIFs with enhanced clarity.
+  - Added support for saving visuals directly from the simulator.
+- **Refactored Core**:
+  - Simplified step semantics for better extensibility.
+  - Optimized performance for larger thread counts.
+- **CI/CD Updates**:
+  - Integrated linting (`ruff`, `black`) and testing workflows.
+  - Automated builds and tests on GitHub Actions.
+- **Documentation**:
+  - Expanded examples and added detailed usage instructions.

 ---

@@ -51,10 +72,11 @@ TinyGPU was built as a **learning-first GPU simulator** - simple enough for begi
 > 🧭 TinyGPU aims to make GPU learning *intuitive, visual, and interactive* - from classroom demos to self-guided exploration.

 ---
+
 ## ✨ Highlights

 - 🧩 **GPU-like instruction set:**
-  `SET`, `ADD`, `MUL`, `LD`, `ST`, `JMP`, `BNE`, `BEQ`, `SYNC`, `CSWAP`.
+  `SET`, `ADD`, `MUL`, `LD`, `ST`, `JMP`, `BNE`, `BEQ`, `SYNC`, `CSWAP`, `SHLD`, `SHST`.
 - 🧠 **Per-thread registers & PCs** - each thread executes the same kernel independently.
 - 🧱 **Shared global memory** for inter-thread operations.
 - 🔄 **Synchronization barriers** (`SYNC`) for parallel coordination.
@@ -69,31 +91,39 @@ TinyGPU was built as a **learning-first GPU simulator** - simple enough for begi

 ## 🖼️ Example Visuals

-> Located in `examples/` - you can generate these GIFs yourself.
+> Located in `src/outputs/` - run the example scripts to generate these GIFs (they're saved under `src/outputs/<script_name>/`).

-| Odd-Even Sort | Reduction |
-|---------------|------------|
-| ![Odd-Even Sort](outputs/run_odd_even_sort/run_odd_even_sort_20251025-205516.gif) | ![Reduction](outputs/run_reduce_sum/run_reduce_sum_20251025-210237.gif) |
+| Example | Description | GIF Preview |
+|---------|-------------|-------------|
+| Vector Add | Parallel vector addition (A+B -> C) | ![Vector Add](src/outputs/run_vector_add/run_vector_add_20251026-212734.gif) |
+| Block Shared Sum | Per-block shared memory sum example | ![Block Shared Sum](src/outputs/run_block_shared_sum/run_block_shared_sum_20251026-212542.gif) |
+| Odd-Even Sort | GPU-style odd-even transposition sort | ![Odd-Even Sort](src/outputs/run_odd_even_sort/run_odd_even_sort_20251026-212558.gif) |
+| Parallel Reduction | Sum reduction across an array | ![Reduction](src/outputs/run_reduce_sum/run_reduce_sum_20251026-212712.gif) |
+| Sync Test | Synchronization / barrier demonstration | ![Sync Test](src/outputs/run_sync_test/run_sync_test_20251027-000818.gif) |
+| Loop Test | Branching and loop behavior demo | ![Test Loop](src/outputs/run_test_loop/run_test_loop_20251026-212814.gif) |
+| Compare Test | Comparison and branching example | ![Test CMP](src/outputs/run_test_cmp/run_test_cmp_20251026-212823.gif) |
+| Kernel Args Test | Demonstrates passing kernel arguments | ![Kernel Args](src/outputs/run_test_kernel_args/run_test_kernel_args_20251026-212830.gif) |

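To make the "Parallel Reduction" entry more concrete, here is a minimal Python sketch of one common tree-style sum reduction. It is not taken from TinyGPU's `reduce_sum.tgpu`: the function name `tree_reduce_sum` and the stride-halving scheme are assumptions chosen for illustration. Each pass of the outer loop stands for one barrier-separated step in which every active thread adds a partner element into its own slot.

```python
def tree_reduce_sum(values):
    """Sum `values` with a stride-halving tree reduction (illustrative only)."""
    mem = list(values)
    n = len(mem)
    if n == 0:
        return 0

    # Pad with zeros to a power of two so every step pairs elements cleanly.
    size = 1
    while size < n:
        size *= 2
    mem.extend([0] * (size - n))

    stride = size // 2
    while stride >= 1:
        # All "threads" read first, then all write; this mimics a barrier
        # between reduction steps.
        partial = [mem[tid] + mem[tid + stride] for tid in range(stride)]
        mem[:stride] = partial
        stride //= 2

    # The total ends up in slot 0.
    return mem[0]


total = tree_reduce_sum([1, 2, 3, 4, 5])
```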
 ---

 ## 🚀 Quickstart

 ### Clone and install
+
 ```bash
 git clone https://github.com/deaneeth/tinygpu.git
 cd tinygpu
 pip install -e .
 pip install -r requirements-dev.txt
-````
+```

 ### Run an example

 ```bash
 python -m examples.run_odd_even_sort
 ```

-> Produces: `examples/odd_even_sort.gif` - a visual GPU-style sorting process.
+> Produces: `src/outputs/run_odd_even_sort/run_odd_even_sort_*.gif` - a visual GPU-style sorting process.

 ### Other examples

99129
@@ -108,30 +138,50 @@ python -m examples.run_sync_test

 ## 🧩 Project Layout

-```
-tinygpu/
+```text
+.
+├─ .github/
+│ └─ workflows/
+│   └─ ci.yml
+├─ docs/
+│ └─ index.md
 ├─ examples/
-│ ├─ vector_add.tgpu
+│ ├─ odd_even_sort_tmp.tgpu
 │ ├─ odd_even_sort.tgpu
 │ ├─ reduce_sum.tgpu
-│ ├─ run_vector_add.py
 │ ├─ run_odd_even_sort.py
 │ ├─ run_reduce_sum.py
+│ ├─ run_sync_test.py
 │ ├─ run_test_loop.py
-│ └─ run_sync_test.py
-│
+│ ├─ run_vector_add.py
+│ ├─ sync_test.tgpu
+│ ├─ test_loop.tgpu
+│ └─ vector_add.tgpu
+├─ src/outputs/
+│ ├─ run_block_shared_sum/
+│ ├─ run_odd_even_sort/
+│ ├─ run_reduce_sum/
+│ ├─ run_sync_test/
+│ ├─ run_test_cmp/
+│ ├─ run_test_kernel_args/
+│ ├─ run_test_loop/
+│ └─ run_vector_add/
 ├─ src/
 │ └─ tinygpu/
+│   ├─ __init__.py
 │   ├─ assembler.py
 │   ├─ gpu.py
 │   ├─ instructions.py
-│   ├─ visualizer.py
-│   └─ __init__.py
-│
+│   └─ visualizer.py
 ├─ tests/
+│ ├─ test_assembler.py
+│ ├─ test_gpu_core.py
+│ ├─ test_gpu.py
+│ └─ test_programs.py
+├─ LICENSE
 ├─ pyproject.toml
-├─ requirements-dev.txt
-└─ README.md
+├─ README.md
+└─ requirements-dev.txt
 ```

 ---
@@ -156,6 +206,8 @@ TinyGPU uses a **minimal instruction set** designed for clarity and education -
 | `BNE Ra, Rb, target` | Branch if not equal. | Jump to `target` if `Ra != Rb`. |
 | `SYNC` | *(no operands)* | Synchronization barrier - all threads must reach this point before continuing. |
 | `CSWAP addrA, addrB` | Compare-and-swap memory values. | If `mem[addrA] > mem[addrB]`, swap them. Used for sorting. |
+| `SHLD addr, Rs` | Load shared memory into register. | `Rs = shared_mem[addr]` |
+| `SHST addr, Rs` | Store register into shared memory. | `shared_mem[addr] = Rs` |
 | `CMP Rd, Ra, Rb` *(optional)* | Compare and set flag or register. | Used internally for extended examples (e.g., prefix-scan). |
 | `NOP` *(optional)* | *(no operands)* | No operation; placeholder instruction. |

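The memory-touching rows above can be read as plain Python. This is a minimal sketch and not TinyGPU's actual implementation: the helper names (`cswap`, `shst`, `shld`, `odd_even_sort`) and the use of Python lists and dicts for memory and registers are assumptions made for illustration. The last function shows how alternating even and odd `CSWAP` phases amount to an odd-even transposition sort.

```python
def cswap(mem, addr_a, addr_b):
    """CSWAP: swap mem[addr_a] and mem[addr_b] if the first is larger."""
    if mem[addr_a] > mem[addr_b]:
        mem[addr_a], mem[addr_b] = mem[addr_b], mem[addr_a]


def shst(shared_mem, addr, regs, rs):
    """SHST: shared_mem[addr] = Rs."""
    shared_mem[addr] = regs[rs]


def shld(shared_mem, addr, regs, rs):
    """SHLD: Rs = shared_mem[addr]."""
    regs[rs] = shared_mem[addr]


def odd_even_sort(mem):
    """Sort `mem` in place using len(mem) alternating CSWAP phases."""
    n = len(mem)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        start = phase % 2
        # In a GPU each pair would be handled by its own thread; the pairs in
        # one phase are disjoint, so a sequential loop gives the same result.
        for left in range(start, n - 1, 2):
            cswap(mem, left, left + 1)
    return mem


data = [5, 1, 4, 2, 3]
odd_even_sort(data)

# Round-trip a register value through shared memory.
shared = [0] * 4
regs = {"R0": 42, "R1": 0}
shst(shared, 2, regs, "R0")
shld(shared, 2, regs, "R1")
```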
@@ -267,7 +319,7 @@ MIT - see [LICENSE](LICENSE)

 ## 🌟 Credits & Inspiration

-❤️ Built by [Deaneeth](https://github.com/deaneeth)
+❤️ Built by [Deaneeth](https://github.com/deaneeth)

 > Inspired by the educational design of [Tiny8 CPU Simulator](https://github.com/sql-hkr/tiny8).
