A production-clean implementation of a scalar-valued automatic differentiation engine built completely from scratch. This repository serves as a personal deep-learning playground demonstrating foundational mastery of backpropagation, computational graphs, and neural network mechanics.
Inspired by Andrej Karpathy's Zero to Hero curriculum, this project bypasses high-level abstractions to construct the core mathematical building blocks of deep learning from the ground up.
- Custom Autograd Engine: Implements a scalar-valued `Value` class that dynamically constructs a Directed Acyclic Graph (DAG) during the forward pass.
- Automated Backpropagation: Executes reverse-mode automatic differentiation across the computational graph using per-operation chain-rule functions (`_backward`).
- Neural Network Module: Builds fundamental components including single `Neuron` nodes, fully connected `Layer` matrices, and Multi-Layer Perceptrons (MLPs).
- Verification: Cross-validates custom gradient outputs against PyTorch's autograd engine to confirm that they agree within floating-point tolerance.
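The verification idea can be illustrated without PyTorch. The sketch below is a stand-in, not the repository's test: it swaps the PyTorch comparison for a central finite-difference check, and the function `f`, the helper names, and the sample inputs are all assumptions chosen for illustration. It compares hand-derived chain-rule gradients of `tanh(a * b + a)` against a numerical reference.

```python
import math


def f(a, b):
    # Hypothetical test expression: one multiply, one add, one tanh.
    return math.tanh(a * b + a)


def analytic_grads(a, b):
    # Chain rule by hand: d tanh(u)/du = 1 - tanh(u)^2, with u = a*b + a.
    t = math.tanh(a * b + a)
    local = 1.0 - t * t
    return local * (b + 1.0), local * a  # (df/da, df/db)


def numeric_grads(a, b, h=1e-6):
    # Central differences: an independent reference for the gradients.
    da = (f(a + h, b) - f(a - h, b)) / (2 * h)
    db = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return da, db


exact = analytic_grads(0.5, -1.5)
approx = numeric_grads(0.5, -1.5)
max_error = max(abs(e - n) for e, n in zip(exact, approx))
print(f"largest gradient mismatch: {max_error:.2e}")
```

A mismatch far above round-off level usually points to a wrong local derivative or to gradients being overwritten instead of accumulated.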
The repository features a streamlined, root-level structural layout for fast discovery and direct execution:
- `*.ipynb`: Interactive Jupyter Notebooks detailing step-by-step mathematical proofs, DAG graphs, and optimization loops.
- `.gitignore`: Clean baseline tracking configured to safely omit workspace artifacts like `.ipynb_checkpoints/`.
The codebase runs on Python 3.x and relies on a focused stack of mathematical and visualization libraries:
- `torch`: Used strictly as a ground-truth baseline to test and verify custom gradient calculations.
- `graphviz`: Used to generate and render visual representations of the underlying computation graphs.
- `numpy`: For vectorized evaluation and data-structure operations.
- `matplotlib`: For tracking and plotting training loss convergence curves.
- Clone the repository:

      git clone https://github.com
      cd inner

- Install the required dependencies:

      pip install torch graphviz numpy matplotlib
- Forward Pass Graph Construction: Stacking basic operations (`+`, `*`, `**`, `relu`, `tanh`) while recording each node's parent links.
- The Chain Rule in Code: Storing a local `_backward` function for each operation and calling these functions in reverse topological order, so that every node's gradient is complete before it is propagated to its parents.
- Optimization Loop: Initializing a mini-MLP, computing the loss (such as MSE or max-margin loss), zeroing out old gradients, and executing manual Stochastic Gradient Descent (SGD) step updates.
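The three steps above can be sketched in a few dozen lines. This is a minimal illustration in the spirit of micrograd, not the code from the notebooks: it supports only `+`, `*`, and `tanh`, fits a single tanh neuron instead of an MLP, and uses full-batch gradient descent for brevity. The toy dataset, learning rate, and step count are assumptions.

```python
import math
import random


class Value:
    """Scalar node in a computation graph (illustrative sketch)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # local chain-rule step, set by each op
        self._prev = set(_children)    # parent nodes that produced this value

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from
        # the output back towards the leaves.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for parent in v._prev:
                    build(parent)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()


# Toy problem (assumed data): fit y = tanh(w * x + b) to four points.
random.seed(0)
w, b = Value(random.uniform(-1, 1)), Value(0.0)
data = [(-2.0, -1.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 1.0)]


def mse():
    total = Value(0.0)
    for x, y in data:
        diff = (w * x + b).tanh() + (-y)
        total = total + diff * diff
    return total * (1.0 / len(data))


initial_loss = mse().data
for step in range(50):
    loss = mse()            # forward pass builds a fresh graph
    for p in (w, b):
        p.grad = 0.0        # zero out old gradients
    loss.backward()         # reverse-mode pass
    for p in (w, b):
        p.data -= 0.5 * p.grad  # gradient descent step
final_loss = mse().data
print(initial_loss, final_loss)
```

Gradients are accumulated with `+=` so that a node used more than once (such as `diff` in `diff * diff`) receives the sum of its contributions. That accumulation is also why the gradients have to be reset before each backward pass.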
- Andrej Karpathy: For the exceptional `micrograd` video lecture series and educational framework.