A learning compiler project implemented in C++20 and built atop LLVM 21. It demonstrates an end-to-end pipeline for a small custom language (inspired by LLVM Kaleidoscope) in an object-oriented style.
Reference: https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html
You can run the compiler without installing LLVM locally:
# Pull the image
docker pull ghcr.io/blazejoz/toylang-compiler:latest
# Run on your local file
docker run --rm -v $(pwd):/app ghcr.io/blazejoz/toylang-compiler:latest <your_file>.toy
The compiler follows a classical 3-phase architecture:
- Frontend
  - Lexer: hand-written scanner and tokenizer.
  - Parser: recursive descent parser constructs an AST.
- Middle-end (IR Generation)
  - Transforms AST to LLVM IR.
  - Manages variable scope, symbol tables, and functions.
- Backend
  - JIT Engine (LLVM ORCv2) for immediate execution.
  - Object emitter (platform target triple + CPU features) for .o output.
  - Linker integration using the system cc for executables.
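As a rough illustration of the frontend stage described above, here is a minimal sketch of a hand-written lexer feeding a recursive descent parser that builds an AST. It is not the project's actual code: the names `Tok`, `Token`, `Expr`, `Parser`, `tokenize`, and `eval` are hypothetical, the grammar covers only integers, `+`, `*`, and parentheses, and `eval` stands in for the IR generation step so the tree can be checked without LLVM.

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for illustration; not the project's real Lexer API.
enum class Tok { Number, Plus, Star, LParen, RParen, End };

struct Token {
    Tok kind;
    int value = 0;  // only meaningful for Tok::Number
};

// Lexer: scan the source text into a flat list of tokens.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            int v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            out.push_back(Token{Tok::Number, v});
            continue;
        }
        switch (c) {
            case '+': out.push_back(Token{Tok::Plus}); break;
            case '*': out.push_back(Token{Tok::Star}); break;
            case '(': out.push_back(Token{Tok::LParen}); break;
            case ')': out.push_back(Token{Tok::RParen}); break;
            default: throw std::runtime_error("unexpected character");
        }
        ++i;
    }
    out.push_back(Token{Tok::End});
    return out;
}

// AST node: a number literal (op == 'n') or a binary '+' / '*' expression.
struct Expr {
    char op = 'n';
    int value = 0;
    std::unique_ptr<Expr> lhs, rhs;
};

// Parser: one method per grammar rule, the core idea of recursive descent.
class Parser {
public:
    explicit Parser(std::vector<Token> toks) : toks_(std::move(toks)) {}

    // expr := term ('+' term)*
    std::unique_ptr<Expr> parseExpr() { return parseChain(Tok::Plus, '+'); }

private:
    // term := primary ('*' primary)*
    std::unique_ptr<Expr> parseTerm() { return parseChain(Tok::Star, '*'); }

    // Shared loop for left-associative binary operators.
    std::unique_ptr<Expr> parseChain(Tok kind, char op) {
        auto node = (op == '+') ? parseTerm() : parsePrimary();
        while (toks_[pos_].kind == kind) {
            ++pos_;
            auto rhs = (op == '+') ? parseTerm() : parsePrimary();
            auto parent = std::make_unique<Expr>();
            parent->op = op;
            parent->lhs = std::move(node);
            parent->rhs = std::move(rhs);
            node = std::move(parent);
        }
        return node;
    }

    // primary := NUMBER | '(' expr ')'
    std::unique_ptr<Expr> parsePrimary() {
        const Token t = toks_[pos_];
        if (t.kind == Tok::Number) {
            ++pos_;
            auto n = std::make_unique<Expr>();
            n->value = t.value;
            return n;
        }
        if (t.kind == Tok::LParen) {
            ++pos_;
            auto inner = parseExpr();
            if (toks_[pos_].kind != Tok::RParen) throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        throw std::runtime_error("expected number or '('");
    }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// Stand-in for IR generation: walk the AST and compute its value.
int eval(const Expr& e) {
    if (e.op == 'n') return e.value;
    const int l = eval(*e.lhs), r = eval(*e.rhs);
    return e.op == '+' ? l + r : l * r;
}
```

Calling `eval(*Parser(tokenize("2 + 3 * (4 + 1)")).parseExpr())` walks the tree the parser produced; operator precedence falls out of the grammar, because `parseTerm` binds `*` more tightly than `parseExpr` binds `+`.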
Project structure:
├── include/ # Headers (AST nodes, Parser, Lexer interfaces)
├── src/
│ ├── frontend/ # Lexer.cpp, Parser.cpp
│ ├── middleend/ # IR_Generator.cpp (AST -> LLVM IR)
│ ├── backend/ # JIT Engine & Object Emitter
│ └── main.cpp # Compiler driver & CLI handling
├── tests/ # Unit tests (Catch2)
├── docs/ # Language specification
└── CMakeLists.txt
Requirements:
- LLVM 21+
- C++20 toolchain
- CMake
From repository root:
cmake -B build
cmake --build build
Run with JIT (default mode):
./build/compiler lang_examples/example.toy -jit
Generate LLVM IR (assembly):
./build/compiler lang_examples/example.toy -S
Compile to native binary:
./build/compiler lang_examples/example.toy -o my_app
./my_app
- if/else/while
- integer variable declaration and assignment
- function definitions and calls
- LLVM IR generation (-S)
- JIT execution (-jit)
- native object emission / executable generation
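The three output modes listed above (`-jit`, `-S`, `-o <name>`) suggest a small dispatch step in the compiler driver. The sketch below shows one way such flag handling could look. It is an assumption, not the project's `main.cpp`: `Mode`, `Options`, and `parseArgs` are hypothetical names, and the real driver may handle errors and flag combinations differently.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical output modes mirroring the flags shown in this README.
enum class Mode { Jit, EmitIR, Native };

struct Options {
    Mode mode = Mode::Jit;  // the README describes JIT as the default mode
    std::string input;      // path to the .toy source file
    std::string output;     // only used for Mode::Native
};

// Returns std::nullopt on malformed input: no source file, an unknown flag,
// or "-o" without a following file name.
std::optional<Options> parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-jit") {
            opts.mode = Mode::Jit;
        } else if (a == "-S") {
            opts.mode = Mode::EmitIR;
        } else if (a == "-o") {
            if (i + 1 >= args.size()) return std::nullopt;
            opts.mode = Mode::Native;
            opts.output = args[++i];
        } else if (!a.empty() && a[0] == '-') {
            return std::nullopt;  // unknown flag
        } else {
            opts.input = a;
        }
    }
    if (opts.input.empty()) return std::nullopt;
    return opts;
}
```

A driver would typically build the argument vector from `argv + 1`, call `parseArgs`, and then switch on `mode` to pick the JIT, IR printer, or object emitter.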
Phase 1: Core Control Flow & Logic
- Function Definitions: Support for multiple arguments and return types.
- AOT Compilation: Native .o emission and linking via clang/cc.
- Standard Flow: if-else branching and while loops.
- For Loops: Implementation of C-style for(init; cond; step) sugar.
- Logical Operators: Short-circuiting && and ||.
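Treating `for(init; cond; step)` as sugar usually means rewriting it into an `init` statement followed by a `while` loop whose body ends with `step`. The sketch below shows that rewrite on a tiny statement tree. The `Stmt` type and the `simple`, `desugarFor`, and `render` helpers are hypothetical and exist only for this illustration; the project's AST classes will differ.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical statement node used only for this sketch.
struct Stmt {
    enum Kind { Simple, Block, While } kind;
    std::string text;            // statement text (Simple) or loop condition (While)
    std::vector<Stmt> children;  // block contents or loop body
};

Stmt simple(std::string text) { return {Stmt::Simple, std::move(text), {}}; }

// for (init; cond; step) body  ==>  { init; while (cond) { body; step; } }
Stmt desugarFor(Stmt init, std::string cond, Stmt step, std::vector<Stmt> body) {
    body.push_back(std::move(step));  // step runs after the body on every iteration
    Stmt loop{Stmt::While, std::move(cond), std::move(body)};
    Stmt block{Stmt::Block, "", {}};  // the outer block scopes the init variable
    block.children.push_back(std::move(init));
    block.children.push_back(std::move(loop));
    return block;
}

// Pretty-printer so the rewritten tree can be inspected as text.
std::string render(const Stmt& s) {
    switch (s.kind) {
        case Stmt::Simple:
            return s.text + ";";
        case Stmt::While: {
            std::string out = "while (" + s.text + ") {";
            for (const Stmt& c : s.children) out += " " + render(c);
            return out + " }";
        }
        case Stmt::Block: {
            std::string out = "{";
            for (const Stmt& c : s.children) out += " " + render(c);
            return out + " }";
        }
    }
    return "";
}
```

One design caveat: this plain rewrite is only equivalent while the language has no `continue`. Once `continue` exists, it must jump to `step` rather than straight back to the condition, so the loop needs a dedicated step block instead of appending `step` to the body.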
Memory & Data Structures
- Arrays: Fixed-size stack arrays with bounds checking.
- Structs (User Defined Types): Implementing GetElementPtr (GEP) for member access.
- Global Variables: Support for state that persists across function calls.
- Strings: Basic char* support and integration with printf.
Middle-end & LLVM Optimizations
- SSA Infrastructure: Proper use of Phi nodes for control flow merging (moving away from "everything is a variable").
- Mem2Reg Pass: Promoting stack-allocated variables (allocas) to SSA registers to enable cleaner IR.
- Optimization Pipeline: Integrating PassBuilder to run Constant Folding and Dead Code Elimination (DCE).
- CFG Visualization: Exporting Control Flow Graphs to .dot files for debugging.
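To make the constant folding item above concrete, here is a minimal sketch of the idea on a tiny expression tree. It deliberately does not use LLVM; in the planned pipeline this work would be done by LLVM's own passes. The `Node` type and the `lit`, `var`, `bin`, and `fold` helpers are hypothetical names introduced only for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression node: op is 'n' (literal), 'v' (variable), '+' or '*'.
struct Node {
    char op = 'n';
    int value = 0;     // used when op == 'n'
    std::string name;  // used when op == 'v'
    std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> lit(int v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

std::unique_ptr<Node> var(std::string name) {
    auto n = std::make_unique<Node>();
    n->op = 'v';
    n->name = std::move(name);
    return n;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Bottom-up constant folding: fold the children first, then replace the node
// with a literal when both operands are known at compile time.
// Overflow handling is omitted in this sketch.
std::unique_ptr<Node> fold(std::unique_ptr<Node> n) {
    if (n->op == 'n' || n->op == 'v') return n;
    n->lhs = fold(std::move(n->lhs));
    n->rhs = fold(std::move(n->rhs));
    if (n->lhs->op == 'n' && n->rhs->op == 'n') {
        const int l = n->lhs->value, r = n->rhs->value;
        return lit(n->op == '+' ? l + r : l * r);
    }
    return n;
}
```

Folding `(2 * 3) + x` leaves a `+` node whose left operand is the literal `6` and whose right operand is still the variable `x`; an expression made only of literals collapses to a single literal.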
Developer Experience & Quality
- Better Error Reporting: Tracking line/column numbers in the Lexer for "Clang-style" error messages.
- Standard Library: A small stdlib.toy for math and basic I/O.
- Dockerized Toolchain: Full AOT pipeline available via a single Docker image.
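For the error reporting item above, Clang-style diagnostics use the shape `file:line:column: error: message`, with 1-based line and column numbers. The sketch below derives a location from a byte offset and formats such a message. `SourceLoc`, `locate`, and `formatError` are hypothetical names; a real lexer would more likely update line and column incrementally while scanning instead of rescanning the source.

```cpp
#include <cstddef>
#include <string>

struct SourceLoc {
    std::size_t line = 1;    // 1-based, as in Clang diagnostics
    std::size_t column = 1;  // 1-based
};

// Compute the line and column of the character at `offset` by walking the
// source text and counting newlines.
SourceLoc locate(const std::string& src, std::size_t offset) {
    SourceLoc loc;
    for (std::size_t i = 0; i < offset && i < src.size(); ++i) {
        if (src[i] == '\n') {
            ++loc.line;
            loc.column = 1;
        } else {
            ++loc.column;
        }
    }
    return loc;
}

// Format a diagnostic as "file:line:column: error: message".
std::string formatError(const std::string& file, const std::string& src,
                        std::size_t offset, const std::string& message) {
    const SourceLoc loc = locate(src, offset);
    return file + ":" + std::to_string(loc.line) + ":" +
           std::to_string(loc.column) + ": error: " + message;
}
```

Storing a `SourceLoc` (or the raw offset) in every token lets the parser and IR generator report errors at the right position without keeping extra state of their own.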
This repository is intended for learning and experimentation. APIs and implementation details may change frequently.
See LICENSE.
