A compiler for a subset of ML (Caml Light) targeting the ZINC Abstract Machine, with three runtime implementations: the original Caml Light runtime (C), a verified Haskell reference model, and a high-performance Rust runtime.
This project implements:
- Compiler (
zam-compiler): Translates ML source code into Caml Light-compatible bytecode (Haskell) - Haskell Runtime Model (
zam-runtime-model): Clean-room ZAM interpreter for verification and differential testing - Rust Runtime (
zam-runtime-rust): High-performance, memory-safe ZAM interpreter - C Runtime (
zam-runtime-c): Original Caml Light 0.75 bytecode interpreter (reference)
Features:
- Code Generation: Emits Caml Light-compatible bytecode
- Verification: Differential testing across all three runtimes
This project is built on the foundation of the ZINC Abstract Machine (ZINC Is Not Caml), described in Xavier Leroy's seminal paper. For a comprehensive introduction to the ZINC ideology, design principles, and execution model, see:
📄 docs/zinc.md - The ZINC Abstract Machine: A Portable Runtime System for ML
This project connects a modern Haskell bytecode compiler to the original Caml Light 0.75 (CL75) runtime, breathing new life into this historically significant piece of functional programming heritage.
Why This Matters:
- Educational Platform: Provides a complete, working implementation of the ZINC abstract machine for students and researchers
- Research Vehicle: Creates a new platform for experiments on the classic ZAM architecture
- Historical Preservation: Keeps the CL75 runtime accessible and usable with modern tooling
What We Built:
- A complete ML-to-bytecode compiler in Haskell targeting the original CL75 runtime
- Full compatibility with the vintage Caml Light 0.75 bytecode format
- Support for arithmetic, conditionals, let-bindings, recursive functions, and C primitive calls
To ensure our compiler generates correct bytecode, we built a complete Haskell implementation of the ZAM bytecode interpreter and verified it through differential testing against the original C runtime.
The Challenge: The vintage Caml Light 0.75 runtime occasionally crashes with recursive functions (segfaults), making it hard to determine if crashes are due to our compiler generating wrong bytecode or the runtime's memory issues.
The Solution: We implemented a clean-room Haskell interpreter that follows the same ZAM semantics but without the C runtime's memory management issues. By running identical bytecode through both interpreters and comparing execution traces, we can:
- Verify our compiler generates correct bytecode
- Find bugs in our understanding of ZAM semantics
- Distinguish compiler bugs from runtime stability issues
Result: Successfully identified and fixed 5+ bugs, achieving 100% trace match on all test cases including recursive factorial. The Haskell runtime model is now a reliable reference implementation for ZAM semantics.
We implemented a ground-up Rust runtime (zamrun) that combines C-level performance with Haskell-level safety guarantees.
Performance: The Rust runtime is ~114× faster than the Haskell model and ~1.09× faster than the C runtime on factorial benchmarks, while maintaining 100% reliability (the C runtime crashes ~26% of the time on recursive programs).
Safety: Rust's ownership system and borrow checker eliminate the memory safety bugs that plague the vintage C runtime — no segfaults, no use-after-free, no buffer overflows.
Verification: The Rust runtime passes the same differential tests as the Haskell model, producing identical results on all test files.
# Build all Haskell packages (compiler + runtime model)
cabal build all
# Build Rust runtime
cargo build --workspace --release
# Build C runtime (optional)
make -C zam-runtime-c# Run built-in test examples (default: C runtime)
cabal run zam
# Compile and run an ML source file (default: C runtime)
cabal run zam data/factorial.ml
# Run with the Rust runtime (recommended — fast and reliable)
cabal run zam -- --runtime rust data/factorial.ml
# Run with the Haskell model (handwritten, full execution trace)
cabal run zam -- --runtime haskell data/factorial.ml
# Run with the Agda-extracted verified runtime
# Note: The Agda runtime currently supports a subset of instructions
cabal run zam -- --runtime haskell --agda data/factorial.ml
# Run built-in examples with Rust runtime
cabal run zam -- --runtime rustNote: The C runtime (
--runtime c, the default) has known stability issues with recursive programs. Use--runtime rustfor reliable execution.
# Verify compiler correctness by comparing all three runtimes
test-suite/run-diff-test.sh data/factorial.ml.byt# Haskell runtime model unit tests
cabal test zam-runtime-model:runtime-tests
# Rust runtime tests
cargo test --workspace- Values: Integer constants, variables, float values (via
float_of_int) - Arithmetic:
+,-,*,/with proper precedence - Float arithmetic:
+.,-.,*.,/.for floating-point operations - Comparisons:
<,<=,>,>=,= - Conditionals:
if...then...else - Let bindings:
let x = expr in expr - Recursive functions:
let rec f x = expr in expr - Multi-parameter functions:
let add a b = fun a b -> a + band recursive multi-parameter functions - Anonymous functions:
fun x -> expr - Function application:
f arg - C primitive calls:
ccall name arity args...(e.g.,ccall sys_time 0,ccall output_int 2 stdout 42) - Math primitives:
ccall exp 1,ccall log 1,ccall sqrt 1,ccall pow 2,ccall sin 1,ccall cos 1,ccall tan 1,ccall log10 1,ccall abs_float 1 - Float conversions:
ccall float_of_int 1,ccall int_of_float 1 - Array primitives:
ccall make_array 2 size init,ccall array_get 2 arr idx,ccall array_set 3 arr idx val,ccall array_length 1 arr
let rec fact n =
if n <= 0 then 1
else n * fact (n - 1)
in
fact 5let x = 1 in x + 2let add x y = x + y in add 3 4let rec gaussian x =
let x_float = ccall float_of_int 1 x in
let neg_one = ccall float_of_int 1 (0 - 1) in
let two = ccall float_of_int 1 2 in
let x2 = x_float *. x_float in
let neg_x2 = neg_one *. x2 in
let neg_x2_div_2 = neg_x2 /. two in
ccall exp 1 neg_x2_div_2
in
let rec fill_range arr idx size =
if idx >= size then arr
else
let _ = ccall array_set 3 arr idx idx in
fill_range arr (idx + 1) size
in
let rec apply_gaussian src dst idx size =
if idx >= size then dst
else
let x = ccall array_get 2 src idx in
let y = gaussian x in
let _ = ccall array_set 3 dst idx y in
apply_gaussian src dst (idx + 1) size
in
let rec sum_array arr idx size acc =
if idx >= size then acc
else
let x = ccall array_get 2 arr idx in
let x_int = ccall int_of_float 1 x in
sum_array arr (idx + 1) size (acc + x_int)
in
let n = 5 in
let src = ccall make_array 2 (n + 1) (ccall float_of_int 1 0) in
let _ = fill_range src 0 (n + 1) in
let dst = ccall make_array 2 (n + 1) (ccall float_of_int 1 0) in
let _ = apply_gaussian src dst 0 (n + 1) in
sum_array dst 0 (n + 1) 0zam/
├── cabal.project # Haskell multi-package build file
├── Cargo.toml # Rust workspace root
│
├── zam-runtime-model/ # Haskell ZAM interpreter library (Hackage)
│ ├── src/
│ │ ├── RuntimeModel.hs # Main module
│ │ └── RuntimeModel/
│ │ ├── Interpreter.hs # Execution engine
│ │ ├── Values.hs # Tagged pointer representation
│ │ ├── Stacks.hs # Stack implementations
│ │ └── Heap.hs # Memory management
│ └── test/
│ └── SimpleTests.hs # Unit tests
│
├── zam-compiler/ # ML-to-bytecode compiler (Hackage)
│ └── src/
│ ├── Main.hs # CLI entry point
│ ├── ZAM.hs # Lambda IR and compilation
│ ├── Emit.hs # Bytecode emission
│ ├── Lexer.hs # Lexical analyzer
│ └── Parser.hs # Recursive descent parser
│
├── zam-runtime-rust/ # Rust ZAM interpreter (crates.io)
│ ├── src/
│ │ ├── main.rs # CLI entry point
│ │ ├── lib.rs # Library API
│ │ ├── interp.rs # Interpreter core
│ │ ├── value.rs # Value representation
│ │ ├── heap.rs # Heap management
│ │ ├── stack.rs # Stack operations
│ │ └── opcode.rs # Opcode definitions
│ └── Cargo.toml
│
├── zam-runtime-c/ # Original Caml Light 0.75 runtime (C)
│ ├── interp.c # Main interpreter
│ ├── trace.c/h # Tracing infrastructure
│ └── ...
│
├── test-suite/ # Cross-language testing
│ ├── fixtures/ # Bytecode test fixtures
│ │ ├── example1.byt
│ │ ├── example2.byt
│ │ └── factorial.byt
│ ├── SimpleTests.hs # Haskell unit tests
│ ├── DifferentialTest.hs # Trace comparison
│ └── run-diff-test.sh # Differential test script
│
├── tools/ # Developer utilities
│ └── trace_bytecode.hs # Haskell trace tool
│
├── data/ # Example ML programs
│ ├── *.ml # Source files
│ └── *.ml.byt # Compiled bytecode
│
├── docs/ # Public documentation
│ ├── zinc.md # ZAM architecture overview
│ ├── differential-test.md # Differential testing guide
│ └── impl.md # Implementation notes
Ready-to-run example programs are in the data/ directory:
| File | Description |
|---|---|
factorial.ml |
Recursive factorial function |
gaussian.ml |
Discrete Gaussian with recursion |
gaussian_kernel.ml |
Gaussian kernel with arrays and floats |
gaussian_kernel_full.ml |
Full Gaussian kernel example |
arith.ml |
Basic arithmetic with operator precedence |
ifelse.ml |
Conditional expression |
add.ml |
Two-argument function |
fun.ml |
Simple function definition |
power.ml |
Recursive power function (2^n) |
compare.ml |
Comparison operators |
time.ml |
C primitive call (sys_time) |
test_output_int.ml |
C primitive I/O (output_int) |
Run any example with:
# With Rust runtime (recommended — fast and reliable)
cabal run zam -- --runtime rust data/factorial.ml
# With Haskell runtime (handwritten, full execution trace)
cabal run zam -- --runtime haskell data/factorial.ml
# With Agda-extracted verified runtime (subset of instructions)
cabal run zam -- --runtime haskell --agda data/factorial.ml
# With C runtime (default — vintage, may crash on recursion)
cabal run zam data/factorial.mlThe project includes comprehensive testing infrastructure at multiple levels:
Tests the Haskell ZAM interpreter with hand-crafted bytecode:
cabal test zam-runtime-model:runtime-testsTests included:
| Test | Description |
|---|---|
testConstByte |
Basic constant loading |
testConstShort |
16-bit constant loading |
testLetAccess |
Environment binding and access |
testAdd |
Integer addition (3 + 5 = 8) |
testNestedLet |
Multiple nested bindings |
testComparison |
Comparison operators (<) |
testEndLet |
Environment cleanup (ENDLET) |
testBranchIf |
Conditional branching (BRANCHIF) |
testBranchIfNot |
Conditional branching (BRANCHIFNOT) |
testLoop |
Simple counter loop (factorial-like) |
testPushMark |
Stack marker operations |
testGlobals |
Global variable get/set |
Compares execution traces between all three runtimes (Haskell, C, Rust) to verify they produce identical results:
# Run differential test on a bytecode file
test-suite/run-diff-test.sh data/factorial.ml.byt
# Or test with simple examples
test-suite/run-diff-test.sh test-suite/fixtures/example1.byt
test-suite/run-diff-test.sh test-suite/fixtures/factorial.bytHow it works:
- Runs the bytecode through the Haskell interpreter, captures trace
- Runs the bytecode through the Rust runtime, captures trace
- Compares traces - they should match exactly (except for pointer values)
Test files:
| File | Description |
|---|---|
test-suite/fixtures/example1.byt |
Simple: let x = 3 in x + 5 |
test-suite/fixtures/example2.byt |
Arithmetic test |
test-suite/fixtures/factorial.byt |
Recursive function test |
data/factorial.ml.byt |
Full factorial compilation |
Run individual bytecode files through just the Haskell interpreter:
cabal exec -- runhaskell -izam-runtime-model/src tools/trace_bytecode.hs data/factorial.ml.bytRun bytecode directly with the Rust runtime:
# Basic run
./target/release/zamrun data/factorial.ml.byt
# With tracing
./target/release/zamrun -t data/factorial.ml.bytRun bytecode directly with the original Caml Light runtime:
# Basic run
./zam-runtime-c/camlrun data/factorial.ml.byt
# With tracing
ZAM_TRACE=/tmp/trace.txt ./zam-runtime-c/camlrun data/factorial.ml.byt
cat /tmp/trace.txtCompile and run ML source files:
# Compile and run with Rust runtime (recommended)
cabal run zam -- --runtime rust data/factorial.ml
# Compile and run with Haskell handwritten runtime
cabal run zam -- --runtime haskell data/factorial.ml
# Compile and run with Agda-extracted verified runtime
# Note: Agda runtime currently supports a subset of instructions
cabal run zam -- --runtime haskell --agda data/factorial.ml
# Compile and run with C runtime (default)
cabal run zam data/factorial.ml
# The compiler outputs factorial.ml.byt and runs it with the chosen runtimeThese are pre-compiled bytecode files for testing the runtime:
| File | Source | Description |
|---|---|---|
test-suite/fixtures/example1.byt |
Hand-written | Simple arithmetic test |
test-suite/fixtures/example2.byt |
Hand-written | Branching test |
test-suite/fixtures/factorial.byt |
Hand-written | Recursive function |
data/*.byt |
Compiled from .ml | Compiler output tests |
These are ML source files for testing the compiler:
| File | Description | Test Focus |
|---|---|---|
data/add.ml |
Two-argument function | Function application |
data/arith.ml |
Complex arithmetic | Operator precedence |
data/compare.ml |
Comparison operators | <, <=, >, >= |
data/factorial.ml |
Recursive factorial | Recursion, conditionals |
data/gaussian.ml |
Discrete Gaussian function | Recursion, conditionals |
data/gaussian_kernel.ml |
Gaussian kernel (floats + arrays) | Multi-param recursion, arrays, floats |
data/gaussian_kernel_full.ml |
Full Gaussian kernel | Arrays, floats, math primitives |
data/fun.ml |
Anonymous functions | fun x -> expr |
data/ifelse.ml |
Conditionals | if-then-else |
data/power.ml |
Recursive power (2^n) | Recursion |
data/time.ml |
C primitive (sys_time) | External calls |
data/test_output_int.ml |
C primitive I/O | output_int |
The compiler follows the ZINC (ZINC Is Not Caml) abstract machine design:
- Arguments passed in a cache (return stack) for efficient tail calls
- Proper handling of recursive bindings using DUMMY/LET/CUR/UPDATE sequence
- Compatible with the original Caml Light runtime bytecode format
| Feature | Haskell Model (Handwritten) | Haskell Model (Agda) | Rust Runtime | C Runtime |
|---|---|---|---|---|
| Language | Haskell | Haskell (from Agda) | Rust | C |
| Purpose | Verification / Reference | Formal Verification | Production / Performance | Historical |
| Memory Safety | ✅ GC | ✅ GC | ✅ Borrow checker | ❌ Manual |
| Performance | Baseline | ~90% of baseline | ~114× faster | ~105× faster |
| Reliability | 100% | 100% | 100% | ~74% |
| Trace Output | Full | Full | Medium | Minimal |
| Float Support | ✅ Boxed | ❌ Partial | ✅ Boxed | ✅ Boxed |
| Array Primitives | ❌ | ❌ | ✅ | ✅ |
| Math Primitives | ❌ | ❌ | ✅ | ✅ |
| Verification | Tested | Machine-checked | Tested | Legacy |
The project includes a unique verification approach using a Haskell runtime model to find bugs through differential testing:
- Haskell Runtime Model (
zam-runtime-model/): A clean-room implementation of the ZAM interpreter in Haskell, with both handwritten and Agda-extracted variants - Rust Runtime (
zam-runtime-rust/): A memory-safe, high-performance implementation in Rust - C Runtime (
zam-runtime-c/): The original Caml Light runtime with tracing instrumentation - Trace Comparison (
test-suite/DifferentialTest.hs): Automated comparison of execution traces across all three
This approach allowed us to identify and fix bugs in our understanding of the ZAM semantics by comparing three independent implementations.
The differential testing works by running the same bytecode through all three runtimes and comparing execution traces:
┌─────────────────┐ ┌─────────────────┐
│ Bytecode File │────▶│ Haskell Model │──▶ Trace A
│ (.byt) │ │ (Interpreter) │
└─────────────────┘ └─────────────────┘
│
│ ┌─────────────────┐
├─────────────▶│ Rust Runtime │──▶ Trace B
│ │ (zamrun) │
│ └─────────────────┘
│
│ ┌─────────────────┐
└─────────────▶│ C Runtime │──▶ Trace C
│ (camlrun) │
└─────────────────┘
│
▼
┌─────────────────┐
│ TraceCompare │──▶ Match / Mismatch
│ (Diff Analysis) │
└─────────────────┘
This approach successfully identified and helped fix several bugs:
| Bug | Description | Fix |
|---|---|---|
| CONSTBYTE Semantics | Haskell re-encoded pre-tagged Caml Light constants | Raw load: Value (fromIntegral b) |
| CUR Offset | Closure code address off-by-2 | pc + offset - 2 |
| Negative Integer Decoding | longVal didn't handle sign extension |
Convert through Int64 |
| Closure Comparison | BRANCHIFGT compared n > closure_ptr |
Treat block pointers as 0 |
| Branch Offset | Off-by-2 error in branch target | Corrected offset math |
| Float Boxing | Raw float bits collided with integer tag (LSB=1) | Box floats as TAG_DOUBLE heap blocks |
| Multi-param GRAB | Functions compiled without KGrab prefix |
Always emit KGrab in compileRest |
| Partial Application | APPLY consumed arg before GRAB |
Leave arg on stack for GRAB |
| Recursion Scope | adjustVars double-adjusted LLet RHSs |
Added maxPending parameter, skip LFunction bodies |
Current status: 10/10 test files match across Haskell and Rust runtimes ✅ (fresh run)
| Test | Description | Haskell | C | Rust | H≈R |
|---|---|---|---|---|---|
test-suite/fixtures/example1.byt |
Simple arithmetic | I#3 | (silent) | I#3 | ✅ |
test-suite/fixtures/example2.byt |
Constant loading | I#1 | (silent) | I#1 | ✅ |
test-suite/fixtures/factorial.byt |
Recursive factorial | I#120 | fail | I#120 | ✅ |
data/factorial.ml.byt |
Compiled ML factorial | I#120 | (silent) | I#120 | ✅ |
data/test_simple_calle.ml.byt |
C primitive call | I#0 | fail | I#0 | ✅ |
data/power.ml.byt |
Recursive power | I#1024 | (silent) | I#1024 | ✅ |
data/arith.ml.byt |
Mixed arithmetic | I#16 | (silent) | I#16 | ✅ |
data/ifelse.ml.byt |
Conditional | I#42 | (silent) | I#42 | ✅ |
data/compare.ml.byt |
Comparison | I#0 | (silent) | I#0 | ✅ |
data/fun.ml.byt |
Function application | I#11 | (silent) | I#11 | ✅ |
C runtime notes:
- Simple programs: exits successfully but produces no output
- Recursive programs (
factorial.byt,test_simple_calle.ml.byt): crashes (known Caml Light 0.75 instability)
This project was developed through a collaborative human-AI workflow. See AI_USAGE.md for details on the division of responsibilities between human and AI contributors.
MIT