Skip to content

overshiki/zam

Repository files navigation

ZAM - ZINC Abstract Machine Compiler and Runtimes

A compiler for a subset of ML (Caml Light) targeting the ZINC Abstract Machine, with three runtime implementations: the original Caml Light runtime (C), a verified Haskell reference model, and a high-performance Rust runtime.

Overview

This project implements:

  • Compiler (zam-compiler): Translates ML source code into Caml Light-compatible bytecode (Haskell)
  • Haskell Runtime Model (zam-runtime-model): Clean-room ZAM interpreter for verification and differential testing
  • Rust Runtime (zam-runtime-rust): High-performance, memory-safe ZAM interpreter
  • C Runtime (zam-runtime-c): Original Caml Light 0.75 bytecode interpreter (reference)

Features:

  • Code Generation: Emits Caml Light-compatible bytecode
  • Verification: Differential testing across all three runtimes

Background: The ZINC Abstract Machine

This project is built on the foundation of the ZINC Abstract Machine (ZINC Is Not Caml), described in Xavier Leroy's seminal paper. For a comprehensive introduction to the ZINC ideology, design principles, and execution model, see:

📄 docs/zinc.md - The ZINC Abstract Machine: A Portable Runtime System for ML


Three Key Achievements

Key Achievement 1: Renaissance of Caml Light 0.75 Heritage

This project connects a modern Haskell bytecode compiler to the original Caml Light 0.75 (CL75) runtime, breathing new life into this historically significant piece of functional programming heritage.

Why This Matters:

  • Educational Platform: Provides a complete, working implementation of the ZINC abstract machine for students and researchers
  • Research Vehicle: Creates a new platform for experiments on the classic ZAM architecture
  • Historical Preservation: Keeps the CL75 runtime accessible and usable with modern tooling

What We Built:

  • A complete ML-to-bytecode compiler in Haskell targeting the original CL75 runtime
  • Full compatibility with the vintage Caml Light 0.75 bytecode format
  • Support for arithmetic, conditionals, let-bindings, recursive functions, and C primitive calls

Key Achievement 2: Verified Compiler via Differential Testing

To ensure our compiler generates correct bytecode, we built a complete Haskell implementation of the ZAM bytecode interpreter and verified it through differential testing against the original C runtime.

The Challenge: The vintage Caml Light 0.75 runtime occasionally crashes with recursive functions (segfaults), making it hard to determine if crashes are due to our compiler generating wrong bytecode or the runtime's memory issues.

The Solution: We implemented a clean-room Haskell interpreter that follows the same ZAM semantics but without the C runtime's memory management issues. By running identical bytecode through both interpreters and comparing execution traces, we can:

  • Verify our compiler generates correct bytecode
  • Find bugs in our understanding of ZAM semantics
  • Distinguish compiler bugs from runtime stability issues

Result: Successfully identified and fixed 5+ bugs, achieving 100% trace match on all test cases including recursive factorial. The Haskell runtime model is now a reliable reference implementation for ZAM semantics.

Key Achievement 3: High-Performance, Memory-Safe Rust Runtime

We implemented a ground-up Rust runtime (zamrun) that combines C-level performance with Haskell-level safety guarantees.

Performance: The Rust runtime is ~114× faster than the Haskell model and ~1.09× faster than the C runtime on factorial benchmarks, while maintaining 100% reliability (the C runtime crashes ~26% of the time on recursive programs).

Safety: Rust's ownership system and borrow checker eliminate the memory safety bugs that plague the vintage C runtime — no segfaults, no use-after-free, no buffer overflows.

Verification: The Rust runtime passes the same differential tests as the Haskell model, producing identical results on all test files.


Quick Start

Build Everything

# Build all Haskell packages (compiler + runtime model)
cabal build all

# Build Rust runtime
cargo build --workspace --release

# Build C runtime (optional)
make -C zam-runtime-c

Run Examples

# Run built-in test examples (default: C runtime)
cabal run zam

# Compile and run an ML source file (default: C runtime)
cabal run zam data/factorial.ml

# Run with the Rust runtime (recommended — fast and reliable)
cabal run zam -- --runtime rust data/factorial.ml

# Run with the Haskell model (handwritten, full execution trace)
cabal run zam -- --runtime haskell data/factorial.ml

# Run with the Agda-extracted verified runtime
# Note: The Agda runtime currently supports a subset of instructions
cabal run zam -- --runtime haskell --agda data/factorial.ml

# Run built-in examples with Rust runtime
cabal run zam -- --runtime rust

Note: The C runtime (--runtime c, the default) has known stability issues with recursive programs. Use --runtime rust for reliable execution.

Run Differential Tests

# Verify compiler correctness by comparing all three runtimes
test-suite/run-diff-test.sh data/factorial.ml.byt

Run Unit Tests

# Haskell runtime model unit tests
cabal test zam-runtime-model:runtime-tests

# Rust runtime tests
cargo test --workspace

Supported Language Features

  • Values: Integer constants, variables, float values (via float_of_int)
  • Arithmetic: +, -, *, / with proper precedence
  • Float arithmetic: +., -., *., /. for floating-point operations
  • Comparisons: <, <=, >, >=, =
  • Conditionals: if ... then ... else
  • Let bindings: let x = expr in expr
  • Recursive functions: let rec f x = expr in expr
  • Multi-parameter functions: let add a b = fun a b -> a + b and recursive multi-parameter functions
  • Anonymous functions: fun x -> expr
  • Function application: f arg
  • C primitive calls: ccall name arity args... (e.g., ccall sys_time 0, ccall output_int 2 stdout 42)
  • Math primitives: ccall exp 1, ccall log 1, ccall sqrt 1, ccall pow 2, ccall sin 1, ccall cos 1, ccall tan 1, ccall log10 1, ccall abs_float 1
  • Float conversions: ccall float_of_int 1, ccall int_of_float 1
  • Array primitives: ccall make_array 2 size init, ccall array_get 2 arr idx, ccall array_set 3 arr idx val, ccall array_length 1 arr

Example Programs

Factorial

let rec fact n =
  if n <= 0 then 1
  else n * fact (n - 1)
in
fact 5

Simple Arithmetic

let x = 1 in x + 2

Functions

let add x y = x + y in add 3 4

Gaussian Kernel (Arrays + Floats + Recursion)

let rec gaussian x =
  let x_float = ccall float_of_int 1 x in
  let neg_one = ccall float_of_int 1 (0 - 1) in
  let two = ccall float_of_int 1 2 in
  let x2 = x_float *. x_float in
  let neg_x2 = neg_one *. x2 in
  let neg_x2_div_2 = neg_x2 /. two in
  ccall exp 1 neg_x2_div_2
in
let rec fill_range arr idx size =
  if idx >= size then arr
  else
    let _ = ccall array_set 3 arr idx idx in
    fill_range arr (idx + 1) size
in
let rec apply_gaussian src dst idx size =
  if idx >= size then dst
  else
    let x = ccall array_get 2 src idx in
    let y = gaussian x in
    let _ = ccall array_set 3 dst idx y in
    apply_gaussian src dst (idx + 1) size
in
let rec sum_array arr idx size acc =
  if idx >= size then acc
  else
    let x = ccall array_get 2 arr idx in
    let x_int = ccall int_of_float 1 x in
    sum_array arr (idx + 1) size (acc + x_int)
in
let n = 5 in
let src = ccall make_array 2 (n + 1) (ccall float_of_int 1 0) in
let _ = fill_range src 0 (n + 1) in
let dst = ccall make_array 2 (n + 1) (ccall float_of_int 1 0) in
let _ = apply_gaussian src dst 0 (n + 1) in
sum_array dst 0 (n + 1) 0

Project Structure

zam/
├── cabal.project              # Haskell multi-package build file
├── Cargo.toml                 # Rust workspace root
│
├── zam-runtime-model/         # Haskell ZAM interpreter library (Hackage)
│   ├── src/
│   │   ├── RuntimeModel.hs    # Main module
│   │   └── RuntimeModel/
│   │       ├── Interpreter.hs # Execution engine
│   │       ├── Values.hs      # Tagged pointer representation
│   │       ├── Stacks.hs      # Stack implementations
│   │       └── Heap.hs        # Memory management
│   └── test/
│       └── SimpleTests.hs     # Unit tests
│
├── zam-compiler/              # ML-to-bytecode compiler (Hackage)
│   └── src/
│       ├── Main.hs            # CLI entry point
│       ├── ZAM.hs             # Lambda IR and compilation
│       ├── Emit.hs            # Bytecode emission
│       ├── Lexer.hs           # Lexical analyzer
│       └── Parser.hs          # Recursive descent parser
│
├── zam-runtime-rust/          # Rust ZAM interpreter (crates.io)
│   ├── src/
│   │   ├── main.rs            # CLI entry point
│   │   ├── lib.rs             # Library API
│   │   ├── interp.rs          # Interpreter core
│   │   ├── value.rs           # Value representation
│   │   ├── heap.rs            # Heap management
│   │   ├── stack.rs           # Stack operations
│   │   └── opcode.rs          # Opcode definitions
│   └── Cargo.toml
│
├── zam-runtime-c/             # Original Caml Light 0.75 runtime (C)
│   ├── interp.c               # Main interpreter
│   ├── trace.c/h              # Tracing infrastructure
│   └── ...
│
├── test-suite/                # Cross-language testing
│   ├── fixtures/              # Bytecode test fixtures
│   │   ├── example1.byt
│   │   ├── example2.byt
│   │   └── factorial.byt
│   ├── SimpleTests.hs         # Haskell unit tests
│   ├── DifferentialTest.hs    # Trace comparison
│   └── run-diff-test.sh       # Differential test script
│
├── tools/                     # Developer utilities
│   └── trace_bytecode.hs      # Haskell trace tool
│
├── data/                      # Example ML programs
│   ├── *.ml                   # Source files
│   └── *.ml.byt               # Compiled bytecode
│
├── docs/                      # Public documentation
│   ├── zinc.md                # ZAM architecture overview
│   ├── differential-test.md   # Differential testing guide
│   └── impl.md                # Implementation notes

Example Programs

Ready-to-run example programs are in the data/ directory:

File Description
factorial.ml Recursive factorial function
gaussian.ml Discrete Gaussian with recursion
gaussian_kernel.ml Gaussian kernel with arrays and floats
gaussian_kernel_full.ml Full Gaussian kernel example
arith.ml Basic arithmetic with operator precedence
ifelse.ml Conditional expression
add.ml Two-argument function
fun.ml Simple function definition
power.ml Recursive power function (2^n)
compare.ml Comparison operators
time.ml C primitive call (sys_time)
test_output_int.ml C primitive I/O (output_int)

Run any example with:

# With Rust runtime (recommended — fast and reliable)
cabal run zam -- --runtime rust data/factorial.ml

# With Haskell runtime (handwritten, full execution trace)
cabal run zam -- --runtime haskell data/factorial.ml

# With Agda-extracted verified runtime (subset of instructions)
cabal run zam -- --runtime haskell --agda data/factorial.ml

# With C runtime (default — vintage, may crash on recursion)
cabal run zam data/factorial.ml

Testing

The project includes comprehensive testing infrastructure at multiple levels:

1. Unit Tests (Haskell Runtime Model)

Tests the Haskell ZAM interpreter with hand-crafted bytecode:

cabal test zam-runtime-model:runtime-tests

Tests included:

Test Description
testConstByte Basic constant loading
testConstShort 16-bit constant loading
testLetAccess Environment binding and access
testAdd Integer addition (3 + 5 = 8)
testNestedLet Multiple nested bindings
testComparison Comparison operators (<)
testEndLet Environment cleanup (ENDLET)
testBranchIf Conditional branching (BRANCHIF)
testBranchIfNot Conditional branching (BRANCHIFNOT)
testLoop Simple counter loop (factorial-like)
testPushMark Stack marker operations
testGlobals Global variable get/set

2. Differential Testing

Compares execution traces between all three runtimes (Haskell, C, Rust) to verify they produce identical results:

# Run differential test on a bytecode file
test-suite/run-diff-test.sh data/factorial.ml.byt

# Or test with simple examples
test-suite/run-diff-test.sh test-suite/fixtures/example1.byt
test-suite/run-diff-test.sh test-suite/fixtures/factorial.byt

How it works:

  1. Runs the bytecode through the Haskell interpreter, captures trace
  2. Runs the bytecode through the Rust runtime, captures trace
  3. Compares traces - they should match exactly (except for pointer values)

Test files:

File Description
test-suite/fixtures/example1.byt Simple: let x = 3 in x + 5
test-suite/fixtures/example2.byt Arithmetic test
test-suite/fixtures/factorial.byt Recursive function test
data/factorial.ml.byt Full factorial compilation

3. Manual Testing with Haskell Model

Run individual bytecode files through just the Haskell interpreter:

cabal exec -- runhaskell -izam-runtime-model/src tools/trace_bytecode.hs data/factorial.ml.byt

4. Manual Testing with Rust Runtime

Run bytecode directly with the Rust runtime:

# Basic run
./target/release/zamrun data/factorial.ml.byt

# With tracing
./target/release/zamrun -t data/factorial.ml.byt

5. Manual Testing with C Runtime

Run bytecode directly with the original Caml Light runtime:

# Basic run
./zam-runtime-c/camlrun data/factorial.ml.byt

# With tracing
ZAM_TRACE=/tmp/trace.txt ./zam-runtime-c/camlrun data/factorial.ml.byt
cat /tmp/trace.txt

6. Compiler Tests (ML Source Files)

Compile and run ML source files:

# Compile and run with Rust runtime (recommended)
cabal run zam -- --runtime rust data/factorial.ml

# Compile and run with Haskell handwritten runtime
cabal run zam -- --runtime haskell data/factorial.ml

# Compile and run with Agda-extracted verified runtime
# Note: Agda runtime currently supports a subset of instructions
cabal run zam -- --runtime haskell --agda data/factorial.ml

# Compile and run with C runtime (default)
cabal run zam data/factorial.ml

# The compiler outputs factorial.ml.byt and runs it with the chosen runtime

Test Data Files

Bytecode Files (.byt)

These are pre-compiled bytecode files for testing the runtime:

File Source Description
test-suite/fixtures/example1.byt Hand-written Simple arithmetic test
test-suite/fixtures/example2.byt Hand-written Branching test
test-suite/fixtures/factorial.byt Hand-written Recursive function
data/*.byt Compiled from .ml Compiler output tests

ML Source Files (.ml)

These are ML source files for testing the compiler:

File Description Test Focus
data/add.ml Two-argument function Function application
data/arith.ml Complex arithmetic Operator precedence
data/compare.ml Comparison operators <, <=, >, >=
data/factorial.ml Recursive factorial Recursion, conditionals
data/gaussian.ml Discrete Gaussian function Recursion, conditionals
data/gaussian_kernel.ml Gaussian kernel (floats + arrays) Multi-param recursion, arrays, floats
data/gaussian_kernel_full.ml Full Gaussian kernel Arrays, floats, math primitives
data/fun.ml Anonymous functions fun x -> expr
data/ifelse.ml Conditionals if-then-else
data/power.ml Recursive power (2^n) Recursion
data/time.ml C primitive (sys_time) External calls
data/test_output_int.ml C primitive I/O output_int

Technical Details

The compiler follows the ZINC (ZINC Is Not Caml) abstract machine design:

  • Arguments passed in a cache (return stack) for efficient tail calls
  • Proper handling of recursive bindings using DUMMY/LET/CUR/UPDATE sequence
  • Compatible with the original Caml Light runtime bytecode format

Three-Runtime Comparison

Feature Haskell Model (Handwritten) Haskell Model (Agda) Rust Runtime C Runtime
Language Haskell Haskell (from Agda) Rust C
Purpose Verification / Reference Formal Verification Production / Performance Historical
Memory Safety ✅ GC ✅ GC ✅ Borrow checker ❌ Manual
Performance Baseline ~90% of baseline ~114× faster ~105× faster
Reliability 100% 100% 100% ~74%
Trace Output Full Full Medium Minimal
Float Support ✅ Boxed ❌ Partial ✅ Boxed ✅ Boxed
Array Primitives
Math Primitives
Verification Tested Machine-checked Tested Legacy

Differential Testing Architecture

The project includes a unique verification approach using a Haskell runtime model to find bugs through differential testing:

  1. Haskell Runtime Model (zam-runtime-model/): A clean-room implementation of the ZAM interpreter in Haskell, with both handwritten and Agda-extracted variants
  2. Rust Runtime (zam-runtime-rust/): A memory-safe, high-performance implementation in Rust
  3. C Runtime (zam-runtime-c/): The original Caml Light runtime with tracing instrumentation
  4. Trace Comparison (test-suite/DifferentialTest.hs): Automated comparison of execution traces across all three

This approach allowed us to identify and fix bugs in our understanding of the ZAM semantics by comparing three independent implementations.

Differential Testing Procedure

The differential testing works by running the same bytecode through all three runtimes and comparing execution traces:

┌─────────────────┐     ┌─────────────────┐
│  Bytecode File  │────▶│  Haskell Model  │──▶ Trace A
│   (.byt)        │     │  (Interpreter)  │
└─────────────────┘     └─────────────────┘
         │
         │              ┌─────────────────┐
         ├─────────────▶│  Rust Runtime   │──▶ Trace B
         │              │  (zamrun)       │
         │              └─────────────────┘
         │
         │              ┌─────────────────┐
         └─────────────▶│  C Runtime      │──▶ Trace C
                        │  (camlrun)      │
                        └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │ TraceCompare    │──▶ Match / Mismatch
                        │ (Diff Analysis) │
                        └─────────────────┘

Bugs Found via Differential Testing

This approach successfully identified and helped fix several bugs:

Bug Description Fix
CONSTBYTE Semantics Haskell re-encoded pre-tagged Caml Light constants Raw load: Value (fromIntegral b)
CUR Offset Closure code address off-by-2 pc + offset - 2
Negative Integer Decoding longVal didn't handle sign extension Convert through Int64
Closure Comparison BRANCHIFGT compared n > closure_ptr Treat block pointers as 0
Branch Offset Off-by-2 error in branch target Corrected offset math
Float Boxing Raw float bits collided with integer tag (LSB=1) Box floats as TAG_DOUBLE heap blocks
Multi-param GRAB Functions compiled without KGrab prefix Always emit KGrab in compileRest
Partial Application APPLY consumed arg before GRAB Leave arg on stack for GRAB
Recursion Scope adjustVars double-adjusted LLet RHSs Added maxPending parameter, skip LFunction bodies

Test Results

Current status: 10/10 test files match across Haskell and Rust runtimes ✅ (fresh run)

Test Description Haskell C Rust H≈R
test-suite/fixtures/example1.byt Simple arithmetic I#3 (silent) I#3
test-suite/fixtures/example2.byt Constant loading I#1 (silent) I#1
test-suite/fixtures/factorial.byt Recursive factorial I#120 fail I#120
data/factorial.ml.byt Compiled ML factorial I#120 (silent) I#120
data/test_simple_calle.ml.byt C primitive call I#0 fail I#0
data/power.ml.byt Recursive power I#1024 (silent) I#1024
data/arith.ml.byt Mixed arithmetic I#16 (silent) I#16
data/ifelse.ml.byt Conditional I#42 (silent) I#42
data/compare.ml.byt Comparison I#0 (silent) I#0
data/fun.ml.byt Function application I#11 (silent) I#11

C runtime notes:

  • Simple programs: exits successfully but produces no output
  • Recursive programs (factorial.byt, test_simple_calle.ml.byt): crashes (known Caml Light 0.75 instability)

AI Usage

This project was developed through a collaborative human-AI workflow. See AI_USAGE.md for details on the division of responsibilities between human and AI contributors.


License

MIT

About

connects a modern Haskell bytecode compiler to the original Caml Light 0.75 (CL75) runtime

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages