Skip to content

Hridye5h/Match_Engine

Repository files navigation

Match Engine

CI License: MIT

A price-time-priority limit-order-book matching engine in modern C++ (C++20).

Status: Phases 0–4 complete — a fully tested engine (71 tests) at ~13 M orders/s, p99 ≈ 400 ns, plus a lock-free feed pipeline, an order-protocol parser, and a tiny backtester. See the roadmap below.

What it does (target)

Ingests a stream of orders (LIMIT / MARKET / CANCEL / MODIFY), maintains a two-sided order book with price-time priority — best price first, and FIFO (earliest order first) within a price level — matches crossing orders into trades, and supports partial fills. The matching core is single-threaded and deterministic.

Architecture

flowchart LR
    subgraph feed["Feed thread (producer)"]
        SRC["order file / generator"] --> PARSE["protocol parser"]
    end

    PARSE -->|orders| RING["lock-free SPSC ring"]

    subgraph match["Matching thread (single-threaded, deterministic)"]
        ENGINE["OrderBook"]
        IDX["order_id index<br/>O(1) cancel"]
        BIDS["bids: ticked array<br/>+ best-bid index"]
        ASKS["asks: ticked array<br/>+ best-ask index"]
        LVL["PriceLevel<br/>intrusive FIFO list"]
        POOL["OrderPool<br/>object pool"]

        ENGINE --- IDX
        ENGINE --> BIDS
        ENGINE --> ASKS
        BIDS --> LVL
        ASKS --> LVL
        LVL --> POOL
    end

    RING -->|orders| ENGINE
    ENGINE -->|trades| TRADES["trades"]
    ENGINE -->|BBO / depth| QUERIES["queries"]
    STRAT["backtester + strategy"] -. observes .-> ENGINE
Loading

The matching core is single-threaded and deterministic — the feed thread only produces; the SPSC ring hands orders to the one matching thread. See docs/architecture.md for a component legend and docs/design-decisions.md for the reasoning.

Design highlights

  • Price-time priority — best price first, FIFO (time order) within a level.
  • O(1) cancel via an order_id → node index into an intrusive FIFO list at each price level (nodes drawn from an object pool).
  • LIMIT / MARKET / IOC / FOK / MODIFY order operations; BBO, depth, snapshot.
  • Integer-tick prices (never floating point); single-threaded & deterministic.
  • Benchmark harness — throughput + latency percentiles (p50 / p99 / p999).
  • Arena / object-pool allocator — zero per-order heap allocation on the hot path (an intrusive list of pooled nodes).
  • Lock-free SPSC ring decoupling a feed thread from the single-threaded matching thread.
  • Order-protocol parser + replay feed handler, plus a tiny strategy backtester (P&L over a replayed book).

Build

Toolchain: GCC (MinGW-w64, UCRT) · CMake · Ninja. GoogleTest is fetched automatically at configure time (no manual install).

# Windows / PowerShell. The first line puts the toolchain on PATH for an
# already-open shell (new shells get it automatically after a restart):
. .\scripts\dev-env.ps1

cmake -S . -B build -G Ninja
cmake --build build
ctest --test-dir build --output-on-failure

# Run the benchmark (optimized Release build) for throughput + latency:
.\build\bench\bench_engine.exe

Try it (working prototype)

Download prebuilt Linux + Windows binaries from Releases — no toolchain needed — or build as above. Then:

# Replay the sample order script: prints every trade and the final book
.\build\tools\match_cli.exe examples\sample_orders.txt

# Or drive the book interactively (HELP for syntax, BOOK to print depth)
.\build\tools\match_cli.exe

Replaying examples/sample_orders.txt:

  TRADE  5 @ 101   (buy #5 / sell #2)     <- sweeps best ask first
  TRADE  5 @ 102   (buy #5 / sell #1)     <- then the next level
  TRADE  3 @ 99    (buy #3 / sell #6)     <- market sell hits best bid
  cancelled #4
  rests  5 @ 100  (order #3, back of the queue)   <- modify forfeits priority

applied 8 commands, 3 trades

final   ------ book (top 5) ------
             BID  |  ASK
         5 @ 100  |  102 @ 3
  BBO: 100 / 102   (spread 2)

Benchmarks

Single thread, 1,000,000 limit orders within a 201-tick band (GCC 16.1 -O3, MinGW-w64 UCRT, Windows; reproducible via a fixed PRNG seed).

Build Throughput p50 p99 p999
Baseline (std::map + std::list) 4.53 M orders/s 200 ns 800 ns 2.0 µs
+ arena pool & intrusive list 5.57 M orders/s 100 ns 700 ns 1.9 µs
+ trade-buffer reuse (out-param) 9.53 M orders/s 100 ns 400 ns 0.9 µs
+ ticked-array book (O(1) levels) 12.99 M orders/s 100 ns 400 ns 1.0 µs

Methodology, kept honest: the order stream is generated up front (excluded from timing); throughput is the whole batch under one clock pair; latency is per-submit via steady_clock. Sub-µs values sit at steady_clock's ~100 ns resolution floor, and the worst-case max (~5–16 ms) is a Windows scheduler quantum (OS jitter) — which is why we report percentiles, not max. Each row above is an independent, reproducible before/after measured the same way.

Feed → match pipeline: a lock-free SPSC ring (include/matchengine/SpscRing.hpp) decouples a feed (producer) thread from the matching (consumer) thread. With orders pre-generated the ring is pure overhead, so the pipeline (~6.9 M/s) runs slower than the single-thread path — by design. The payoff is realism and decoupling: a real feed thread parses network/file data and overlaps that with matching, while the matching core stays single-threaded and deterministic.

Layout

Path Contents
include/matchengine/ Public API headers
src/ Engine implementation
tests/ GoogleTest unit tests
bench/ Benchmark, feed-pipeline, and backtester demos
tools/ match_cli — runnable replay/interactive demo
examples/ Sample order script for replay
docs/ Architecture diagram, design-decisions notes

Roadmap

  • Phase 0 — toolchain, CMake + GoogleTest, CI skeleton.
  • Phase 1 — core data model + single-level matching, partial fills.
  • Phase 2 — two-sided book, O(1) cancel, market/modify/IOC/FOK, BBO/depth/snapshot.
  • Phase 3 — benchmark harness; arena + intrusive list; trade-buffer reuse; ticked-array book12.99 M orders/s (2.9× baseline), p99 ≈ 400 ns.
  • Phase 4 — lock-free SPSC feed thread; order-protocol parser + file-replay feed handler; tiny strategy backtester. All stretch goals done.

About

Price-time-priority limit-order-book matching engine in C++ (~13M orders/s, p99 ~400ns): O(1) cancel, arena allocator, ticked-array book, lock-free SPSC feed pipeline, order-protocol parser, and a latency-benchmark harness.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors