A price-time-priority limit-order-book matching engine in modern C++ (C++20).
✅ Status: Phases 0–4 complete — a fully tested engine (71 tests) at ~13 M orders/s, p99 ≈ 400 ns, plus a lock-free feed pipeline, an order-protocol parser, and a tiny backtester. See the roadmap below.
Ingests a stream of orders (LIMIT / MARKET / CANCEL / MODIFY), maintains a two-sided order book with price-time priority — best price first, and FIFO (earliest order first) within a price level — matches crossing orders into trades, and supports partial fills. The matching core is single-threaded and deterministic.
flowchart LR
subgraph feed["Feed thread (producer)"]
SRC["order file / generator"] --> PARSE["protocol parser"]
end
PARSE -->|orders| RING["lock-free SPSC ring"]
subgraph match["Matching thread (single-threaded, deterministic)"]
ENGINE["OrderBook"]
IDX["order_id index<br/>O(1) cancel"]
BIDS["bids: ticked array<br/>+ best-bid index"]
ASKS["asks: ticked array<br/>+ best-ask index"]
LVL["PriceLevel<br/>intrusive FIFO list"]
POOL["OrderPool<br/>object pool"]
ENGINE --- IDX
ENGINE --> BIDS
ENGINE --> ASKS
BIDS --> LVL
ASKS --> LVL
LVL --> POOL
end
RING -->|orders| ENGINE
ENGINE -->|trades| TRADES["trades"]
ENGINE -->|BBO / depth| QUERIES["queries"]
STRAT["backtester + strategy"] -. observes .-> ENGINE
The matching core is single-threaded and deterministic — the feed thread only
produces; the SPSC ring hands orders to the one matching thread. See
docs/architecture.md for a component legend and
docs/design-decisions.md for the reasoning.
- ✅ Price-time priority — best price first, FIFO (time order) within a level.
- ✅ O(1) cancel via an
order_id → nodeindex into an intrusive FIFO list at each price level (nodes drawn from an object pool). - ✅ LIMIT / MARKET / IOC / FOK / MODIFY order operations; BBO, depth, snapshot.
- ✅ Integer-tick prices (never floating point); single-threaded & deterministic.
- ✅ Benchmark harness — throughput + latency percentiles (p50 / p99 / p999).
- ✅ Arena / object-pool allocator — zero per-order heap allocation on the hot path (an intrusive list of pooled nodes).
- ✅ Lock-free SPSC ring decoupling a feed thread from the single-threaded matching thread.
- ✅ Order-protocol parser + replay feed handler, plus a tiny strategy backtester (P&L over a replayed book).
Toolchain: GCC (MinGW-w64, UCRT) · CMake · Ninja. GoogleTest is fetched automatically at configure time (no manual install).
# Windows / PowerShell. The first line puts the toolchain on PATH for an
# already-open shell (new shells get it automatically after a restart):
. .\scripts\dev-env.ps1
cmake -S . -B build -G Ninja
cmake --build build
ctest --test-dir build --output-on-failure
# Run the benchmark (optimized Release build) for throughput + latency:
.\build\bench\bench_engine.exeDownload prebuilt Linux + Windows binaries from Releases — no toolchain needed — or build as above. Then:
# Replay the sample order script: prints every trade and the final book
.\build\tools\match_cli.exe examples\sample_orders.txt
# Or drive the book interactively (HELP for syntax, BOOK to print depth)
.\build\tools\match_cli.exeReplaying examples/sample_orders.txt:
TRADE 5 @ 101 (buy #5 / sell #2) <- sweeps best ask first
TRADE 5 @ 102 (buy #5 / sell #1) <- then the next level
TRADE 3 @ 99 (buy #3 / sell #6) <- market sell hits best bid
cancelled #4
rests 5 @ 100 (order #3, back of the queue) <- modify forfeits priority
applied 8 commands, 3 trades
final ------ book (top 5) ------
BID | ASK
5 @ 100 | 102 @ 3
BBO: 100 / 102 (spread 2)
Single thread, 1,000,000 limit orders within a 201-tick band (GCC 16.1 -O3,
MinGW-w64 UCRT, Windows; reproducible via a fixed PRNG seed).
| Build | Throughput | p50 | p99 | p999 |
|---|---|---|---|---|
Baseline (std::map + std::list) |
4.53 M orders/s | 200 ns | 800 ns | 2.0 µs |
| + arena pool & intrusive list | 5.57 M orders/s | 100 ns | 700 ns | 1.9 µs |
| + trade-buffer reuse (out-param) | 9.53 M orders/s | 100 ns | 400 ns | 0.9 µs |
| + ticked-array book (O(1) levels) | 12.99 M orders/s | 100 ns | 400 ns | 1.0 µs |
Methodology, kept honest: the order stream is generated up front (excluded from
timing); throughput is the whole batch under one clock pair; latency is
per-submit via steady_clock. Sub-µs values sit at steady_clock's ~100 ns
resolution floor, and the worst-case max (~5–16 ms) is a Windows scheduler
quantum (OS jitter) — which is why we report percentiles, not max. Each row
above is an independent, reproducible before/after measured the same way.
Feed → match pipeline: a lock-free SPSC ring (include/matchengine/SpscRing.hpp)
decouples a feed (producer) thread from the matching (consumer) thread. With orders
pre-generated the ring is pure overhead, so the pipeline (~6.9 M/s) runs slower
than the single-thread path — by design. The payoff is realism and decoupling: a
real feed thread parses network/file data and overlaps that with matching, while
the matching core stays single-threaded and deterministic.
| Path | Contents |
|---|---|
include/matchengine/ |
Public API headers |
src/ |
Engine implementation |
tests/ |
GoogleTest unit tests |
bench/ |
Benchmark, feed-pipeline, and backtester demos |
tools/ |
match_cli — runnable replay/interactive demo |
examples/ |
Sample order script for replay |
docs/ |
Architecture diagram, design-decisions notes |
- Phase 0 — toolchain, CMake + GoogleTest, CI skeleton.
- Phase 1 — core data model + single-level matching, partial fills.
- Phase 2 — two-sided book, O(1) cancel, market/modify/IOC/FOK, BBO/depth/snapshot.
- Phase 3 — benchmark harness; arena + intrusive list; trade-buffer reuse; ticked-array book → 12.99 M orders/s (2.9× baseline), p99 ≈ 400 ns.
- Phase 4 — lock-free SPSC feed thread; order-protocol parser + file-replay feed handler; tiny strategy backtester. All stretch goals done.