Skip to content

armanas/BCR-memory-2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BCR-memory-2

CI License: MIT OR Apache-2.0

A correctness-first Rust reimplementation of the BCR-memory v1 prefix cache, with regression tests for every bug we found in the original.

One-sentence summary. The flat-array DFA idea is sound; the v1 implementation had four concrete bugs that made it unsafe under sustained churn. This repo fixes them, proves each fix with a dedicated test, and documents what we learned comparing the result to production prefix caches. See TECHNICAL_NOTE.md for the long version.


What's fixed vs. v1

Bug in v1 Fix in v2 Proof
evict only tombstones the value; state vector grew unboundedly Proper state + edge reclamation with cascade-evict via parent edge splicing scripts/compare_v1_v2.py: v1 grows to 10 001 state slots over 10 k cycles; v2 stays at 2
64-bit block symbol with no collision check (upper 64 bits of Murmur3 were discarded) Full Symbol128 { lo, hi }; both halves participate in equality test_128bit_hash_distinguishes_across_many_blocks — 100 k distinct blocks, 0 collisions
Docs claimed O(1) per block, code did O(E) linear scan Honest complexity docs + per-state HashMap index above EDGE_SCAN_THRESHOLD = 8 hash_index_kicks_in_above_threshold test
"Lock-free" claimed; no atomics, no single-writer enforcement Rust writer_guard + Python threading.Lock wrapper test_concurrent_writers_under_lock_do_not_corrupt (4 threads × 200 cycles)

Quick results

Experiment Result
14 Rust unit tests + 17 Python regression tests 31 / 31 pass
v1 vs v2 churn (10 000 insert/evict cycles) v1: 10 001 state slots (leak); v2: 2 (bounded)
Real-LLM correctness parity (Qwen2.5-7B tokenizer, 400 multi-turn prompts) 0 divergences from reference across 537 ops
30-min soak, 8 M cache ops RSS 785.8 → 785.9 MB, state_liveness 1.0 throughout
vs. Python-dict reference, cache microbench BCR match_prefix p50 3.2 µs (Python dict: 4.9 µs)

Full numbers in TECHNICAL_NOTE.md §2–§3.

Build & test

Requires Rust 1.75+, Python 3.9+, and maturin.

cd bcr_cache

# Rust core tests (no Python interpreter linking)
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo test --lib --no-default-features

# Build and install the Python wheel
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 maturin build --release
pip install --force-reinstall target/wheels/bcr_cache-*.whl

# Regression tests
pytest tests/

# The headline empirical demonstration
python scripts/compare_v1_v2.py --cycles 10000

The PYO3_USE_ABI3_FORWARD_COMPATIBILITY flag is only needed on Python ≥ 3.13. We pinned pyo3 0.20 for ABI parity with v1.

Repo layout

bcr_cache/
├── Cargo.toml
├── pyproject.toml
├── src/
│   ├── core.rs           # pure-Rust BCRGraph (no PyO3; unit-testable standalone)
│   └── lib.rs            # thin PyO3 facade over core
├── python/bcr_cache/     # threading.Lock wrapper + SGLangBCRCache adapter
├── tests/                # Python regression suite (incl. v1-bug reproductions)
└── scripts/
    └── compare_v1_v2.py  # side-by-side churn comparison

experiments/
├── prompts.py            # synthetic chat corpus with prefix reuse
├── reference_cache.py    # pure-Python reference cache for differential testing
├── exp1_correctness_parity.py
├── exp2_soak.py
├── exp3_throughput.py
├── plot_soak.py
├── serving_bench.sh      # GCP/L4 real-LLM harness for SGLang + vLLM
├── results/              # all JSON + CSV from the runs described in TECHNICAL_NOTE.md
└── README.md             # reproduction recipe

TECHNICAL_NOTE.md         # honest write-up with the state-of-the-art comparison
CHANGELOG.md
LICENSE                   # dual MIT / Apache-2.0

Python usage

from bcr_cache import BCRCache, SGLangBCRCache

# Direct use
c = BCRCache(block_size=16)
c.insert(token_ids, block_values)     # tokens already tokenized
matched_len, matched_vals = c.match_prefix(token_ids)
c.evict(some_block_value)
c.assert_invariants()                  # runtime structural check

# SGLang BasePrefixCache adapter
cache = SGLangBCRCache(block_size=16)
cache.insert(token_ids, block_values)
cache.acquire([v1, v2]); cache.release([v1, v2])

What this repo is not

  • Not a replacement for SGLang's RadixCache in production. On the state-of-the-art comparison we ran (Qwen2.5-7B, L4, SGLang 0.5.10), RadixCache delivers a 21× TTFT win on cache-friendly traffic and does not leak under sustained load. BCR-memory-2 would need an actual SGLang scheduler integration and multi-model validation before it could claim to compete. See TECHNICAL_NOTE.md §3.
  • Not a paper. See §4 of the technical note for why we did not pursue that direction.

License

Dual MIT / Apache-2.0 — see LICENSE.

Citation

@software{bcr_memory_2_2026,
  author = {Povilionis, Armanas},
  title  = {BCR-memory-2: a correctness-fixed block-compressed prefix cache},
  year   = {2026}
}

About

Correctness-fixed Rust/PyO3 flat-array DFA prefix cache — rewrite of BCR-memory v1 with regression tests for four bugs and an SGLang/vLLM head-to-head harness.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors