A correctness-first Rust reimplementation of the BCR-memory v1 prefix cache, with regression tests for every bug we found in the original.
One-sentence summary: the flat-array DFA idea is sound, but the v1 implementation had four concrete bugs that made it unsafe under sustained churn. This repo fixes them, proves each fix with a dedicated test, and documents what we learned comparing the result to production prefix caches. See TECHNICAL_NOTE.md for the long version.
| Bug in v1 | Fix in v2 | Proof |
|---|---|---|
| `evict` only tombstoned the value; the state vector grew unboundedly | Proper state + edge reclamation with cascade-evict via parent-edge splicing | `scripts/compare_v1_v2.py`: v1 grows to 10 001 state slots over 10 k cycles; v2 stays at 2 |
| 64-bit block symbol with no collision check (upper 64 bits of Murmur3 were discarded) | Full `Symbol128 { lo, hi }`; both halves participate in equality | `test_128bit_hash_distinguishes_across_many_blocks` — 100 k distinct blocks, 0 collisions |
| Docs claimed O(1) per block; the code did an O(E) linear scan | Honest complexity docs + per-state HashMap index above `EDGE_SCAN_THRESHOLD = 8` | `hash_index_kicks_in_above_threshold` test |
| "Lock-free" claimed; no atomics, no single-writer enforcement | Rust `writer_guard` + Python `threading.Lock` wrapper | `test_concurrent_writers_under_lock_do_not_corrupt` (4 threads × 200 cycles) |
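The `Symbol128` fix above boils down to: if you hash blocks to 128 bits but compare only the low 64, two distinct blocks that agree on those low bits become indistinguishable. A minimal Python sketch of the idea (illustrative names only — the repo's actual type is the Rust `Symbol128`, and it uses Murmur3 rather than the stdlib `blake2b` used here):

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol128:
    """128-bit block symbol; equality uses BOTH halves (Python mirror of the
    Rust `Symbol128 { lo, hi }` sketch — hypothetical, not the repo's code)."""
    lo: int  # low 64 bits
    hi: int  # high 64 bits

def symbol_for_block(token_block: tuple) -> Symbol128:
    # blake2b with digest_size=16 stands in for the 128-bit Murmur3 output.
    digest = hashlib.blake2b(repr(token_block).encode(), digest_size=16).digest()
    return Symbol128(
        lo=int.from_bytes(digest[:8], "little"),
        hi=int.from_bytes(digest[8:], "little"),
    )

# Two blocks that happened to collide on the low 64 bits (the v1 comparison)
# would still be distinguished by `hi`, because the dataclass-generated
# equality compares both fields.
a = symbol_for_block((1, 2, 3))
b = symbol_for_block((1, 2, 4))
assert a != b
assert a == symbol_for_block((1, 2, 3))
```

The frozen dataclass gives structural equality and hashability for free, which is what a per-state edge index needs.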
| Experiment | Result |
|---|---|
| 14 Rust unit tests + 17 Python regression tests | 31 / 31 pass |
| v1 vs v2 churn (10 000 insert/evict cycles) | v1: 10 001 state slots (leak); v2: 2 (bounded) |
| Real-LLM correctness parity (Qwen2.5-7B tokenizer, 400 multi-turn prompts) | 0 divergences from reference across 537 ops |
| 30-min soak, 8 M cache ops | RSS 785.8 → 785.9 MB, state_liveness 1.0 throughout |
| vs. Python-dict reference, cache microbench | BCR match_prefix p50 3.2 µs (Python dict: 4.9 µs) |
Full numbers in TECHNICAL_NOTE.md §2–§3.
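The correctness-parity row above comes from differential testing: drive the cache under test and a trivially correct pure-Python reference with the same operation stream and assert identical answers. A minimal sketch of that pattern (toy classes with illustrative names — the repo's real reference lives in `experiments/reference_cache.py`, and the second side would be `BCRCache` rather than a second reference):

```python
import random

class ReferenceCache:
    """Trivially correct prefix cache: a dict from token-prefix tuples to
    values. Toy stand-in; not the repo's reference implementation."""
    def __init__(self, block_size):
        self.block_size = block_size
        self.blocks = {}  # tuple(prefix tokens) -> block value

    def insert(self, tokens, values):
        for i, v in enumerate(values):
            self.blocks[tuple(tokens[:(i + 1) * self.block_size])] = v

    def match_prefix(self, tokens):
        vals = []
        for i in range(len(tokens) // self.block_size):
            v = self.blocks.get(tuple(tokens[:(i + 1) * self.block_size]))
            if v is None:
                break
            vals.append(v)
        return len(vals) * self.block_size, vals

def differential_test(cache_a, cache_b, ops=500, seed=0):
    """Same random op stream into both caches; any divergence fails loudly."""
    rng = random.Random(seed)
    next_val = 0
    for _ in range(ops):
        tokens = [rng.randrange(50) for _ in range(16 * rng.randrange(1, 4))]
        if rng.random() < 0.5:
            values = list(range(next_val, next_val + len(tokens) // 16))
            next_val += len(values)
            cache_a.insert(tokens, values)
            cache_b.insert(tokens, values)
        else:
            assert cache_a.match_prefix(tokens) == cache_b.match_prefix(tokens)

differential_test(ReferenceCache(16), ReferenceCache(16))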
Requires Rust 1.75+, Python 3.9+, and maturin.
```bash
cd bcr_cache

# Rust core tests (no Python interpreter linking)
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo test --lib --no-default-features

# Build and install the Python wheel
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 maturin build --release
pip install --force-reinstall target/wheels/bcr_cache-*.whl

# Regression tests
pytest tests/

# The headline empirical demonstration
python scripts/compare_v1_v2.py --cycles 10000
```

The `PYO3_USE_ABI3_FORWARD_COMPATIBILITY` flag is only needed on Python ≥ 3.13. We pinned pyo3 0.20 for ABI parity with v1.
```
bcr_cache/
├── Cargo.toml
├── pyproject.toml
├── src/
│   ├── core.rs          # pure-Rust BCRGraph (no PyO3; unit-testable standalone)
│   └── lib.rs           # thin PyO3 facade over core
├── python/bcr_cache/    # threading.Lock wrapper + SGLangBCRCache adapter
├── tests/               # Python regression suite (incl. v1-bug reproductions)
└── scripts/
    └── compare_v1_v2.py # side-by-side churn comparison
experiments/
├── prompts.py           # synthetic chat corpus with prefix reuse
├── reference_cache.py   # pure-Python reference cache for differential testing
├── exp1_correctness_parity.py
├── exp2_soak.py
├── exp3_throughput.py
├── plot_soak.py
├── serving_bench.sh     # GCP/L4 real-LLM harness for SGLang + vLLM
├── results/             # all JSON + CSV from the runs described in TECHNICAL_NOTE.md
└── README.md            # reproduction recipe
TECHNICAL_NOTE.md        # honest write-up with the state-of-the-art comparison
CHANGELOG.md
LICENSE                  # dual MIT / Apache-2.0
```
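The `threading.Lock` wrapper in `python/bcr_cache/` enforces the single-writer discipline the core needs. The pattern can be sketched as follows (hypothetical class and inner object — the repo's actual wrapper class name and method set may differ):

```python
import threading

class LockedCache:
    """Serialize every call into the non-thread-safe core through one lock.
    Sketch of the python/bcr_cache pattern; names are illustrative."""
    def __init__(self, inner):
        self._inner = inner          # e.g. the PyO3-backed core object
        self._lock = threading.Lock()

    def insert(self, tokens, values):
        with self._lock:
            return self._inner.insert(tokens, values)

    def match_prefix(self, tokens):
        with self._lock:
            return self._inner.match_prefix(tokens)

    def evict(self, value):
        with self._lock:
            return self._inner.evict(value)
```

This is deliberately a lock, not lock-freedom: v1 claimed the latter without atomics, and the honest fix is to serialize writers and say so.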
```python
from bcr_cache import BCRCache, SGLangBCRCache

# Direct use
c = BCRCache(block_size=16)
c.insert(token_ids, block_values)   # tokens already tokenized
matched_len, matched_vals = c.match_prefix(token_ids)
c.evict(some_block_value)
c.assert_invariants()               # runtime structural check

# SGLang BasePrefixCache adapter
cache = SGLangBCRCache(block_size=16)
cache.insert(token_ids, block_values)
cache.acquire([v1, v2]); cache.release([v1, v2])
```

- Not a replacement for SGLang's RadixCache in production. On the state-of-the-art comparison we ran (Qwen2.5-7B, L4, SGLang 0.5.10), RadixCache delivers a 21× TTFT win on cache-friendly traffic and does not leak under sustained load. BCR-memory-2 would need an actual SGLang scheduler integration and multi-model validation before it could claim to compete. See TECHNICAL_NOTE.md §3.
- Not a paper. See §4 of the technical note for why we did not pursue that direction.
Dual MIT / Apache-2.0 — see LICENSE.
```bibtex
@software{bcr_memory_2_2026,
  author = {Povilionis, Armanas},
  title  = {BCR-memory-2: a correctness-fixed block-compressed prefix cache},
  year   = {2026}
}
```