fix(pymllm): fix RadixCache evictable size accounting bug#656
fix(pymllm): fix RadixCache evictable size accounting bug#656FarmersWrap wants to merge 4 commits intoUbiquitousLearning:mainfrom
Conversation
…k/unlock _add_leaf subtracted parent tokens from _evictable_size when a leaf gained its first child, but _split_node did not make the same adjustment. This inconsistency caused inc_lock_ref / dec_lock_ref (which transfer tokens between evictable and protected counters) to drive _evictable_size negative -- breaking the invariant that evictable + protected == total tree tokens. The fix removes the subtraction in _add_leaf. Internal nodes are still reclaimable via cascade eviction (evict children first, then the childless parent cascades), so their tokens correctly belong in _evictable_size. Also adds 51 pytest tests covering insert, match, split, evict, lock/unlock ref counting, namespace isolation, page-aligned ops, SWA mode, pool integration, pool leak detection, and the size invariant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughRemoved parent-size decrementing logic in RadixCache Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
… comment Expand the _add_leaf comment to explain why the subtraction was removed: the original code was inconsistent with _split_node and broke the evictable+protected invariant used by inc/dec_lock_ref. Add a safety guard in alloc_kv_with_eviction: if evict() frees 0 tokens despite evictable_size > 0, break the loop instead of spinning. This prevents a theoretical infinite retry if size accounting ever drifts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Why the original
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pymllm/mem_cache/radix_cache.py`:
- Around line 642-649: MambaRadixCache's accounting is still subtracting
len(parent.key) when a node gains its first child, causing evictable totals to
diverge from RadixCache; update the MambaRadixCache implementation (the
MambaRadixCache class code that currently performs the len(parent.key)
subtraction when attaching the first child) to stop subtracting the parent's key
length so it follows the same evictable + protected == total tokens invariant as
RadixCache, and add/update a unit test asserting both caches produce the same
evictable totals for the same tree shape.
In `@pymllm/tests/test_radix_cache.py`:
- Around line 28-35: The helper _make_cache currently always constructs a
TokenToKVPoolAllocator and passes it into RadixCache, which causes tests that
insert synthetic tensors (via _val(...)) to call pool.free() on non-allocated
values and corrupt allocator state; change _make_cache to accept
token_to_kv_pool_allocator=None by default (remove internal
TokenToKVPoolAllocator creation) and pass that argument through to RadixCache so
generic cache tests use no pool, and only the dedicated allocator tests
explicitly create and pass a TokenToKVPoolAllocator (using
pool.alloc()/pool.free() there).
- Around line 402-410: Replace the weak assertion in test_swa_tombstone_eviction
with a concrete postcondition: call cache.evict(0, swa_num_tokens=3) (as already
done) and assert that result.swa_evicted >= 1 to ensure at least one 3-token SWA
candidate was reclaimed; optionally also assert that cache._swa_evictable_size
decreased from its pre-eviction value to guard accounting, referencing
test_swa_tombstone_eviction, cache.evict, result.swa_evicted and the internal
_swa_evictable_size for locating the relevant checks.
- Around line 759-770: The test currently asserts that dec_lock_ref drives
lock_ref negative, which is unsafe; update the code to prevent negative counters
or change the test to expect that behavior is disallowed: either (A) harden
RadixCache.dec_lock_ref to guard against underflow by returning early/no-op when
node.lock_ref == 0 (adjust RadixCache.dec_lock_ref implementation so it never
decrements below 0 and preserves evictable/protected size accounting), or (B) if
you cannot change production code now, mark the test
test_dec_without_inc_goes_negative as xfail (or skip) to reflect that negative
lock_ref is not part of the contract; refer to RadixCache.dec_lock_ref,
RadixCache.inc_lock_ref, and the test function name to locate the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bcd0f237-faf4-426e-9376-2f74558c6c2f
📒 Files selected for processing (2)
pymllm/mem_cache/radix_cache.pypymllm/tests/test_radix_cache.py
Add TestConcurrentRequestsSharedPrefix with 9 test methods simulating real serving scenarios: shared system prompts, staggered arrivals, memory pressure eviction, deep conversation trees, and interleaved prefill/decode with tree splits on locked nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (3)
pymllm/tests/test_radix_cache.py (3)
28-35:⚠️ Potential issue | 🟠 Major
_make_cache()should not always attach a live pool for generic logic tests.With synthetic
_val(...)inserts, eviction paths can callpool.free()on indices never allocated by that pool, which can mask correctness issues.Suggested helper adjustment
-def _make_cache(pool_size=256, page_size=1, sliding_window_size=None, on_node_evict=None): - pool = TokenToKVPoolAllocator(size=pool_size, device="cpu", page_size=page_size) +def _make_cache(pool_size=None, page_size=1, sliding_window_size=None, on_node_evict=None): + pool = ( + TokenToKVPoolAllocator(size=pool_size, device="cpu", page_size=page_size) + if pool_size is not None + else None + ) return RadixCache( page_size=page_size, token_to_kv_pool_allocator=pool, sliding_window_size=sliding_window_size, on_node_evict=on_node_evict, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 28 - 35, The helper _make_cache currently always creates a live TokenToKVPoolAllocator and passes it into RadixCache, which can cause tests that synthesize inserts to trigger pool.free on indices the pool never allocated; change _make_cache to allow omitting the real pool (e.g., add a parameter use_pool=True or pool=None) and when use_pool is False pass token_to_kv_pool_allocator=None or a lightweight dummy allocator with no-op free/allocate methods, so eviction paths in RadixCache (and functions that call pool.free) won’t call into a real pool for synthetic tests; update calls in tests that need the real pool to pass use_pool=True or supply an explicit TokenToKVPoolAllocator.
402-410:⚠️ Potential issue | 🟡 Minor
assert result.swa_evicted >= 0is non-diagnostic.This assertion always passes and won’t catch SWA tombstone/accounting regressions. Assert a concrete positive reclaim or post-eviction size delta.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 402 - 410, The test_swa_tombstone_eviction currently uses a vacuous assertion (assert result.swa_evicted >= 0); change it to assert a concrete positive reclaim by either asserting result.swa_evicted > 0 or by measuring cache size before and after eviction (call cache.size() or inspect relevant accounting) and asserting a decrease in SWA-specific bytes/entries; update references in the test to use cache.evict(0, swa_num_tokens=3) result.swa_evicted and/or pre_eviction = cache.get_swa_size() / post_eviction = cache.get_swa_size() and assert pre_eviction > post_eviction so the test fails on regressions.
759-770:⚠️ Potential issue | 🟠 MajorPlease avoid treating negative
lock_refas expected behavior.This test currently encodes lock underflow (
-1) as valid, which normalizes corrupted accounting state. Prefer rejecting unmatched decrements (or mark as xfail untildec_lock_refis hardened).Safer expected behavior once hardened
- def test_dec_without_inc_goes_negative(self): - """Verify behavior when dec_lock_ref is called without matching inc. - This documents whether negative lock_ref causes issues.""" + def test_dec_without_inc_is_rejected(self): + """dec_lock_ref should reject unmatched decrements.""" cache = _make_cache() cache.insert(_key([1, 2, 3]), _val([10, 20, 30])) r = cache.match_prefix(_key([1, 2, 3])) - - # dec without inc — lock_ref goes to -1 - cache.dec_lock_ref(r.last_node) - # lock_ref is now -1, evictable_size and protected_size may be inconsistent - # This is a potential bug: negative lock_ref means the node is "super evictable" - assert r.last_node.lock_ref == -1 + with pytest.raises(ValueError): + cache.dec_lock_ref(r.last_node)As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 759 - 770, The test currently treats a lock_ref underflow (-1) as acceptable; change it to assert that unmatched decrements are rejected: update test_dec_without_inc_goes_negative to call cache.dec_lock_ref(r.last_node) and expect a safe behavior (preferably an exception such as ValueError) or that r.last_node.lock_ref remains >= 0 (no underflow) and that protected_size/evictable_size stay consistent; to support this either harden dec_lock_ref to raise on lock_ref == 0 or clamp/prevent decrement and adjust evictable/protected accounting accordingly — refer to dec_lock_ref and r.last_node.lock_ref in your changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pymllm/tests/test_radix_cache.py`:
- Around line 28-35: The helper _make_cache currently always creates a live
TokenToKVPoolAllocator and passes it into RadixCache, which can cause tests that
synthesize inserts to trigger pool.free on indices the pool never allocated;
change _make_cache to allow omitting the real pool (e.g., add a parameter
use_pool=True or pool=None) and when use_pool is False pass
token_to_kv_pool_allocator=None or a lightweight dummy allocator with no-op
free/allocate methods, so eviction paths in RadixCache (and functions that call
pool.free) won’t call into a real pool for synthetic tests; update calls in
tests that need the real pool to pass use_pool=True or supply an explicit
TokenToKVPoolAllocator.
- Around line 402-410: The test_swa_tombstone_eviction currently uses a vacuous
assertion (assert result.swa_evicted >= 0); change it to assert a concrete
positive reclaim by either asserting result.swa_evicted > 0 or by measuring
cache size before and after eviction (call cache.size() or inspect relevant
accounting) and asserting a decrease in SWA-specific bytes/entries; update
references in the test to use cache.evict(0, swa_num_tokens=3)
result.swa_evicted and/or pre_eviction = cache.get_swa_size() / post_eviction =
cache.get_swa_size() and assert pre_eviction > post_eviction so the test fails
on regressions.
- Around line 759-770: The test currently treats a lock_ref underflow (-1) as
acceptable; change it to assert that unmatched decrements are rejected: update
test_dec_without_inc_goes_negative to call cache.dec_lock_ref(r.last_node) and
expect a safe behavior (preferably an exception such as ValueError) or that
r.last_node.lock_ref remains >= 0 (no underflow) and that
protected_size/evictable_size stay consistent; to support this either harden
dec_lock_ref to raise on lock_ref == 0 or clamp/prevent decrement and adjust
evictable/protected accounting accordingly — refer to dec_lock_ref and
r.last_node.lock_ref in your changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5b12e760-9806-448d-95be-1d14d2dabcee
📒 Files selected for processing (3)
pymllm/mem_cache/radix_cache.pypymllm/orchestrator/model_runner_process.pypymllm/tests/test_radix_cache.py
✅ Files skipped from review due to trivial changes (1)
- pymllm/mem_cache/radix_cache.py
…ing#656 1. Fix same _evictable_size bug in MambaRadixCache._add_leaf (was still subtracting parent tokens when leaf gains first child). 2. Make _make_cache() default to pool=None so generic tests don't corrupt allocator state with synthetic _val() tensors. 3. Strengthen SWA tombstone eviction assertion to assert exact count. 4. Rewrite negative lock_ref test to document the real corruption: unmatched dec makes subsequent inc skip the 0->1 protection transfer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (4)
pymllm/tests/test_radix_cache.py (4)
929-930: Consider making class constant immutable.
SYSTEM_PROMPTis a mutable list used as a class attribute. While it's only read in these tests, converting to a tuple prevents accidental mutation and satisfies static analysis (RUF012).Suggested fix
- SYSTEM_PROMPT = list(range(1000, 1050)) # 50-token system prompt - SYSTEM_PROMPT_LEN = 50 + SYSTEM_PROMPT = tuple(range(1000, 1050)) # 50-token system prompt + SYSTEM_PROMPT_LEN = 50Then update usages that need a list (e.g., line 1124) to convert explicitly:
prefix = list(self.SYSTEM_PROMPT)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 929 - 930, Change the mutable class attribute SYSTEM_PROMPT to an immutable tuple (and keep SYSTEM_PROMPT_LEN) to prevent accidental mutation and satisfy RUF012; then update any test sites that expect a list (e.g., where prefix is built from SYSTEM_PROMPT) to explicitly convert it using list(self.SYSTEM_PROMPT) or list(TestClass.SYSTEM_PROMPT) so behavior remains identical.
1250-1254: Unused loop variable should use_.The loop index
iis not used; convention is to use_for discarded values.Suggested fix
- for i in range(20): + for _ in range(20): r = cache.match_prefix(_key(seq)) assert r.prefix_len == len(seq) bid = cache.inc_lock_ref(r.last_node) locks.append((r.last_node, bid))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 1250 - 1254, The for-loop uses an unused loop variable `i`; change it to `_` to follow convention and signal the value is intentionally ignored—update the loop header `for i in range(20):` to `for _ in range(20):` in the block that calls `cache.match_prefix(_key(seq))`, `cache.inc_lock_ref(r.last_node)`, and appends to `locks`.
1181-1183: Unused loop variables should use_prefix.
nodeandbidare not used in this loop body (onlyseqandifor the error message).Suggested fix
- for i, (seq, node, bid) in active.items(): + for i, (seq, _node, _bid) in active.items(): r = cache.match_prefix(_key(seq)) assert r.prefix_len == len(seq), f"Request {i} diverging at {20+i*10} lost"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 1181 - 1183, In the loop that unpacks active.items() in pymllm/tests/test_radix_cache.py, replace the unused variables node and bid with names that use an underscore prefix (e.g., _node and _bid) so they indicate they are intentionally unused; update the line "for i, (seq, node, bid) in active.items():" to "for i, (seq, _node, _bid) in active.items():" while leaving the body (cache.match_prefix(_key(seq)) and the assert) unchanged.
763-786: Test documents known limitation appropriately.The rewritten test clearly documents the corruption scenario: an unmatched
dec_lock_refdriveslock_refnegative, causing subsequentinc_lock_refto fail the0→1protection transfer. The comments acknowledge this as a known limitation pending hardening.Consider adding a
# TODO:comment or linking to an issue for tracking the eventual fix indec_lock_refto guard against underflow. As per coding guidelines, "TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text."Suggested TODO addition
This documents a known limitation — dec_lock_ref should guard - against underflow. Hardening is a separate task.""" + against underflow. Hardening is a separate task. + # TODO: Harden dec_lock_ref to guard against underflow (issue `#XXX`)."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pymllm/tests/test_radix_cache.py` around lines 763 - 786, Add a TODO comment to the test test_dec_without_inc_corrupts_lock_semantics noting that dec_lock_ref should guard against underflow and link to the issue/PR tracking the hardening; place the TODO near the unmatched dec_lock_ref call (reference dec_lock_ref, inc_lock_ref, lock_ref, evictable_size in the comment) and format it exactly as "TODO: <UTF-8 text>" so it follows the project's TODO/FIXME convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pymllm/tests/test_radix_cache.py`:
- Around line 929-930: Change the mutable class attribute SYSTEM_PROMPT to an
immutable tuple (and keep SYSTEM_PROMPT_LEN) to prevent accidental mutation and
satisfy RUF012; then update any test sites that expect a list (e.g., where
prefix is built from SYSTEM_PROMPT) to explicitly convert it using
list(self.SYSTEM_PROMPT) or list(TestClass.SYSTEM_PROMPT) so behavior remains
identical.
- Around line 1250-1254: The for-loop uses an unused loop variable `i`; change
it to `_` to follow convention and signal the value is intentionally
ignored—update the loop header `for i in range(20):` to `for _ in range(20):` in
the block that calls `cache.match_prefix(_key(seq))`,
`cache.inc_lock_ref(r.last_node)`, and appends to `locks`.
- Around line 1181-1183: In the loop that unpacks active.items() in
pymllm/tests/test_radix_cache.py, replace the unused variables node and bid with
names that use an underscore prefix (e.g., _node and _bid) so they indicate they
are intentionally unused; update the line "for i, (seq, node, bid) in
active.items():" to "for i, (seq, _node, _bid) in active.items():" while leaving
the body (cache.match_prefix(_key(seq)) and the assert) unchanged.
- Around line 763-786: Add a TODO comment to the test
test_dec_without_inc_corrupts_lock_semantics noting that dec_lock_ref should
guard against underflow and link to the issue/PR tracking the hardening; place
the TODO near the unmatched dec_lock_ref call (reference dec_lock_ref,
inc_lock_ref, lock_ref, evictable_size in the comment) and format it exactly as
"TODO: <UTF-8 text>" so it follows the project's TODO/FIXME convention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1c86e78d-0a28-4e78-813b-7fe69e64448c
📒 Files selected for processing (2)
pymllm/mem_cache/mamba_radix_cache.pypymllm/tests/test_radix_cache.py
Summary
_evictable_sizegoing negative during lock/unlock operations inRadixCache._add_leafsubtracted parent tokens when a leaf gained its first child, but_split_nodedid not make the same adjustment. This broke the invariantevictable + protected == total tree tokens, causing_evictable_sizeto go negative (-3, -5 observed) and potentially leading to incorrect scheduler capacity decisions on memory-constrained devices (e.g. Jetson).RadixCachecovering insert, match, split, evict, lock/unlock ref counting, namespace isolation, page-aligned ops, SWA mode, pool integration, pool leak detection, and the size invariant. All tests run on CPU in <1s -- no GPU required.Test plan
pytest pymllm/tests/test_radix_cache.py-- 60 tests, all passingSummary by CodeRabbit
Bug Fixes
Tests