fix(pymllm): fix RadixCache evictable size accounting bug#656

Open
FarmersWrap wants to merge 4 commits into UbiquitousLearning:main from FarmersWrap:fix/radix-cache-correctness

@FarmersWrap FarmersWrap commented Mar 20, 2026

Summary

  • Fix _evictable_size going negative during lock/unlock operations in RadixCache. _add_leaf subtracted parent tokens when a leaf gained its first child, but _split_node did not make the same adjustment. This broke the invariant evictable + protected == total tree tokens, causing _evictable_size to go negative (-3, -5 observed) and potentially leading to incorrect scheduler capacity decisions on memory-constrained devices (e.g. Jetson).
  • Add 51 pytest tests for RadixCache covering insert, match, split, evict, lock/unlock ref counting, namespace isolation, page-aligned ops, SWA mode, pool integration, pool leak detection, and the size invariant. All tests run on CPU in <1s -- no GPU required.

Test plan

  • pytest pymllm/tests/test_radix_cache.py -- 60 tests, all passing
  • Verify no regression in serving throughput (run pymllm server with radix cache enabled)

Summary by CodeRabbit

  • Bug Fixes

    • Fixed cache memory accounting to avoid incorrect evictable/protected size adjustments during node updates.
    • Added a guard in allocation-with-eviction to stop futile retry loops when eviction frees no resources and return gracefully.
  • Tests

    • Added a comprehensive CPU-only test suite validating radix-cache behavior: inserts, prefix matching, splits, locking, eviction, size invariants, allocator interactions, and SWA scenarios.

…k/unlock

_add_leaf subtracted parent tokens from _evictable_size when a leaf
gained its first child, but _split_node did not make the same
adjustment. This inconsistency caused inc_lock_ref / dec_lock_ref
(which transfer tokens between evictable and protected counters) to
drive _evictable_size negative -- breaking the invariant that
evictable + protected == total tree tokens.

The fix removes the subtraction in _add_leaf. Internal nodes are still
reclaimable via cascade eviction (evict children first, then the
childless parent cascades), so their tokens correctly belong in
_evictable_size.

Also adds 51 pytest tests covering insert, match, split, evict,
lock/unlock ref counting, namespace isolation, page-aligned ops, SWA
mode, pool integration, pool leak detection, and the size invariant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

Walkthrough

Removed parent-size decrementing logic in RadixCache _add_leaf, added a large CPU-only RadixCache test suite, and introduced a guard in the model runner's KV allocation retry loop to abort when eviction reports no progress despite available evictable tokens.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Radix cache logic (pymllm/mem_cache/radix_cache.py, pymllm/mem_cache/mamba_radix_cache.py) | Removed conditional that decremented evictable/full-evictable counts when converting an internal node from leaf-like to internal in _add_leaf. Parent token accounting is intentionally not adjusted at this point; only the new leaf increments are applied. |
| New tests, CPU-only (pymllm/tests/test_radix_cache.py) | Added comprehensive end-to-end tests for RadixCache covering insert/match/split semantics, size accounting invariants, lock/ref-count behavior, eviction flows (including SWA), pool allocator interactions, page-alignment, reset semantics, and multiple edge cases and lifecycle scenarios. |
| Allocator eviction guard (pymllm/orchestrator/model_runner_process.py) | In _alloc_kv_with_eviction, after calling cache.evict(evict_target), if full_evicted == 0 while evictable_size > 0, the code now logs a warning and returns None instead of retrying further. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I burrowed through leaves and keys so fine,
Split branches neatly, counted every line.
Tests hopped in sequence, eviction kept pace,
Allocation warned — no more running the race.
A rabbit nods: the cache finds its place.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.07%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly identifies the main issue being fixed: a RadixCache evictable size accounting bug, which is the primary focus across all file changes. |
| Description check | ✅ Passed | The PR description includes a comprehensive summary of the bug fix and test additions with clear technical explanations and a test plan, though it references an external guidelines document without full detail. |

@FarmersWrap FarmersWrap marked this pull request as draft March 20, 2026 23:48
… comment

Expand the _add_leaf comment to explain why the subtraction was removed:
the original code was inconsistent with _split_node and broke the
evictable+protected invariant used by inc/dec_lock_ref.

Add a safety guard in alloc_kv_with_eviction: if evict() frees 0 tokens
despite evictable_size > 0, break the loop instead of spinning.  This
prevents a theoretical infinite retry if size accounting ever drifts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FarmersWrap
Author

Why the original _add_leaf subtraction was added and why it's incorrect

Context: the original fix (commit 31b0ff9)

The author observed that when a leaf node gains its first child, evictable_size() counts the parent's tokens even though _collect_evictable_leaves() won't find it directly (it has children now). They added a subtraction in _add_leaf to make evictable_size() reflect "directly evictable leaf tokens only":

if (
    len(parent.children) == 0
    and parent != self.root_node
    and parent.lock_ref == 0
    and not parent.evicted
):
    self._evictable_size -= len(parent.key)

Why the subtraction is wrong (3 reasons)

  1. Inconsistency with _split_node: _split_node also creates internal nodes (by splitting a leaf into a parent + child), but does NOT subtract the new parent's tokens from _evictable_size. Same semantic concept (leaf → internal), different handling. This means _evictable_size has two different semantics depending on how the tree was built.

  2. Breaks inc_lock_ref / dec_lock_ref: These methods assume ALL non-root tokens are partitioned into evictable + protected. When locking walks up the tree and encounters a node whose tokens were already removed by _add_leaf, it subtracts from _evictable_size again, driving it negative (-3, -5 observed in tests). This violates the invariant evictable + protected == total tree tokens.

  3. Internal nodes ARE reclaimable: evict() uses cascade eviction. After evicting all children of a node, the now-childless parent is pushed onto the eviction heap and evicted too. So evictable_size() should include internal node tokens, because they are genuinely reclaimable.
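
The interaction between reasons 1 and 2 can be reproduced in a toy model. This is only a sketch, not the pymllm implementation: `Node`, `ToyCache`, and the `subtract_on_first_child` flag are hypothetical names, and only the counter arithmetic described above is modeled (no splitting, eviction, or SWA).

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """Hypothetical stand-in for a radix tree node."""

    def __init__(self, key: List[int], parent: Optional["Node"] = None):
        self.key = key
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.lock_ref = 0


class ToyCache:
    """Models only the evictable/protected counters, nothing else."""

    def __init__(self, subtract_on_first_child: bool):
        self.root = Node(key=[])
        self.evictable = 0
        self.protected = 0
        # True reproduces the old _add_leaf behavior described above.
        self.subtract_on_first_child = subtract_on_first_child

    def add_leaf(self, parent: Node, key: List[int]) -> Node:
        if (
            self.subtract_on_first_child
            and not parent.children
            and parent is not self.root
            and parent.lock_ref == 0
        ):
            self.evictable -= len(parent.key)
        leaf = Node(key, parent)
        parent.children[key[0]] = leaf
        self.evictable += len(key)
        return leaf

    def inc_lock_ref(self, node: Node) -> None:
        # Walk up to the root, moving tokens from evictable to protected
        # the first time each node is locked.
        while node is not self.root:
            if node.lock_ref == 0:
                self.evictable -= len(node.key)
                self.protected += len(node.key)
            node.lock_ref += 1
            node = node.parent

    def total_tokens(self) -> int:
        total, stack = 0, list(self.root.children.values())
        while stack:
            n = stack.pop()
            total += len(n.key)
            stack.extend(n.children.values())
        return total


def run(subtract: bool) -> Tuple[int, int, int]:
    cache = ToyCache(subtract)
    a = cache.add_leaf(cache.root, [1, 2, 3])  # 3-token leaf
    b = cache.add_leaf(a, [4, 5])              # a gains its first child
    cache.inc_lock_ref(b)                      # locks b, then a
    return cache.evictable, cache.protected, cache.total_tokens()


print(run(True))   # (-3, 5, 5): a's tokens were subtracted twice
print(run(False))  # (0, 5, 5): evictable + protected == total
```

With the subtraction enabled, the parent's 3 tokens leave the evictable counter once in `add_leaf` and again in `inc_lock_ref`, which is the double subtraction described in reason 2.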

Proof that removing the subtraction is safe

The concern was: could evictable_size() > 0 while evict() can't actually free anything, causing the eviction loop in alloc_kv_with_eviction to spin?

No, because:

  • If an internal node's children are evictable (unlocked) → cascade eviction works: evict children first, parent becomes childless, cascade evicts parent.
  • If an internal node's children are locked → inc_lock_ref already walked up from the locked leaf and subtracted the parent's tokens from _evictable_size. So evictable_size() won't count them.
  • There is no scenario where evictable_size() > 0 but nothing is reclaimable.
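
The cascade argument can be sketched as follows. This is an illustrative model only: the `Node` class, the `last_access` ordering, and the `evict` signature are assumptions made for the example and do not mirror the actual RadixCache eviction code.

```python
import heapq
import itertools
from typing import Dict, List, Optional


class Node:
    """Hypothetical tree node; registers itself with its parent."""

    def __init__(self, key: List[int], parent: Optional["Node"] = None,
                 last_access: int = 0):
        self.key = key
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.lock_ref = 0
        self.last_access = last_access
        if parent is not None:
            parent.children[key[0]] = self


def evict(root: Node, num_tokens: int) -> int:
    """Evict unlocked leaves oldest-first, cascading to childless parents."""
    tie = itertools.count()  # tie-breaker so Node objects are never compared
    heap = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node is not root and not node.children and node.lock_ref == 0:
            heapq.heappush(heap, (node.last_access, next(tie), node))
        stack.extend(node.children.values())

    freed = 0
    while heap and freed < num_tokens:
        _, _, node = heapq.heappop(heap)
        parent = node.parent
        del parent.children[node.key[0]]
        freed += len(node.key)
        # Cascade: a parent that just lost its last child becomes a leaf
        # and is itself a candidate, unless it is the root or locked.
        if parent is not root and not parent.children and parent.lock_ref == 0:
            heapq.heappush(heap, (parent.last_access, next(tie), parent))
    return freed


root = Node([])
a = Node([1, 2, 3], root, last_access=1)
Node([4, 5], a, last_access=2)
Node([6], a, last_access=3)
print(evict(root, 100))  # 6: both leaves (2 + 1) and then the parent (3)
```

If one child and its ancestors are locked instead, only the unlocked sibling is freed and the parent stays in the tree, matching the second bullet above.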

Safety guard added

As an extra precaution, a guard was added in alloc_kv_with_eviction (model_runner_process.py): if evict() frees 0 tokens despite evictable_size() > 0, the loop breaks with a warning instead of spinning. This covers any theoretical future scenario where the counters could drift.
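
A minimal sketch of that kind of guard is shown below. The function name, the callable parameters, and `max_retries` are hypothetical and chosen for illustration; the real loop in model_runner_process.py may be structured differently.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("toy_alloc")


def alloc_with_eviction(
    try_alloc: Callable[[int], Optional[List[int]]],
    evict: Callable[[int], int],
    evictable_size: Callable[[], int],
    need: int,
    max_retries: int = 8,  # assumption: the real loop may bound retries differently
) -> Optional[List[int]]:
    """Try to allocate `need` slots, evicting between attempts."""
    for _ in range(max_retries):
        out = try_alloc(need)
        if out is not None:
            return out
        if evictable_size() <= 0:
            return None  # nothing left to reclaim
        freed = evict(need)
        if freed == 0:
            # Counters claim evictable tokens exist, but eviction made no
            # progress: stop instead of spinning.
            logger.warning(
                "evict() freed 0 tokens with evictable_size=%d; giving up",
                evictable_size(),
            )
            return None
    return None
```

Returning `None` keeps the failure visible to the caller, which can then reject or defer the request rather than block inside the retry loop.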

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pymllm/mem_cache/radix_cache.py`:
- Around line 642-649: MambaRadixCache's accounting is still subtracting
len(parent.key) when a node gains its first child, causing evictable totals to
diverge from RadixCache; update the MambaRadixCache implementation (the
MambaRadixCache class code that currently performs the len(parent.key)
subtraction when attaching the first child) to stop subtracting the parent's key
length so it follows the same evictable + protected == total tokens invariant as
RadixCache, and add/update a unit test asserting both caches produce the same
evictable totals for the same tree shape.

In `@pymllm/tests/test_radix_cache.py`:
- Around line 28-35: The helper _make_cache currently always constructs a
TokenToKVPoolAllocator and passes it into RadixCache, which causes tests that
insert synthetic tensors (via _val(...)) to call pool.free() on non-allocated
values and corrupt allocator state; change _make_cache to accept
token_to_kv_pool_allocator=None by default (remove internal
TokenToKVPoolAllocator creation) and pass that argument through to RadixCache so
generic cache tests use no pool, and only the dedicated allocator tests
explicitly create and pass a TokenToKVPoolAllocator (using
pool.alloc()/pool.free() there).
- Around line 402-410: Replace the weak assertion in test_swa_tombstone_eviction
with a concrete postcondition: call cache.evict(0, swa_num_tokens=3) (as already
done) and assert that result.swa_evicted >= 1 to ensure at least one 3-token SWA
candidate was reclaimed; optionally also assert that cache._swa_evictable_size
decreased from its pre-eviction value to guard accounting, referencing
test_swa_tombstone_eviction, cache.evict, result.swa_evicted and the internal
_swa_evictable_size for locating the relevant checks.
- Around line 759-770: The test currently asserts that dec_lock_ref drives
lock_ref negative, which is unsafe; update the code to prevent negative counters
or change the test to expect that behavior is disallowed: either (A) harden
RadixCache.dec_lock_ref to guard against underflow by returning early/no-op when
node.lock_ref == 0 (adjust RadixCache.dec_lock_ref implementation so it never
decrements below 0 and preserves evictable/protected size accounting), or (B) if
you cannot change production code now, mark the test
test_dec_without_inc_goes_negative as xfail (or skip) to reflect that negative
lock_ref is not part of the contract; refer to RadixCache.dec_lock_ref,
RadixCache.inc_lock_ref, and the test function name to locate the changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bcd0f237-faf4-426e-9376-2f74558c6c2f

📥 Commits

Reviewing files that changed from the base of the PR and between 05c75f8 and 1c2696f.

📒 Files selected for processing (2)
  • pymllm/mem_cache/radix_cache.py
  • pymllm/tests/test_radix_cache.py

Add TestConcurrentRequestsSharedPrefix with 9 test methods simulating
real serving scenarios: shared system prompts, staggered arrivals,
memory pressure eviction, deep conversation trees, and interleaved
prefill/decode with tree splits on locked nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FarmersWrap FarmersWrap marked this pull request as ready for review March 21, 2026 00:07
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (3)
pymllm/tests/test_radix_cache.py (3)

28-35: ⚠️ Potential issue | 🟠 Major

_make_cache() should not always attach a live pool for generic logic tests.

With synthetic _val(...) inserts, eviction paths can call pool.free() on indices never allocated by that pool, which can mask correctness issues.

Suggested helper adjustment
-def _make_cache(pool_size=256, page_size=1, sliding_window_size=None, on_node_evict=None):
-    pool = TokenToKVPoolAllocator(size=pool_size, device="cpu", page_size=page_size)
+def _make_cache(pool_size=None, page_size=1, sliding_window_size=None, on_node_evict=None):
+    pool = (
+        TokenToKVPoolAllocator(size=pool_size, device="cpu", page_size=page_size)
+        if pool_size is not None
+        else None
+    )
     return RadixCache(
         page_size=page_size,
         token_to_kv_pool_allocator=pool,
         sliding_window_size=sliding_window_size,
         on_node_evict=on_node_evict,
     )
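
To illustrate why freeing never-allocated indices is a problem, here is a toy free-list allocator. `ToyPool` is a hypothetical class written for this example; whether TokenToKVPoolAllocator tracks free slots the same way is an assumption, not something confirmed by this PR.

```python
from typing import List, Optional


class ToyPool:
    """Hypothetical free-list allocator with no validation in free()."""

    def __init__(self, size: int):
        self.size = size
        self.free_slots: List[int] = list(range(size))

    def alloc(self, n: int) -> Optional[List[int]]:
        if n > len(self.free_slots):
            return None
        out, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return out

    def free(self, indices: List[int]) -> None:
        # No check that the indices were handed out by alloc().
        self.free_slots.extend(indices)

    def available(self) -> int:
        return len(self.free_slots)


pool = ToyPool(size=4)
pool.free([10, 20, 30])   # synthetic values, never allocated by this pool
print(pool.available())   # 7: more "free" slots than the pool has
```

A pool in that state can hand out slot indices that do not exist, which is why generic tree-logic tests are safer without a live pool attached.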
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 28 - 35, The helper
_make_cache currently always creates a live TokenToKVPoolAllocator and passes it
into RadixCache, which can cause tests that synthesize inserts to trigger
pool.free on indices the pool never allocated; change _make_cache to allow
omitting the real pool (e.g., add a parameter use_pool=True or pool=None) and
when use_pool is False pass token_to_kv_pool_allocator=None or a lightweight
dummy allocator with no-op free/allocate methods, so eviction paths in
RadixCache (and functions that call pool.free) won’t call into a real pool for
synthetic tests; update calls in tests that need the real pool to pass
use_pool=True or supply an explicit TokenToKVPoolAllocator.

402-410: ⚠️ Potential issue | 🟡 Minor

assert result.swa_evicted >= 0 is non-diagnostic.

This assertion always passes and won’t catch SWA tombstone/accounting regressions. Assert a concrete positive reclaim or post-eviction size delta.
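
The difference between the two styles of assertion can be shown with a small sketch. `EvictResult` here is a hypothetical stand-in with a single field, not the real return type of cache.evict.

```python
from dataclasses import dataclass


@dataclass
class EvictResult:
    """Hypothetical result object carrying only the SWA eviction count."""
    swa_evicted: int


def weak_check(result: EvictResult) -> bool:
    # A count can never be negative, so this holds even when nothing was evicted.
    return result.swa_evicted >= 0


def strong_check(result: EvictResult, size_before: int, size_after: int) -> bool:
    # Requires real progress and a matching drop in the tracked size.
    return result.swa_evicted > 0 and size_after < size_before


regressed = EvictResult(swa_evicted=0)  # eviction silently did nothing
print(weak_check(regressed))            # True: the regression goes unnoticed
print(strong_check(regressed, 3, 3))    # False: the regression is caught
```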

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 402 - 410, The
test_swa_tombstone_eviction currently uses a vacuous assertion (assert
result.swa_evicted >= 0); change it to assert a concrete positive reclaim by
either asserting result.swa_evicted > 0 or by measuring cache size before and
after eviction (call cache.size() or inspect relevant accounting) and asserting
a decrease in SWA-specific bytes/entries; update references in the test to use
cache.evict(0, swa_num_tokens=3) result.swa_evicted and/or pre_eviction =
cache.get_swa_size() / post_eviction = cache.get_swa_size() and assert
pre_eviction > post_eviction so the test fails on regressions.

759-770: ⚠️ Potential issue | 🟠 Major

Please avoid treating negative lock_ref as expected behavior.

This test currently encodes lock underflow (-1) as valid, which normalizes corrupted accounting state. Prefer rejecting unmatched decrements (or mark as xfail until dec_lock_ref is hardened).

Safer expected behavior once hardened
-    def test_dec_without_inc_goes_negative(self):
-        """Verify behavior when dec_lock_ref is called without matching inc.
-        This documents whether negative lock_ref causes issues."""
+    def test_dec_without_inc_is_rejected(self):
+        """dec_lock_ref should reject unmatched decrements."""
         cache = _make_cache()
         cache.insert(_key([1, 2, 3]), _val([10, 20, 30]))
         r = cache.match_prefix(_key([1, 2, 3]))
-
-        # dec without inc — lock_ref goes to -1
-        cache.dec_lock_ref(r.last_node)
-        # lock_ref is now -1, evictable_size and protected_size may be inconsistent
-        # This is a potential bug: negative lock_ref means the node is "super evictable"
-        assert r.last_node.lock_ref == -1
+        with pytest.raises(ValueError):
+            cache.dec_lock_ref(r.last_node)

As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
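
One possible shape for a hardened decrement is sketched below. `Node` and `ToyCache` are hypothetical, the counter logic is simplified, and raising ValueError is one option among those the review mentions (clamping or returning early are alternatives).

```python
from typing import List, Optional


class Node:
    """Hypothetical tree node."""

    def __init__(self, key: List[int], parent: Optional["Node"] = None):
        self.key = key
        self.parent = parent
        self.lock_ref = 0


class ToyCache:
    """Models only lock counting and the two size counters."""

    def __init__(self):
        self.root = Node([])
        self.evictable = 0
        self.protected = 0

    def insert(self, key: List[int]) -> Node:
        node = Node(key, self.root)
        self.evictable += len(key)
        return node

    def inc_lock_ref(self, node: Node) -> None:
        while node is not self.root:
            if node.lock_ref == 0:
                self.evictable -= len(node.key)
                self.protected += len(node.key)
            node.lock_ref += 1
            node = node.parent

    def dec_lock_ref(self, node: Node) -> None:
        # Validate the whole path first so a rejected call changes nothing.
        cur = node
        while cur is not self.root:
            if cur.lock_ref <= 0:
                raise ValueError("dec_lock_ref without matching inc_lock_ref")
            cur = cur.parent
        cur = node
        while cur is not self.root:
            if cur.lock_ref == 1:
                self.evictable += len(cur.key)
                self.protected -= len(cur.key)
            cur.lock_ref -= 1
            cur = cur.parent
```

Checking the path before mutating anything means a rejected call leaves `lock_ref`, `evictable`, and `protected` exactly as they were.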

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 759 - 770, The test currently
treats a lock_ref underflow (-1) as acceptable; change it to assert that
unmatched decrements are rejected: update test_dec_without_inc_goes_negative to
call cache.dec_lock_ref(r.last_node) and expect a safe behavior (preferably an
exception such as ValueError) or that r.last_node.lock_ref remains >= 0 (no
underflow) and that protected_size/evictable_size stay consistent; to support
this either harden dec_lock_ref to raise on lock_ref == 0 or clamp/prevent
decrement and adjust evictable/protected accounting accordingly — refer to
dec_lock_ref and r.last_node.lock_ref in your changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b12e760-9806-448d-95be-1d14d2dabcee

📥 Commits

Reviewing files that changed from the base of the PR and between 1c2696f and 0ac7ee0.

📒 Files selected for processing (3)
  • pymllm/mem_cache/radix_cache.py
  • pymllm/orchestrator/model_runner_process.py
  • pymllm/tests/test_radix_cache.py
✅ Files skipped from review due to trivial changes (1)
  • pymllm/mem_cache/radix_cache.py

…ing#656

1. Fix same _evictable_size bug in MambaRadixCache._add_leaf (was still
   subtracting parent tokens when leaf gains first child).
2. Make _make_cache() default to pool=None so generic tests don't corrupt
   allocator state with synthetic _val() tensors.
3. Strengthen SWA tombstone eviction assertion to assert exact count.
4. Rewrite negative lock_ref test to document the real corruption:
   unmatched dec makes subsequent inc skip the 0->1 protection transfer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (4)
pymllm/tests/test_radix_cache.py (4)

929-930: Consider making class constant immutable.

SYSTEM_PROMPT is a mutable list used as a class attribute. While it's only read in these tests, converting to a tuple prevents accidental mutation and satisfies static analysis (RUF012).

Suggested fix
-    SYSTEM_PROMPT = list(range(1000, 1050))  # 50-token system prompt
-    SYSTEM_PROMPT_LEN = 50
+    SYSTEM_PROMPT = tuple(range(1000, 1050))  # 50-token system prompt
+    SYSTEM_PROMPT_LEN = 50

Then update usages that need a list (e.g., line 1124) to convert explicitly:

prefix = list(self.SYSTEM_PROMPT)
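
The risk being flagged can be seen in a short sketch. The class names and `build_request` helper are hypothetical and exist only for this example.

```python
from typing import List, Sequence


class WithList:
    SYSTEM_PROMPT = list(range(1000, 1050))   # mutable, shared by every test


class WithTuple:
    SYSTEM_PROMPT = tuple(range(1000, 1050))  # immutable


def build_request(prompt: Sequence[int], suffix: List[int]) -> List[int]:
    prefix = list(prompt)  # explicit copy, works for both list and tuple
    prefix.extend(suffix)
    return prefix


# An accidental in-place edit leaks into every later user of the attribute.
WithList.SYSTEM_PROMPT.append(7)
print(len(WithList.SYSTEM_PROMPT))   # 51

# The tuple version has no in-place mutators, so the same mistake fails fast.
print(hasattr(WithTuple.SYSTEM_PROMPT, "append"))   # False
print(len(build_request(WithTuple.SYSTEM_PROMPT, [1, 2])))   # 52
```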
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 929 - 930, Change the mutable
class attribute SYSTEM_PROMPT to an immutable tuple (and keep SYSTEM_PROMPT_LEN)
to prevent accidental mutation and satisfy RUF012; then update any test sites
that expect a list (e.g., where prefix is built from SYSTEM_PROMPT) to
explicitly convert it using list(self.SYSTEM_PROMPT) or
list(TestClass.SYSTEM_PROMPT) so behavior remains identical.

1250-1254: Unused loop variable should use _.

The loop index i is not used; convention is to use _ for discarded values.

Suggested fix
-        for i in range(20):
+        for _ in range(20):
             r = cache.match_prefix(_key(seq))
             assert r.prefix_len == len(seq)
             bid = cache.inc_lock_ref(r.last_node)
             locks.append((r.last_node, bid))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 1250 - 1254, The for-loop uses
an unused loop variable `i`; change it to `_` to follow convention and signal
the value is intentionally ignored—update the loop header `for i in range(20):`
to `for _ in range(20):` in the block that calls
`cache.match_prefix(_key(seq))`, `cache.inc_lock_ref(r.last_node)`, and appends
to `locks`.

1181-1183: Unused loop variables should use _ prefix.

node and bid are not used in this loop body (only seq and i for the error message).

Suggested fix
-        for i, (seq, node, bid) in active.items():
+        for i, (seq, _node, _bid) in active.items():
             r = cache.match_prefix(_key(seq))
             assert r.prefix_len == len(seq), f"Request {i} diverging at {20+i*10} lost"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 1181 - 1183, In the loop that
unpacks active.items() in pymllm/tests/test_radix_cache.py, replace the unused
variables node and bid with names that use an underscore prefix (e.g., _node and
_bid) so they indicate they are intentionally unused; update the line "for i,
(seq, node, bid) in active.items():" to "for i, (seq, _node, _bid) in
active.items():" while leaving the body (cache.match_prefix(_key(seq)) and the
assert) unchanged.

763-786: Test documents known limitation appropriately.

The rewritten test clearly documents the corruption scenario: an unmatched dec_lock_ref drives lock_ref negative, causing subsequent inc_lock_ref to fail the 0→1 protection transfer. The comments acknowledge this as a known limitation pending hardening.

Consider adding a # TODO: comment or linking to an issue for tracking the eventual fix in dec_lock_ref to guard against underflow. As per coding guidelines, "TODO and FIXME comments must be written as 'TODO:' or 'FIXME:' followed by UTF-8 text."

Suggested TODO addition
         This documents a known limitation — dec_lock_ref should guard
-        against underflow.  Hardening is a separate task."""
+        against underflow.  Hardening is a separate task.
+        # TODO: Harden dec_lock_ref to guard against underflow (issue `#XXX`)."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pymllm/tests/test_radix_cache.py` around lines 763 - 786, Add a TODO comment
to the test test_dec_without_inc_corrupts_lock_semantics noting that
dec_lock_ref should guard against underflow and link to the issue/PR tracking
the hardening; place the TODO near the unmatched dec_lock_ref call (reference
dec_lock_ref, inc_lock_ref, lock_ref, evictable_size in the comment) and format
it exactly as "TODO: <UTF-8 text>" so it follows the project's TODO/FIXME
convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c86e78d-0a28-4e78-813b-7fe69e64448c

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac7ee0 and 18190e9.

📒 Files selected for processing (2)
  • pymllm/mem_cache/mamba_radix_cache.py
  • pymllm/tests/test_radix_cache.py
