fix(ssd-cache): inline LRU unlinks so eviction frees queue capacity by cfbraun · Pull Request #1451 · jundot/omlx

cfbraun · 2026-05-27T07:32:28Z

Summary

_enforce_size_limit_for_new_block enqueues evicted file unlinks as ("unlink", path) items onto _write_queue — the same bounded queue that carries pending writes. Combined with the pre-eviction _write_queue.full() short-circuit at the top of save_block, this creates a deadlock under sustained save pressure:

1. Writer is saturated → _write_queue is full.
2. save_block's pre-eviction check sees full() → returns False immediately, before calling _enforce_size_limit_for_new_block.
3. Eviction never runs → cache stays at the size cap.
4. Every subsequent save drops; ssd_write_drops climbs forever while _total_size sits pinned at the cap.
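
The stall can be reproduced in a toy model. The sketch below is not the omlx code: `ToyCache`, its fields, and the queue sizes are hypothetical stand-ins, and the writer is assumed never to drain the queue. It only illustrates how a full bounded queue plus the pre-eviction check keeps the size pinned while drops accumulate.

```python
import queue
from collections import OrderedDict


class ToyCache:
    """Toy model of the pre-fix flow; names and structure are illustrative."""

    def __init__(self, max_size, queue_slots):
        self.max_size = max_size
        self.index = OrderedDict()  # block_id -> size, oldest first
        self.total_size = 0
        self.write_queue = queue.Queue(maxsize=queue_slots)
        self.write_drops = 0

    def save_block(self, block_id, size):
        # Pre-eviction short-circuit: a full queue drops the save
        # before eviction gets a chance to run.
        if self.write_queue.full():
            self.write_drops += 1
            return False
        try:
            # Pre-fix behaviour: unlinks share the bounded queue with writes.
            while self.index and self.total_size + size > self.max_size:
                old_id, old_size = self.index.popitem(last=False)
                self.total_size -= old_size
                self.write_queue.put_nowait(("unlink", old_id))
            self.write_queue.put_nowait(("write", block_id))
        except queue.Full:
            self.write_drops += 1
            return False
        self.index[block_id] = size
        self.total_size += size
        return True


cache = ToyCache(max_size=4, queue_slots=2)
cache.index.update({"a": 2, "b": 2})
cache.total_size = 4
# Saturated writer (assumption): the queue is full and nothing drains it.
cache.write_queue.put_nowait(("write", "x"))
cache.write_queue.put_nowait(("write", "y"))

results = [cache.save_block(f"n{i}", 2) for i in range(5)]
print(results, cache.write_drops, cache.total_size)
```

In this model every save returns False and the index never changes, which mirrors steps 2 to 4 above.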

Fix: inline the unlinks on the eviction-calling thread instead. Eviction typically removes a single block per save (evict_until_size stops as soon as total_size <= target), so this is one syscall per save in steady state. The deferred-unlink justification ("avoid blocking the inference thread with N file delete syscalls") doesn't materialise under normal load, and inlining removes the bounded-queue contention entirely.
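
A minimal sketch of the inline approach, assuming an `OrderedDict` index keyed by file path. The helper names and signatures here are illustrative and are not taken from the actual patch.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


def evict_until_size(index, target):
    """Pop oldest entries until the indexed total is <= target."""
    evicted = []
    total = sum(index.values())
    while index and total > target:
        path, size = index.popitem(last=False)
        total -= size
        evicted.append((path, size))
    return evicted


def enforce_size_limit_inline(index, max_size, new_block_size):
    """Make room for a new block and unlink evicted files on this thread."""
    evicted = evict_until_size(index, max_size - new_block_size)
    for path, _size in evicted:
        # Inline unlink: no queue involved, so a full write queue
        # cannot prevent capacity from being freed.
        path.unlink(missing_ok=True)
    return len(evicted)


with tempfile.TemporaryDirectory() as tmp:
    index = OrderedDict()
    for name in ("old", "mid", "new"):
        path = Path(tmp) / name
        path.write_bytes(b"x" * 10)
        index[path] = 10
    oldest = next(iter(index))
    removed = enforce_size_limit_inline(index, max_size=30, new_block_size=10)
    oldest_still_exists = oldest.exists()
    remaining = [p.name for p in index]
    print(removed, oldest_still_exists, remaining)
```

With the cache exactly at its cap, a single block is evicted, which matches the one-syscall-per-save steady state described above.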

Bounded inline burst. The ENOSPC-recovery path invalidates the 30 s disk-usage cache, which can shrink _get_effective_max_size sharply — evict_until_size would then return hundreds of entries at once and the inline-unlink loop would stall the inference thread on a syscall storm. Cap the burst at _MAX_INLINE_UNLINKS_PER_SAVE = 32 and reinsert the deferred metadata into the index so subsequent saves drain the remainder.
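
One way to picture the cap, as a sketch only: the value 32 comes from the description above, while `evict_with_cap`, the `unlink` callback, and the choice to reinsert deferred entries at the LRU end are assumptions made for illustration.

```python
from collections import OrderedDict

MAX_INLINE_UNLINKS_PER_SAVE = 32  # value quoted in the PR description


def evict_with_cap(index, target, unlink, cap=MAX_INLINE_UNLINKS_PER_SAVE):
    """Evict down to target, but unlink at most `cap` entries per call."""
    evicted = []
    total = sum(index.values())
    while index and total > target:
        key, size = index.popitem(last=False)
        total -= size
        evicted.append((key, size))

    now, deferred = evicted[:cap], evicted[cap:]
    for key, _size in now:
        unlink(key)
    # Put deferred entries back at the oldest end, preserving their order,
    # so later saves pick them up first (assumed reinsertion policy).
    for key, size in reversed(deferred):
        index[key] = size
        index.move_to_end(key, last=False)
    return len(now), len(deferred)


index = OrderedDict((f"block-{i}", 1) for i in range(100))
removed = []
first = evict_with_cap(index, target=0, unlink=removed.append)
size_after_first = len(index)

calls = 1
while index:
    evict_with_cap(index, target=0, unlink=removed.append)
    calls += 1
print(first, size_after_first, calls)
```

Here a forced eviction of 100 entries is spread over several calls, trading full reconvergence for a bounded number of unlinks per save.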

Also adds an evict_unlink_failures stats counter. Eviction now decrements the index before the on-disk unlink, so if Path.unlink raises OSError the file may remain on disk while the index no longer counts it. The counter lets operators see when on-disk size has drifted above what the index reports.
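
A sketch of how such a counter might be maintained. The `unlink_evicted` helper and the plain-dict stats are hypothetical, and the missing file is used only as a convenient way to make `Path.unlink` raise an `OSError` in the demo.

```python
import tempfile
from pathlib import Path


def unlink_evicted(paths, stats):
    """Unlink each path; count failures instead of only logging them."""
    for path in paths:
        try:
            path.unlink()
        except OSError:
            # The index entry is already gone, so this file is now
            # untracked disk usage; record that the drift happened.
            stats["evict_unlink_failures"] += 1


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "block-a"
    present.write_bytes(b"x")
    missing = Path(tmp) / "block-b"  # never created, so unlink raises
    stats = {"evict_unlink_failures": 0}
    unlink_evicted([present, missing], stats)
    present_exists = present.exists()
    print(stats, present_exists)
```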

The dead writer-thread ("unlink", file_path) dispatch branch is removed since no path enqueues such tuples anymore.

Test plan

pytest tests/test_paged_ssd_cache.py::TestInlineLRUUnlinks — 4 passed
pytest tests/test_paged_ssd_cache.py tests/test_hot_cache.py — 125 passed (4 new + 121 existing)

``_enforce_size_limit_for_new_block`` enqueues evicted file unlinks as ``("unlink", path)`` items onto ``_write_queue``, the same bounded queue that carries pending writes. Combined with the pre-eviction ``_write_queue.full()`` short-circuit at the top of ``save_block``, this creates a deadlock under sustained save pressure:

1. Writer is saturated → ``_write_queue`` is full.
2. ``save_block``'s pre-eviction check sees ``full()`` → returns False immediately, BEFORE calling ``_enforce_size_limit_for_new_block``.
3. Eviction never runs → cache stays at the size cap.
4. Every subsequent save drops; ``ssd_write_drops`` climbs forever while ``_total_size`` sits pinned at the cap.

Inline the unlinks on the eviction-calling thread instead. Eviction typically removes a single block per save (``evict_until_size`` stops as soon as ``total_size <= target``), so this is one syscall per save in steady state. The deferred-unlink justification ("avoid blocking the inference thread with N file delete syscalls") doesn't materialise under normal load, and inlining removes the bounded-queue contention entirely.

Bounded inline burst. The ENOSPC-recovery path invalidates the 30 s disk-usage cache, which can shrink ``_get_effective_max_size`` sharply on the next save; ``evict_until_size`` would then return hundreds of entries at once and the inline-unlink loop would stall the inference thread on a syscall storm. Cap the burst at ``_MAX_INLINE_UNLINKS_PER_SAVE = 32`` and reinsert the deferred metadata into the index so subsequent saves drain the remainder. Bounds per-call latency at the cost of taking multiple saves to fully reconverge.

``evict_unlink_failures`` stats counter. Eviction now decrements the index before the on-disk unlink; if ``Path.unlink`` raises ``OSError``, the previous "log a warning and move on" pattern silently lost the signal. Surfacing the counter lets operators see that the on-disk size has drifted above what the index reports.

Tests (tests/test_paged_ssd_cache.py::TestInlineLRUUnlinks):

- test_eviction_does_not_enqueue_unlink_tasks: sentinel-patch on ``put_nowait`` asserts no ``("unlink", ...)`` items ever enter the queue.
- test_eviction_frees_capacity_under_pressure: with the writer busy, eviction still keeps ``_index.total_size`` near the configured cap.
- test_inline_eviction_burst_is_capped: forced mass-eviction removes at most ``_MAX_INLINE_UNLINKS_PER_SAVE`` entries; the rest reinsert so subsequent saves can drain.
- test_unlink_failure_increments_counter: a patched ``OSError`` from ``Path.unlink`` increments ``evict_unlink_failures``.

The dead writer-thread ``("unlink", file_path)`` dispatch branch is removed since no path enqueues such tuples anymore.

87 existing paged_ssd_cache tests + 34 hot_cache tests + 4 new tests pass.
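
The sentinel-patch idea behind test_eviction_does_not_enqueue_unlink_tasks might look roughly like the following. This is a generic sketch using `unittest.mock` against a plain `queue.Queue`; `enqueued_items_during` and `fake_save` are hypothetical and do not reproduce the test in the patch.

```python
import queue
from unittest import mock


def enqueued_items_during(q, action):
    """Run action() while recording every item passed to q.put_nowait."""
    seen = []
    real_put_nowait = q.put_nowait  # bound method captured before patching

    def spy(item):
        seen.append(item)
        return real_put_nowait(item)

    with mock.patch.object(q, "put_nowait", side_effect=spy):
        action()
    return seen


write_queue = queue.Queue(maxsize=8)


def fake_save():
    # Stand-in for a save that triggers eviction; after the fix only
    # write items are expected to reach the queue.
    write_queue.put_nowait(("write", "block-1"))


seen = enqueued_items_during(write_queue, fake_save)
unlink_items = [item for item in seen if item[0] == "unlink"]
print(seen, unlink_items)
```

Because the spy forwards to the real method, the queue behaves normally while the test inspects everything that was enqueued.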

cfbraun force-pushed the pr/inline-lru-unlinks branch 2 times, most recently from a9792b3 to 6577237 · May 27, 2026 09:30

jundot mentioned this pull request May 28, 2026

fix(cache): boundary snapshots dropped during prefill because uid mapping not yet populated #1471

Closed

5 tasks

cfbraun force-pushed the pr/inline-lru-unlinks branch 8 times, most recently from 0c7cbd8 to 73f0184 · June 2, 2026 06:08

cfbraun force-pushed the pr/inline-lru-unlinks branch from 73f0184 to 84fedc4 · June 2, 2026 13:13

cfbraun wants to merge 1 commit into jundot:main from cfbraun:pr/inline-lru-unlinks
