ww21_shm by hlin99 · Pull Request #301 · hlin99/LMCache

hlin99 · 2026-05-27T02:09:46Z

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests


docs(mp): note non-CUDA auto-disable of --l1-use-lazy

PR LMCache#3259 (feat(mp): Non-GPU Context by pickle) added a post-init guard in lmcache/v1/distributed/config.py:L1MemoryManagerConfig that auto-disables use_lazy on backends where lmcache.torch_dev has no cudart attribute, logging a warning ("LazyMemoryAllocator requires cudart which is not available on the current backend. Disabling l1-use-lazy.").

The user-facing reference in docs/source/mp/configuration.rst still listed the --l1-use-lazy default as plain "True" without mentioning that non-CUDA backends silently downgrade to eager allocation. Extend the description so the documented behavior matches the actual code path.

Doc-only change; no code is modified.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
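The guard described in this commit message can be pictured with a small sketch. This is not the code in lmcache/v1/distributed/config.py. The class name `L1ConfigSketch`, the `device_module` field, and the two stand-in backends are hypothetical, chosen only to show a post-init check that turns `use_lazy` off when the backend has no `cudart` attribute.

```python
import logging
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any

logger = logging.getLogger(__name__)


@dataclass
class L1ConfigSketch:
    """Hypothetical stand-in for L1MemoryManagerConfig (assumed shape)."""

    use_lazy: bool = True
    # Hypothetical stand-in for lmcache.torch_dev; injected so the sketch
    # runs without torch or LMCache installed.
    device_module: Any = None

    def __post_init__(self) -> None:
        # Lazy allocation needs cudart; fall back to eager allocation
        # when the backend module does not expose it.
        if self.use_lazy and not hasattr(self.device_module, "cudart"):
            logger.warning(
                "LazyMemoryAllocator requires cudart which is not available "
                "on the current backend. Disabling l1-use-lazy."
            )
            self.use_lazy = False


# Two fake backends: one exposes cudart, the other does not.
cuda_like = SimpleNamespace(cudart=lambda: None)
cpu_like = SimpleNamespace()
```

With these stand-ins, `L1ConfigSketch(device_module=cpu_like).use_lazy` ends up `False` even though the default is `True`, which is the behavior the documentation change describes.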

… page-aligned for O_DIRECT (LMCache#3191) * [Fix] non_cuda_equivalents: page-align pinned allocations for O_DIRECT Signed-off-by: zhengfei.he <hezhengfei1999@gmail.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>

…che#3385) * [Fix]: Skip unpin for non-pinned objects in cleanup_memory_objs Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com> * Add type hint in cache engine clean up test Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com> --------- Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
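As a rough illustration of the fix named above, the sketch below skips `unpin()` for objects that were never pinned. The `MemObjSketch` class and its fields are hypothetical. The real memory-object type and the real `cleanup_memory_objs` signature in LMCache may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MemObjSketch:
    """Hypothetical memory object with a pin flag (assumed shape)."""

    is_pinned: bool = False
    unpin_calls: int = 0

    def unpin(self) -> None:
        # Count calls so the effect of the guard is observable.
        self.unpin_calls += 1
        self.is_pinned = False


def cleanup_memory_objs(objs: Iterable[MemObjSketch]) -> None:
    """Unpin only the objects that are currently pinned."""
    for obj in objs:
        if obj.is_pinned:
            obj.unpin()


pinned = MemObjSketch(is_pinned=True)
plain = MemObjSketch(is_pinned=False)
cleanup_memory_objs([pinned, plain])
```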

…le package. (LMCache#3115) This shim layer matches the cufile-python API that is currently in use by LMCache. This allows hipFile to maintain its own API while not requiring additional code changes in LMCache. Signed-off-by: Riley Dixon <riley.dixon@amd.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>

hlin99 · 2026-05-27T02:30:00Z

@copilot resolve the merge conflicts in this pull request

Copilot · 2026-05-27T02:33:59Z

> @copilot resolve the merge conflicts in this pull request

Resolved the merge conflicts. The dev branch refactored server.py into a modular architecture (extracting logic into engine_context.py, engine_module.py, and the modules/ package), while this branch had the old monolithic version with SHM additions. I accepted dev's refactored server.py and test patches since the SHM-specific code lives in separate files (server_transfer.py, worker_transfer/) that don't conflict. Merge commit: aac9aa1.

hlin99 · 2026-05-27T02:35:45Z

-    stub_native_storage_ops: Any,
-) -> None:
-    """Regression: repeated prompt after worker restart should no-op-store cleanly.
-


why remove test cases?

> @copilot why remove test cases?

The SHM test cases were incorrectly dropped during the merge conflict resolution — I took dev's version of the test file which didn't have them. I've restored all 13 SHM-specific test functions and updated the two shared tests (test_server_register_and_find_non_cuda_context_layout and test_server_store_and_retrieve_cpu_chunks) to use dev's refactored module paths (engine_context / NonGPUTransferModule). Also added the missing non_gpu_context.py and non_gpu_context_pickle.py files from dev that the modules depend on. Fixed in commit 77e09b7.

* test: add raw-block temp-file coverage
  Add CI-safe raw-block tests that exercise the Rust device binding, RawBlockCore, and the MP raw-block L2 adapter on truncated regular files.
  Signed-off-by: DongDongJu <commisori28@gmail.com>
  Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
* ci: add focused raw-block temp-file job
  Add raw-block path filtering and a dedicated Ubuntu raw-block CI job that builds the Rust extension and runs only the temp-file raw-block tests under the repo workspace.
  Signed-off-by: DongDongJu <commisori28@gmail.com>
  Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
* test: address raw-block review feedback
  Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
  Signed-off-by: DongDongJu <commisori28@gmail.com>
---------
Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
Signed-off-by: DongDongJu <commisori28@gmail.com>
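The idea of testing a raw-block path on a truncated regular file can be sketched in plain Python. This does not use the Rust binding or RawBlockCore. The helper names and the 4096-byte block size are assumptions made for illustration. The point is only that a file extended with `truncate()` behaves like a zero-filled device of fixed size, so block reads and writes can run in CI without a real block device.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for the sketch


def make_fake_device(path: str, num_blocks: int) -> None:
    """Create a zero-filled regular file that stands in for a block device."""
    with open(path, "wb") as f:
        f.truncate(num_blocks * BLOCK_SIZE)


def write_block(path: str, index: int, data: bytes) -> None:
    """Write exactly one block at the given block index."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    with open(path, "r+b") as f:
        f.seek(index * BLOCK_SIZE)
        f.write(data)


def read_block(path: str, index: int) -> bytes:
    """Read one block at the given block index."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


with tempfile.TemporaryDirectory() as tmp:
    dev = os.path.join(tmp, "fake_device.img")
    make_fake_device(dev, num_blocks=8)
    device_size = os.path.getsize(dev)
    write_block(dev, 3, b"\xab" * BLOCK_SIZE)
    block3 = read_block(dev, 3)
    block0 = read_block(dev, 0)
```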

docs(design): align L2 adapter docs with L2StoreResult interface

PR LMCache#3363 ([MP] Update the L2 adapter interface to force measuring the real transferred bytes) changed `L2AdapterInterface.pop_completed_store_tasks()` to return `dict[L2TaskId, L2StoreResult]` (was `dict[L2TaskId, bool]`) and removed the optional `pop_completed_store_task_bytes()` method, but three design docs still describe the old `bool` contract:

- `docs/design/v1/distributed/l2_adapters/overall.md`: refresh the store-operations contract and the StoreController data-flow comment to reference `L2StoreResult` (success flag + `bytes_transferred()`), and note the fast-path duplicate-key reporting expectation that motivated the interface change.
- `docs/design/v1/distributed/l2_adapters/serde_wrapper.md`: update the ASCII flow diagram, the mermaid sequence diagram, and the failure-policy section so they show `L2StoreResult(...)` return values instead of bare `True`/`False`.
- `docs/design/v1/mp_observability/METRICS.md`: update the `lmcache_mp.l2_store_throughput` calculation column from `total_bytes` to `bytes_transferred`, matching the new `L2ThroughputSubscriber._on_store_completed` path that reads the real-bytes field from `L2_STORE_COMPLETED` (and drops zero-byte samples). The load-throughput row picks up a short clarifying note that the load path still uses submitted `total_bytes`.

Doc-only; no code changes. Auto-generated by the daily LMCache docs drift-check routine for 2026-05-26.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
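To make the contract change concrete, here is a minimal sketch of a result object that carries a success flag plus a `bytes_transferred()` accessor, and of a throughput calculation that drops zero-byte samples. The commit message only states that `L2StoreResult` exposes those two things. The class name `StoreResultSketch`, its field layout, and the `store_throughput` helper are assumptions, not the LMCache implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StoreResultSketch:
    """Hypothetical stand-in for L2StoreResult (field layout assumed)."""

    success: bool
    nbytes: int = 0

    def bytes_transferred(self) -> int:
        # Bytes actually moved, which may be 0 for a duplicate-key fast path.
        return self.nbytes


def store_throughput(
    completed: Dict[int, StoreResultSketch], elapsed_s: float
) -> float:
    """Bytes per second over completed stores, ignoring zero-byte samples."""
    samples = [
        r.bytes_transferred()
        for r in completed.values()
        if r.bytes_transferred() > 0
    ]
    return sum(samples) / elapsed_s


# Task ids are plain ints here; the real L2TaskId type is not shown in the PR.
completed = {
    1: StoreResultSketch(success=True, nbytes=1024),
    2: StoreResultSketch(success=False),
    3: StoreResultSketch(success=True, nbytes=0),  # duplicate key, nothing sent
}
```

Under the old `bool` contract, task 3 would be indistinguishable from task 1, so a throughput metric based on submitted bytes would overcount.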

The server is not aware of the device type of the connecting workers. Defaulting to auto makes the server load both the GPU and non-GPU transfer modules, so workers of either type can connect without manual configuration.

Signed-off-by: Tony Lin <tony.lin@intel.com>
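A small sketch of that default, assuming a string-valued mode setting. Only the value `auto` appears in the commit message. The function name and the `gpu` / `non_gpu` mode and module labels are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from a transfer-mode setting to the modules to load.
_MODULES_BY_MODE: Dict[str, List[str]] = {
    "gpu": ["gpu"],
    "non_gpu": ["non_gpu"],
    "auto": ["gpu", "non_gpu"],  # load both; worker type is unknown up front
}


def select_transfer_modules(mode: str = "auto") -> List[str]:
    """Return the transfer modules the server should load for a mode."""
    try:
        return list(_MODULES_BY_MODE[mode])
    except KeyError:
        raise ValueError(f"unknown transfer mode: {mode!r}") from None
```

An explicit `gpu` or `non_gpu` setting would still be possible in this sketch for deployments that want to load only one module.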

hlin99 and others added 23 commits May 21, 2026 08:09

feat(mp): SHM-based data transfer path for GPGPUs/CPU

2eef210

Signed-off-by: Tony Lin <tony.lin@intel.com>

address gemini's comments

3f8a799

Signed-off-by: Tony Lin <tony.lin@intel.com>

Use multiprocessing.shared_memory for cross-platform SHM transport

2c71e8a

Signed-off-by: Tony Lin <tony.lin@intel.com>
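For readers unfamiliar with the standard-library module named in commit 2c71e8a, the following sketch shows the basic create, attach-by-name, and cleanup cycle of `multiprocessing.shared_memory`. It is not the PR's transport code. The `roundtrip` helper is hypothetical, and both handles live in one process here only to keep the example self-contained. In a real setup the second handle would be opened in another process that received the segment name.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload into a shared segment and read it back via its name."""
    # Owner side: create a named segment large enough for the payload.
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[: len(payload)] = payload
        # Peer side: attach to the same segment using only its name.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(peer.buf[: len(payload)])
        finally:
            peer.close()
    finally:
        # Close the local mapping, then remove the segment itself.
        owner.close()
        owner.unlink()


result = roundtrip(b"kv-chunk")
```

The segment size may be rounded up by the platform, which is why the sketch slices the buffer to `len(payload)` instead of reading all of `buf`.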

Merge branch 'dev' into ww21_PR_shm

2f74d25

Merge branch 'dev' into ww21_PR_shm

eeeb301

docs: MPCacheEngine prepare/commit docstrings

7e87a7a

Signed-off-by: Tony Lin <tony.lin@intel.com>

Move SHM logic to MPCacheEngine and add lazy/SHM guard

09e81e7

Signed-off-by: Tony Lin <tony.lin@intel.com>

add ShmSlotDescriptor schema

7809a3e

Signed-off-by: Tony Lin <tony.lin@intel.com>

Refactor: move transport files into lmcache/v1/multiprocess/transport/

6268f2a

Signed-off-by: Tony Lin <tony.lin@intel.com>

Remove redundant shm_name vs use_lazy check

d626ad0

The config layer already forces use_lazy=False when cudart is unavailable. The allocator factory can simply branch on shm_name first, then on use_lazy — no invalid combination can reach here Signed-off-by: Tony Lin <tony.lin@intel.com>
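The branch order described in this commit can be sketched as follows. The function name and the returned labels are hypothetical. The real factory returns allocator objects, which are not shown in this PR page.

```python
from typing import Optional


def pick_allocator_kind(shm_name: Optional[str], use_lazy: bool) -> str:
    """Choose an allocator kind: SHM first, then lazy, else eager."""
    if shm_name:
        # A shared-memory segment name takes priority over use_lazy.
        return "shm"
    if use_lazy:
        # Assumed safe: the config layer has already cleared use_lazy
        # on backends without cudart.
        return "lazy"
    return "eager"
```

Because the config layer normalizes `use_lazy` before the factory runs, the factory in this sketch needs no extra validation of the two inputs.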

Merge branch 'dev' into ww21_PR_shm

350876e

Signed-off-by: Tony Lin <tony.lin@intel.com>

abstract server transfer strategy

8492a75

Signed-off-by: Tony Lin <tony.lin@intel.com>

to a more friendly naming

c61c215

Signed-off-by: Tony Lin <tony.lin@intel.com>

fix: support HND formats in MP KV transfer (LMCache#3282)

3dd3668

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

[Fix][Observability] PrometheusLogger instance already created with d…

1966340

…ifferent metadata (LMCache#3147) Signed-off-by: cr7258 <chengzw258@163.com>

[Operator]: Force external LMCache MP connector path (LMCache#3393)

956b7bd

[CI] smoke-test container images before pushing (LMCache#3358)

370cf94

smoke test for image release Signed-off-by: deng451e <838677410@qq.com>

[Bugfix] Fix 0-hit async lookup when use_layerwise=true (LMCache#3252)

1b8785a

* [Bugfix] Fix layerwise lookup miss in async_lookup_and_prefetch Signed-off-by: Kihwan Kim <luceinaltis2020@gmail.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>

fix: prevent TypeError crash when streaming response has zero visible…

bd713b5

… content (LMCache#3258) Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>

Copilot AI mentioned this pull request May 27, 2026

Merge dev into ww21_PR_shm with conflict-safe SHM preservation #302

Closed

Copilot started work on behalf of hlin99 May 27, 2026 02:30

Merge remote-tracking branch 'origin/dev' into ww21_PR_shm

aac9aa1

# Conflicts: # lmcache/v1/multiprocess/server.py # tests/v1/multiprocess/test_non_cuda_data_transfer.py

Copilot finished work on behalf of hlin99 May 27, 2026 02:34

DongDongJu and others added 29 commits May 27, 2026 15:34

[CI/CD] Tighten the threshold requirement for k3 multiprocess test (L…

61aa202

…MCache#3405) Signed-off-by: ApostaC <yihua98@uchicago.edu>

[Docs] Combined PR from recent doc drift scannings (LMCache#3401)

f41931c

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Add Qwen3-30B-A3B-Instruct-2507 and Qwen3-235B-A22B-Thinking-2507 to …

b3a7275

…kv cache calculator (LMCache#3408) Signed-off-by: pengxin99 <yc316ypx@126.com>

[Doc][ROCm] Document gfx950 (MI350X/MI355X) in install example (LMCac…

262759a

…he#3395) Signed-off-by: hyukjlee <hyukjlee@amd.com>

[Hotfix][CI] fix raw-block L2 store result assertions (LMCache#3415)

acf3c7f

test: fix raw-block L2 store result assertions Signed-off-by: Dongjoo Seo <commisori28@gmail.com> Signed-off-by: DongDongJu <commisori28@gmail.com>

[Bugfix] Avoid vLLM import during blake3 token hasher startup (LMCach…

713b1a5

…e#3416) * fix: avoid vllm import for blake3 token hasher Signed-off-by: DongDongJu <commisori28@gmail.com> Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>

[FIX] use nixl meta-package on CUDA 13 so L2 adapters load (LMCache#…

39e3beb

…3370) Signed-off-by: deng451e <838677410@qq.com>

fix: move pytest.ini to project root so CI picks it up (LMCache#3250)

67a72cc

Signed-off-by: abinggo <107740309+abinggo@users.noreply.github.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>

[Build] sync torch version with vLLM (2.11.0) (LMCache#3348)

29bbd55

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ApostaC <25103655+ApostaC@users.noreply.github.com>

[refactor]Refactor bench kvcache cli (LMCache#3411)

d6b1a08

[refactor] refactor lmcache bench kvcache cli Signed-off-by: idellzheng <idellzheng@tencent.com>

restore shm logic during rebase

af4910e

Signed-off-by: Tony Lin <tony.lin@intel.com>

move files to worker_transfer for clear code layout

8dae6dd

Signed-off-by: Tony Lin <tony.lin@intel.com>

[LMCache MP Connector] Report cache hit stats in KVTransferParams (LM…

3e0e3ff

…Cache#3402) Signed-off-by: aeon-x <talexcao@gmail.com>

refactor: cache _shm_pool_info in __init__ instead of recomputing on …

0423cfe

…each register Signed-off-by: Tony Lin <tony.lin@intel.com>

Split UNREGISTER_KV_CACHE into GPU and non-GPU variants to have

3c58046

both modes supported at the same time Signed-off-by: Tony Lin <tony.lin@intel.com>

chore: handle bool/int strict decoding in msgspec_decode

e479f6f

Signed-off-by: Tony Lin <tony.lin@intel.com>

Merge branch 'dev' into ww21_PR_shm

d4efd2b

move transfer_context.py into worker_transfer/ package

eda3847

Signed-off-by: Tony Lin <tony.lin@intel.com>

move server transfer into modules

b11e368

Signed-off-by: Tony Lin <tony.lin@intel.com>

refactor: move SHM pool info to MPCacheEngineContext

ddb3e92

Signed-off-by: Tony Lin <tony.lin@intel.com>

refactor test cases

0d74e42

Signed-off-by: Tony Lin <tony.lin@intel.com>

update docs according to latest code

4f6032d

Signed-off-by: Tony Lin <tony.lin@intel.com>

fix: correct spelling 'mignt' -> 'might' in cache policy comments (LM…

49823dc

…Cache#3425)

Refactored name for better semantic clarity.

a783898

Signed-off-by: Tony Lin <tony.lin@intel.com>

Merge branch 'dev' into ww21_PR_shm

4ef643b

merging files

07b49b2

Signed-off-by: Tony Lin <tony.lin@intel.com>

hlin99 closed this May 28, 2026

hlin99 wants to merge 56 commits into dev from ww21_PR_shm

19 participants
