ww21_shm#301
Signed-off-by: Tony Lin <tony.lin@intel.com>
The config layer already forces use_lazy=False when cudart is unavailable. The allocator factory can simply branch on shm_name first, then on use_lazy, because no invalid combination can reach it.
Signed-off-by: Tony Lin <tony.lin@intel.com>
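The branching order described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the `L1ConfigSketch` class, the `pick_allocator_kind` function, and the returned labels are hypothetical names; only the `shm_name` and `use_lazy` fields and the order of the checks come from the message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class L1ConfigSketch:
    """Hypothetical stand-in for the real L1 memory manager config."""

    use_lazy: bool = True
    shm_name: Optional[str] = None


def pick_allocator_kind(config: L1ConfigSketch) -> str:
    """Return a label for the allocator a factory would build (labels are illustrative)."""
    # Branch on shm_name first: a shared-memory name selects the SHM path.
    if config.shm_name is not None:
        return "shm"
    # Then branch on use_lazy. The config layer is assumed to have already
    # forced use_lazy=False when cudart is unavailable, so no extra check here.
    if config.use_lazy:
        return "lazy"
    return "eager"
```

Because the config layer is assumed to normalize `use_lazy` beforehand, the factory itself needs no backend check.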
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
…ifferent metadata (LMCache#3147) Signed-off-by: cr7258 <chengzw258@163.com>
docs(mp): note non-CUDA auto-disable of --l1-use-lazy

PR LMCache#3259 (feat(mp): Non-GPU Context by pickle) added a post-init guard in lmcache/v1/distributed/config.py:L1MemoryManagerConfig that auto-disables use_lazy on backends where lmcache.torch_dev has no cudart attribute, logging a warning ("LazyMemoryAllocator requires cudart which is not available on the current backend. Disabling l1-use-lazy.").

The user-facing reference in docs/source/mp/configuration.rst still listed the --l1-use-lazy default as plain "True" without mentioning that non-CUDA backends silently downgrade to eager allocation. Extend the description so the documented behavior matches the actual code path.

Doc-only change; no code is modified.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
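For readers unfamiliar with the guard this commit documents, here is a rough sketch of what a post-init check of this kind could look like. It is not the code from lmcache/v1/distributed/config.py: the `LazyGuardConfigSketch` class and its injected `torch_dev` field are hypothetical, added so the example runs without LMCache or a GPU. Only the `cudart` attribute check and the warning text are taken from the commit message.

```python
import logging
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any

logger = logging.getLogger(__name__)


@dataclass
class LazyGuardConfigSketch:
    """Hypothetical config; the real class is L1MemoryManagerConfig."""

    use_lazy: bool = True
    # Assumption: the device module is injected here for illustration only.
    torch_dev: Any = None

    def __post_init__(self) -> None:
        # Lazy allocation needs cudart; downgrade to eager allocation without it.
        if self.use_lazy and not hasattr(self.torch_dev, "cudart"):
            logger.warning(
                "LazyMemoryAllocator requires cudart which is not available "
                "on the current backend. Disabling l1-use-lazy."
            )
            self.use_lazy = False


# A backend object without a cudart attribute triggers the downgrade.
non_cuda = LazyGuardConfigSketch(use_lazy=True, torch_dev=SimpleNamespace())
# A backend object exposing cudart keeps lazy allocation enabled.
cuda_like = LazyGuardConfigSketch(
    use_lazy=True, torch_dev=SimpleNamespace(cudart=lambda: None)
)
```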
… page-aligned for O_DIRECT (LMCache#3191) * [Fix] non_cuda_equivalents: page-align pinned allocations for O_DIRECT Signed-off-by: zhengfei.he <hezhengfei1999@gmail.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
…che#3385) * [Fix]: Skip unpin for non-pinned objects in cleanup_memory_objs Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com> * Add type hint in cache engine clean up test Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com> --------- Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
smoke test for image release Signed-off-by: deng451e <838677410@qq.com>
…le package. (LMCache#3115) This shim layer matches the cufile-python API that is currently in use by LMCache. This allows hipFile to maintain its own API while not requiring additional code changes in LMCache. Signed-off-by: Riley Dixon <riley.dixon@amd.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
* [Bugfix] Fix layerwise lookup miss in async_lookup_and_prefetch Signed-off-by: Kihwan Kim <luceinaltis2020@gmail.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
… content (LMCache#3258) Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
@copilot resolve the merge conflicts in this pull request
# Conflicts:
#	lmcache/v1/multiprocess/server.py
#	tests/v1/multiprocess/test_non_cuda_data_transfer.py
Resolved the merge conflicts. The
    stub_native_storage_ops: Any,
) -> None:
    """Regression: repeated prompt after worker restart should no-op-store cleanly.
The SHM test cases were incorrectly dropped during the merge conflict resolution — I took dev's version of the test file which didn't have them. I've restored all 13 SHM-specific test functions and updated the two shared tests (test_server_register_and_find_non_cuda_context_layout and test_server_store_and_retrieve_cpu_chunks) to use dev's refactored module paths (engine_context / NonGPUTransferModule). Also added the missing non_gpu_context.py and non_gpu_context_pickle.py files from dev that the modules depend on. Fixed in commit 77e09b7.
* test: add raw-block temp-file coverage

  Add CI-safe raw-block tests that exercise the Rust device binding, RawBlockCore, and the MP raw-block L2 adapter on truncated regular files.

  Signed-off-by: DongDongJu <commisori28@gmail.com>
  Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>

* ci: add focused raw-block temp-file job

  Add raw-block path filtering and a dedicated Ubuntu raw-block CI job that builds the Rust extension and runs only the temp-file raw-block tests under the repo workspace.

  Signed-off-by: DongDongJu <commisori28@gmail.com>
  Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>

* test: address raw-block review feedback

  Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
  Signed-off-by: DongDongJu <commisori28@gmail.com>

---------

Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
Signed-off-by: DongDongJu <commisori28@gmail.com>
…MCache#3405) Signed-off-by: ApostaC <yihua98@uchicago.edu>
docs(design): align L2 adapter docs with L2StoreResult interface

PR LMCache#3363 ([MP] Update the L2 adapter interface to force measuring the real transferred bytes) changed `L2AdapterInterface.pop_completed_store_tasks()` to return `dict[L2TaskId, L2StoreResult]` (was `dict[L2TaskId, bool]`) and removed the optional `pop_completed_store_task_bytes()` method, but three design docs still describe the old `bool` contract:

- `docs/design/v1/distributed/l2_adapters/overall.md`: refresh the store-operations contract and the StoreController data-flow comment to reference `L2StoreResult` (success flag + `bytes_transferred()`), and note the fast-path duplicate-key reporting expectation that motivated the interface change.
- `docs/design/v1/distributed/l2_adapters/serde_wrapper.md`: update the ASCII flow diagram, the mermaid sequence diagram, and the failure-policy section so they show `L2StoreResult(...)` return values instead of bare `True`/`False`.
- `docs/design/v1/mp_observability/METRICS.md`: update the `lmcache_mp.l2_store_throughput` calculation column from `total_bytes` to `bytes_transferred`, matching the new `L2ThroughputSubscriber._on_store_completed` path that reads the real-bytes field from `L2_STORE_COMPLETED` (and drops zero-byte samples). The load-throughput row picks up a short clarifying note that the load path still uses submitted `total_bytes`.

Doc-only; no code changes.

Auto-generated by the daily LMCache docs drift-check routine for 2026-05-26.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
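To make the interface change concrete, the sketch below models a store result that carries both a success flag and a real byte count, plus a consumer that drops zero-byte samples as the METRICS.md note describes. All names here (`StoreResultSketch`, `throughput_samples`, the `nbytes` field, and the `int` alias for `L2TaskId`) are assumptions for illustration; the actual `L2StoreResult` constructor and the subscriber logic are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: the real L2TaskId type may differ; an int alias keeps this runnable.
L2TaskId = int


@dataclass(frozen=True)
class StoreResultSketch:
    """Hypothetical stand-in for L2StoreResult: success flag plus real bytes."""

    success: bool
    nbytes: int = 0

    def bytes_transferred(self) -> int:
        # Bytes actually moved by the store, as opposed to bytes submitted.
        return self.nbytes


def throughput_samples(completed: Dict[L2TaskId, StoreResultSketch]) -> List[int]:
    """Collect byte counts for a throughput metric, dropping zero-byte samples."""
    return [
        result.bytes_transferred()
        for result in completed.values()
        if result.bytes_transferred() > 0
    ]


completed = {
    1: StoreResultSketch(success=True, nbytes=4096),
    2: StoreResultSketch(success=True, nbytes=0),   # nothing actually moved
    3: StoreResultSketch(success=False, nbytes=0),  # failed store
}
```

With the old `dict[L2TaskId, bool]` contract, a consumer could not tell entry 1 from entry 2, which is the gap the real-bytes field closes.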
Signed-off-by: ApostaC <yihua98@uchicago.edu>
…kv cache calculator (LMCache#3408) Signed-off-by: pengxin99 <yc316ypx@126.com>
…he#3395) Signed-off-by: hyukjlee <hyukjlee@amd.com>
test: fix raw-block L2 store result assertions Signed-off-by: Dongjoo Seo <commisori28@gmail.com> Signed-off-by: DongDongJu <commisori28@gmail.com>
…e#3416) * fix: avoid vllm import for blake3 token hasher Signed-off-by: DongDongJu <commisori28@gmail.com> Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com>
…3370) Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: abinggo <107740309+abinggo@users.noreply.github.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ApostaC <25103655+ApostaC@users.noreply.github.com>
[refactor] refactor lmcache bench kvcache cli Signed-off-by: idellzheng <idellzheng@tencent.com>
Signed-off-by: Tony Lin <tony.lin@intel.com>
Signed-off-by: Tony Lin <tony.lin@intel.com>
…Cache#3402) Signed-off-by: aeon-x <talexcao@gmail.com>
…each register Signed-off-by: Tony Lin <tony.lin@intel.com>
both modes supported at the same time Signed-off-by: Tony Lin <tony.lin@intel.com>
Signed-off-by: Tony Lin <tony.lin@intel.com>
This is because the server is not aware of the device type of the connecting workers. By defaulting to auto, the server loads both GPU and non-GPU transfer modules, allowing workers of either type to connect without requiring manual configuration.
Signed-off-by: Tony Lin <tony.lin@intel.com>
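A small sketch of the mode selection described above, assuming a string-valued mode setting. Only the `auto` value and its behavior (load both GPU and non-GPU transfer modules) come from the commit message; the `gpu` and `non_gpu` mode names, the module labels, and the `modules_for_mode` function are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from a server-side mode to the transfer modules it loads.
_MODE_TO_MODULES: Dict[str, List[str]] = {
    "auto": ["gpu", "non_gpu"],  # from the commit message: auto loads both
    "gpu": ["gpu"],              # assumed explicit mode
    "non_gpu": ["non_gpu"],      # assumed explicit mode
}


def modules_for_mode(mode: str = "auto") -> List[str]:
    """Return the transfer modules a server would load for the given mode."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(_MODE_TO_MODULES[mode])
    except KeyError:
        raise ValueError(f"unknown transfer mode: {mode!r}") from None
```

Defaulting to `auto` trades a little startup work for not having to know each worker's device type ahead of time.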
What this PR does / why we need it:
Special notes for your reviewers:
If applicable: