Add SHM-based non-GPU KV transport without changing existing NonGpuContext interfaces #278
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/07c7d0ab-d21a-4245-9109-006f91352b6c Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
hlin99 left a comment:
- Can you refer to the last commit in https://github.com/hlin99/LMCache/pull/273/commits and use mmap?
- On both the worker and the server side, print a logger.info message saying whether the pickle path or the SHM path is in use.
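A minimal sketch of the mmap-based attach requested above, assuming the segment is a regular file (on Linux, a file under /dev/shm). The helper name attach_shm and the temporary demo file are hypothetical and are not taken from the referenced commit.

```python
import mmap
import os
import tempfile


def attach_shm(path: str, size: int) -> mmap.mmap:
    """Map an existing file-backed segment read/write (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR)
    try:
        # mmap keeps its own duplicate of the descriptor, so ours can be
        # closed as soon as the mapping exists.
        return mmap.mmap(fd, size, access=mmap.ACCESS_WRITE)
    finally:
        os.close(fd)


# Demo: a temporary file stands in for /dev/shm/<shm_name>.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    demo_path = f.name

writer = attach_shm(demo_path, 4096)
reader = attach_shm(demo_path, 4096)
writer[0:5] = b"hello"      # write through one mapping
seen = bytes(reader[0:5])   # read it back through the other
writer.close()
reader.close()
os.unlink(demo_path)
```

Unlike multiprocessing.shared_memory.SharedMemory, nothing here registers the segment with a resource tracker, so a worker exiting does not unlink a pool that the server still owns.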
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/6bb8fb82-c368-43a5-a4b1-83bfedecc1a6 Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Done in |
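hlin99 left a comment:
The server is started once and left running. The worker is started twice, one run after the other, and the same prompt is sent each time. The first run works fine, but after the worker is started the second time the following error appears.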
(EngineCore pid=28702) [2026-05-20 06:32:33,134] LMCache INFO: Currently used by:
(EngineCore pid=28702) - vLLM non-MLA flash attention (utils.py:413:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:33,135] LMCache INFO: Creating NonGpuContextShm (shm_name=lmcache_l1_pool_26917, pool_size=21474836480) (non_gpu_context.py:128:lmcache.v1.multiprocess.non_gpu_context)
(EngineCore pid=28702) INFO 05-20 06:32:33 [gpu_worker.py:578] Compile and warming up model for size 2048
(EngineCore pid=28702) WARNING 05-20 06:32:33 [gpu_model_runner.py:5965] Skipping CUDA graph capture. To turn on CUDA graph capture, ensure cudagraph_mode was not manually set to NONE
(EngineCore pid=28702) INFO 05-20 06:32:33 [core.py:283] init engine (profile, create kv cache, warmup model) took 5.00 seconds
(EngineCore pid=28702) INFO 05-20 06:32:34 [factory.py:64] Creating v1 connector with name: LMCacheMPConnector and engine_id: 37ebcb6d-db18-410f-86ab-df39ef06e456
(EngineCore pid=28702) WARNING 05-20 06:32:34 [base.py:189] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(EngineCore pid=28702) INFO 05-20 06:32:34 [vllm.py:790] Asynchronous scheduling is enabled.
(EngineCore pid=28702) WARNING 05-20 06:32:34 [xpu.py:181] XPU Graph is not supported in the current PyTorch version, disabling cudagraph_mode.
(APIServer pid=28433) INFO 05-20 06:32:34 [api_server.py:590] Supported tasks: ['generate']
(APIServer pid=28433) WARNING 05-20 06:32:34 [model.py:1435] Default vLLM sampling parameters have been overridden by the model's generation_config.json: {'temperature': 0.6, 'top_p': 0.9}. If this is not intended, please relaunch vLLM instance with --generation-config vllm.
(APIServer pid=28433) INFO 05-20 06:32:35 [hf.py:314] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=28433) INFO 05-20 06:32:35 [api_server.py:594] Starting vLLM server on http://0.0.0.0:8000
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:37] Available routes are:
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=28433) INFO 05-20 06:32:35 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=28433) INFO: Started server process [28433]
(APIServer pid=28433) INFO: Waiting for application startup.
(APIServer pid=28433) INFO: Application startup complete.
(EngineCore pid=28702) [2026-05-20 06:32:49,047] LMCache INFO: PeriodicThread lmcache-heartbeat entering main loop (interval=10.0s) (periodic_thread.py:279:lmcache.v1.periodic_thread)
(EngineCore pid=28702) [2026-05-20 06:32:49,047] LMCache INFO: Started PeriodicThread: lmcache-heartbeat (level=critical, interval=10.0s, init_wait=0.0s) (periodic_thread.py:239:lmcache.v1.periodic_thread)
(EngineCore pid=28702) [2026-05-20 06:32:49,050] LMCache INFO: PeriodicThread lmcache-heartbeat entering main loop (interval=10.0s) (periodic_thread.py:279:lmcache.v1.periodic_thread)
(EngineCore pid=28702) [2026-05-20 06:32:49,050] LMCache INFO: Started PeriodicThread: lmcache-heartbeat (level=critical, interval=10.0s, init_wait=0.0s) (periodic_thread.py:239:lmcache.v1.periodic_thread)
(EngineCore pid=28702) [2026-05-20 06:32:49,052] LMCache INFO: list_depth: 1, tensor_dim: 5 (utils.py:509:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:49,052] LMCache INFO: GPU KV Cache Dimensions: [32][2, 372, 64, 8, 128] (utils.py:520:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:49,052] LMCache INFO: vLLM KV cache layout: NHD (utils.py:534:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:49,052] LMCache INFO: GPU KV Format: NL x [2, NB, BS, NH, HS] (utils.py:412:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:49,052] LMCache INFO: Currently used by:
(EngineCore pid=28702) - vLLM non-MLA flash attention (utils.py:413:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:54,916] LMCache INFO: list_depth: 1, tensor_dim: 5 (utils.py:509:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:54,917] LMCache INFO: GPU KV Cache Dimensions: [32][2, 372, 64, 8, 128] (utils.py:520:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:54,917] LMCache INFO: vLLM KV cache layout: NHD (utils.py:534:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:54,917] LMCache INFO: GPU KV Format: NL x [2, NB, BS, NH, HS] (utils.py:412:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:54,917] LMCache INFO: Currently used by:
(EngineCore pid=28702) - vLLM non-MLA flash attention (utils.py:413:lmcache.v1.gpu_connector.utils)
(EngineCore pid=28702) [2026-05-20 06:32:54,926] LMCache ERROR: Something went wrong when processing the store request for request_id=cmpl-a2ea2c8d4c493a4e-0-807a58c8 (vllm_multi_process_adapter.py:1155:lmcache.integration.vllm.vllm_multi_process_adapter)
(APIServer pid=28433) INFO 05-20 06:32:55 [loggers.py:259] Engine 000: Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 12.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.5%, Prefix cache hit rate: 0.0%, External prefix cache hit rate: 97.6%
hlin99 left a comment:
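Task: port the SHM transport from PR #278 to the ww20_PR_cpu_context_pickle branch and fix the known issues
Background
PR #278 (copilot/add-nongpucontext-shm-implementation) adds an SHM (shared memory) transport mode on top of the pickle path of the non-GPU context. The changes in that PR need to be cherry-picked / ported to the hlin99:ww20_PR_cpu_context_pickle branch, and every issue below, found during code review, needs to be fixed at the same time.
🔴 Critical Issues (must fix)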
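1. Idempotency bug: the store request for a repeated prompt fails
How to reproduce: keep the server running without a restart, start the worker twice in succession, and send the same prompt both times. The second store fails with Something went wrong when processing the store request.
Root cause: on the second store the key already exists, so reserve_write(..., "new") returns an empty dict. prepare_store then saves an empty reserved_keys list into _pending_shm_writes, and commit_store later hits not reserved_keys and returns False.
Fix:
- server.py, prepare_store: if reserved is empty (not reserved_keys), do not add an entry to _pending_shm_writes; return PrepareStoreResponse(context={}) directly (without a "slots" key).
- server.py, commit_store: when cpu_data == b"" and SHM is active, return True if not reserved_keys (idempotent semantics: the key already exists, so the store counts as successful) instead of False.
- Worker side, NonGpuContextShm.prepare_store: if context in the response has no "slots" key, or slots is an empty list, return None (so that the caller skips the SHM write).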
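The three fixes above can be sketched with a toy server and worker. This is an illustration under stated assumptions, not the code in server.py: ToyServer, worker_prepare_store, the string keys, and the request-id indexing are all hypothetical, and reserve_write is imitated by a set lookup.

```python
from dataclasses import dataclass, field


@dataclass
class PrepareStoreResponse:
    # Stand-in for the real protocol class; only `context` is modelled.
    context: dict = field(default_factory=dict)


class ToyServer:
    def __init__(self) -> None:
        self._stored = set()
        self._pending_shm_writes = {}

    def prepare_store(self, request_id, keys):
        # Imitates reserve_write(..., "new"): only unseen keys are reserved.
        reserved_keys = [k for k in keys if k not in self._stored]
        if not reserved_keys:
            # Nothing to write: no pending entry, no "slots" in the reply.
            return PrepareStoreResponse(context={})
        self._pending_shm_writes[request_id] = reserved_keys
        return PrepareStoreResponse(context={"slots": reserved_keys})

    def commit_store(self, request_id) -> bool:
        reserved_keys = self._pending_shm_writes.pop(request_id, [])
        # An empty list means every key already existed: still a success.
        self._stored.update(reserved_keys)
        return True


def worker_prepare_store(server, request_id, keys):
    slots = server.prepare_store(request_id, keys).context.get("slots")
    return slots or None  # None tells the caller to skip the SHM write


server = ToyServer()
first = worker_prepare_store(server, "req-1", ["a", "b"])
server.commit_store("req-1")
second = worker_prepare_store(server, "req-2", ["a", "b"])  # same prompt again
```

With this shape the second, identical store leaves no entry behind in _pending_shm_writes and is reported as successful rather than as an error.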
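2. Calling close() twice raises OSError: Bad file descriptor
File: the close() method in non_gpu_context_shm.py
Fix: add a guard that prevents closing twice:
def close(self) -> None:
    # Already closed: nothing to do.
    if self._shm_fd < 0:
        return
    try:
        self._mmap_obj.close()
    finally:
        # Mark the fd as closed before closing it, so a second call is a no-op.
        fd = self._shm_fd
        self._shm_fd = -1
        os.close(fd)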
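3. _pending_shm_writes / _pending_shm_reads are not protected by a lock, so there is a race condition
File: server.py L262-264 and every read or write of these two dicts
Fix: add a threading.Lock (for example self._pending_shm_lock) and hold it in every code block that accesses _pending_shm_writes or _pending_shm_reads:
- the write in prepare_store
- the pop in commit_store
- the write in prepare_retrieve
- the pop in commit_retrieve
- the filtered cleanup in unregister_kv_cache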
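A minimal sketch of that locking pattern, assuming the pending dicts are keyed by tuples whose first element is the instance id (as the cleanup code in item 5 suggests). The PendingTransfers class and its method names are hypothetical; only the write side is shown, and the read dict would be handled the same way.

```python
import threading


class PendingTransfers:
    """Toy holder for the two pending dicts, guarded by a single lock."""

    def __init__(self) -> None:
        self._pending_shm_lock = threading.Lock()
        self._pending_shm_writes = {}
        self._pending_shm_reads = {}  # guarded by the same lock

    def add_write(self, key: tuple, reserved: list) -> None:
        with self._pending_shm_lock:
            self._pending_shm_writes[key] = reserved

    def pop_write(self, key: tuple):
        with self._pending_shm_lock:
            return self._pending_shm_writes.pop(key, None)

    def drop_instance(self, instance_id: int) -> list:
        # Filter and delete under one lock so no entry slips in between.
        with self._pending_shm_lock:
            stale = [k for k in self._pending_shm_writes if k[0] == instance_id]
            return [self._pending_shm_writes.pop(k) for k in stale]


pending = PendingTransfers()
threads = [
    threading.Thread(target=pending.add_write, args=((i % 2, i), [i]))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
dropped = pending.drop_instance(0)  # entries whose instance id is 0
```

Returning the popped values from drop_instance lets the caller release the reserved memory after the lock has been dropped, which keeps the critical section short.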
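4. _is_shm_active() queries the storage manager on every call: a hot-path performance problem
File: the _is_shm_active() method in server.py
Fix: initialize self._shm_active: bool = False in __init__, and update the flag after register_kv_cache_non_gpu_context registers successfully (based on the result of get_shm_pool_info()). Every later _is_shm_active() call then returns self._shm_active directly.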
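A small sketch of the cached flag, using toy classes. The rule for deciding that the pool is active (a non-empty shm_name and a positive pool_size) is an assumption, as are the dict keys returned by the toy get_shm_pool_info().

```python
class ToyStorageManager:
    def __init__(self, pool_info: dict) -> None:
        self._pool_info = pool_info
        self.calls = 0  # counts how often the manager is queried

    def get_shm_pool_info(self) -> dict:
        self.calls += 1
        return self._pool_info


class ToyServer:
    def __init__(self, storage_manager: ToyStorageManager) -> None:
        self.storage_manager = storage_manager
        self._shm_active: bool = False

    def register_kv_cache_non_gpu_context(self) -> dict:
        info = self.storage_manager.get_shm_pool_info()
        # Assumption: the pool is usable when it has a name and a size.
        self._shm_active = bool(info.get("shm_name")) and info.get("pool_size", 0) > 0
        return info

    def _is_shm_active(self) -> bool:
        return self._shm_active  # no storage-manager query on the hot path


manager = ToyStorageManager({"shm_name": "lmcache_l1_pool_demo", "pool_size": 1024})
server = ToyServer(manager)
before = server._is_shm_active()
server.register_kv_cache_non_gpu_context()
after = all(server._is_shm_active() for _ in range(1000))
```

After registration the storage manager has been queried once, no matter how many times the hot path asks whether SHM is active.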
5. Memory leak when the worker times out after prepare_store has reserved memory
File: prepare_store + commit_store + unregister_kv_cache in server.py
Fix: in unregister_kv_cache, call self.storage_manager.finish_write(v) on every _pending_shm_writes entry being cleaned up, so that the reserved memory is released (rather than only deleting the entry from the dict):
# In unregister_kv_cache
stale_writes = {k: v for k, v in self._pending_shm_writes.items() if k[0] == instance_id}
for k, v in stale_writes.items():
    if v:
        self.storage_manager.finish_write(v)
    del self._pending_shm_writes[k]
# Handle _pending_shm_reads the same way (call finish_read_prefetched)
stale_reads = {k: v for k, v in self._pending_shm_reads.items() if k[0] == instance_id}
for k, v in stale_reads.items():
    if v:
        self.storage_manager.finish_read_prefetched(v)
    del self._pending_shm_reads[k]
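6. The SHM path of prepare_retrieve uses unsafe_read without validating its precondition
File: the SHM branch of prepare_retrieve in server.py
Fix: before the unsafe_read call, add a comment stating the precondition (the read lock was already acquired during the lookup phase), and add a defensive check: if unsafe_read returns fewer shm_memory_objs than obj_keys, release the keys already acquired and then return failure (the current code already has this logic; confirm that it is kept).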
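The defensive check can be sketched as follows. The toy storage manager only imitates the two calls named above, and releasing exactly the keys that unsafe_read returned is an assumption; the real server may need to release a different set.

```python
class ToyStorageManager:
    def __init__(self, objects: dict) -> None:
        self._objects = objects
        self.released = []

    def unsafe_read(self, keys):
        # Returns (found keys, memory objects) without taking read locks.
        found = [k for k in keys if k in self._objects]
        return found, [self._objects[k] for k in found]

    def finish_read_prefetched(self, keys) -> None:
        self.released.extend(keys)


def read_for_shm(storage_manager, obj_keys):
    # Precondition: the lookup phase already holds read locks on obj_keys.
    found_keys, shm_memory_objs = storage_manager.unsafe_read(obj_keys)
    if len(shm_memory_objs) < len(obj_keys):
        # Partial hit: release what was acquired, then report failure.
        storage_manager.finish_read_prefetched(found_keys)
        return None
    return shm_memory_objs


manager = ToyStorageManager({"a": 1, "b": 2})
full = read_for_shm(manager, ["a", "b"])
partial = read_for_shm(manager, ["a", "c"])
```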
🟡 Medium Issues (recommended fixes)
7. _make_non_gpu_transfer_key includes key.token_ids (a large tuple), which makes hashing expensive
Fix: if IPCCacheEngineKey is hashable, use (instance_id, key) as the dict key. Otherwise consider (instance_id, key.request_id, key.start, key.end) as a lighter unique identifier.
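A sketch of the lighter key. ToyKey is a hypothetical stand-in for IPCCacheEngineKey that carries only the fields named above, and the sketch assumes that (request_id, start, end) identifies a transfer uniquely within one instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKey:
    # Hypothetical stand-in for IPCCacheEngineKey.
    request_id: str
    start: int
    end: int
    token_ids: tuple


def make_transfer_key(instance_id: int, key: ToyKey) -> tuple:
    # token_ids is left out so hashing never walks the whole token tuple.
    return (instance_id, key.request_id, key.start, key.end)


key = ToyKey("req-1", 0, 256, tuple(range(100_000)))
transfer_key = make_transfer_key(3, key)
pending = {transfer_key: "reserved"}
looked_up = pending[make_transfer_key(3, key)]
```

Hashing this tuple costs the same regardless of prompt length, whereas hashing a tuple that embeds token_ids grows with the number of tokens.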
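8. Dead code: _check_shm_capacity in memory_manager.py is never called
Fix: delete the _check_shm_capacity function, or call it from create_memory_allocator in place of the inline logic.
9. Confirm that MixedMemoryAllocator supports the shm_name parameter
Fix: confirm that MixedMemoryAllocator.__init__ accepts the shm_name kwarg. If it does not, handling for that parameter has to be added in this change; otherwise a TypeError is raised at runtime.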
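🟢 Minor Issues (optional fixes)
10. Insufficient test coverage
Suggested additions to test_shm_l1_pool.py:
- a test that NonGpuContextShm.__init__ raises when the SHM file does not exist
- a test that create_non_gpu_context(shm_name="", pool_size=0) falls back to pickle
- a test for storing a duplicate key (idempotency)
- a test that calling close() twice does not crash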
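Two of these tests could look roughly like the sketch below. ToyShmContext is a hypothetical stand-in for NonGpuContextShm that reuses the close() guard from item 2; the real constructor takes different arguments, and the missing-file path is an arbitrary name chosen for the demo.

```python
import mmap
import os
import tempfile


class ToyShmContext:
    """Hypothetical stand-in for NonGpuContextShm, reduced to open/close."""

    def __init__(self, path: str, size: int) -> None:
        self._shm_fd = os.open(path, os.O_RDWR)  # raises if the file is missing
        self._mmap_obj = mmap.mmap(self._shm_fd, size)

    def close(self) -> None:
        if self._shm_fd < 0:
            return
        try:
            self._mmap_obj.close()
        finally:
            fd = self._shm_fd
            self._shm_fd = -1
            os.close(fd)


def test_close_twice_is_safe() -> None:
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * 4096)
    try:
        ctx = ToyShmContext(f.name, 4096)
        ctx.close()
        ctx.close()  # must not raise OSError: Bad file descriptor
        assert ctx._shm_fd == -1
    finally:
        os.unlink(f.name)


def test_missing_segment_raises() -> None:
    missing = os.path.join(tempfile.gettempdir(), "lmcache_missing_segment_demo")
    try:
        ToyShmContext(missing, 4096)
    except FileNotFoundError:
        return
    raise AssertionError("expected FileNotFoundError")


test_close_twice_is_safe()
test_missing_segment_raises()
```

In the real test file these would be pytest functions, with the temporary file replaced by a fixture that creates and removes a segment.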
11. non_gpu_context_shm.py uses torch.empty((), dtype=dtype).element_size() to get the itemsize
Style suggestion: this could be changed to torch.tensor([], dtype=dtype).element_size(), or left as it is; either is fine.
Implementation requirements
- Create the PR on top of the hlin99:ww20_PR_cpu_context_pickle branch
- Port all file changes from PR #278 (the copilot/add-nongpucontext-shm-implementation branch)
- While porting, apply all of the 🔴 Critical and 🟡 Medium fixes above
- Make sure the existing tests pass and the new SHM tests pass as well
- Suggested PR title: Port SHM transport to cpu_context_pickle branch with bug fixes
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/0b3178a9-b4e2-411e-9e1d-1c8617b05893 Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Fixed in |
…ctness gaps from #278 review (#280)
* Initial plan
* Fix SHM non-GPU transport idempotency, locking, and cleanup issues
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/661cbeee-d0d4-40ef-9312-4044e4696a51
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
* Polish SHM feedback fixes and align validation comments
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/661cbeee-d0d4-40ef-9312-4044e4696a51
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
* Fix memory leak: early-return from prepare_store when all keys exist
When reserve_write returns empty (all object keys already cached), return PrepareStoreResponse(context={}) immediately without storing an entry in _pending_shm_writes. This prevents leaked entries that would never be popped since the worker won't call commit_store.
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/182111d5-1737-49c0-be65-0287d5b9d6c5
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
This PR adds a shared-memory (SHM) transport path on top of the existing pickle-based non-GPU context flow, while keeping the current NonGpuContext ABC/protocol contracts intact. It introduces SHM slot-based prepare/commit semantics with backward-compatible fallback to pickle mode.
Worker-side non-GPU context
- Added NonGpuContextShm in lmcache/v1/multiprocess/non_gpu_context_shm.py.
- Uses os.open("/dev/shm/...") + mmap (no multiprocessing.shared_memory.SharedMemory).
- mmap attach, then close fd immediately to avoid lifecycle/resource-tracker issues.
- prepare_store: requests SHM slots and returns tensor views backed by SHM.
- commit_store: sends COMMIT_STORE with empty bytes (b"") as SHM commit signal.
- prepare_retrieve: requests SHM slots for retrieval and returns SHM-backed tensor views.
- commit_retrieve: notifies server to release read locks.
- _make_tensor_view(...) built on torch.frombuffer.
Context factory + registration handshake
- Updated create_non_gpu_context(...) to accept shm_name and pool_size; chooses SHM context only when both are valid, otherwise keeps pickle path.
- Added RegisterNonGpuContextResponse in protocol definitions and wired REGISTER_KV_CACHE_NON_GPU_CONTEXT to return it.
- Updated DataTransferContext.register to consume registration response and instantiate either SHM or pickle context accordingly.
Server SHM protocol behavior
- register_kv_cache_non_gpu_context now returns SHM pool metadata from storage manager.
- prepare_store (SHM mode): reserves write objects and returns slot metadata (offset, length, shape, dtype) in PrepareStoreResponse.context["slots"].
- commit_store (SHM mode): treats empty payload as SHM commit and finalizes via finish_write without deserialization.
- prepare_retrieve (SHM mode): reads already-prefetched objects and returns slot metadata in PrepareRetrieveResponse.context["slots"].
- commit_retrieve (SHM mode): releases prefetched read locks.
Transport-path visibility (logging)
- Added logger.info messages on both worker and server sides indicating whether non-GPU transfer is using SHM or pickle transport.
- SHM details (shm_name, pool_size) are included when active.
SHM pool plumbing in memory/storage managers
- L1MemoryManagerConfig now includes shm_name (default lmcache_l1_pool_<pid>).
- create_memory_allocator now attempts SHM pool setup for non-lazy L1 allocator with graceful fallback to regular pickle-compatible allocator on failure.
- _check_shm_capacity(required_bytes)
- _unlink_stale_shm(shm_name) (with name/path safety checks)
- get_shm_pool_info() delegation chain: L1MemoryManager → L1Manager → StorageManager.
- StorageManager.unsafe_read(keys) for SHM retrieve path (no additional read-lock acquisition).
- MemoryObj properties: shm_offset (from meta.address), shm_byte_length (from get_size())
New focused tests
tests/v1/distributed/test_shm_l1_pool.py covering:
- NonGpuContextShm prepare/commit store/retrieve flow with mocked MQ.
Original prompt
Goal
Add a shared-memory (SHM) based NonGpuContext implementation on top of the existing ww20_PR_cpu_context_pickle branch. The existing ABC interface and protocol definitions MUST NOT be changed; only new code should be added.
Existing Interface (DO NOT CHANGE)
The ABC in lmcache/v1/multiprocess/non_gpu_context.py already defines:
- prepare_store(key, instance_id) -> list[torch.Tensor] | None
- commit_store(key, instance_id, chunks) -> bool
- prepare_retrieve(key, instance_id) -> list[torch.Tensor] | None
- commit_retrieve(key, instance_id) -> bool
The protocols in lmcache/v1/multiprocess/protocols/engine.py already have PrepareStoreResponse (with context: dict) and PrepareRetrieveResponse (with success, data, context).
The server in lmcache/v1/multiprocess/server.py already has prepare_store, commit_store, prepare_retrieve, commit_retrieve handlers (pickle-only).
Changes Required
1. New file: lmcache/v1/multiprocess/non_gpu_context_shm.py
Create NonGpuContextShm(NonGpuContext) that:
- uses mmap (NOT multiprocessing.shared_memory.SharedMemory, to avoid resource_tracker unlinking on worker exit)
- prepare_store: sends PREPARE_STORE RPC, parses response.context["slots"] to create tensor views into shared memory, returns them as out-buffers
- commit_store: sends COMMIT_STORE with empty bytes (data already in SHM), notifies server
- prepare_retrieve: sends PREPARE_RETRIEVE RPC, parses response.context["slots"] to create tensor views
- commit_retrieve: sends COMMIT_RETRIEVE to release read locks on server
- _make_tensor_view(offset, length, shape, dtype_str) -> torch.Tensor using torch.frombuffer
2. Update lmcache/v1/multiprocess/non_gpu_context.py
Update create_non_gpu_context factory to accept optional shm_name: str = "" and pool_size: int = 0 parameters. If both are provided and valid, return NonGpuContextShm; otherwise return NonGpuContextPickle.
3. Add RegisterNonGpuContextResponse to lmcache/v1/multiprocess/protocols/engine.py
Add a dataclass:
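The dataclass body is not shown on this page. A minimal sketch, assuming it carries the two values that the registration step is described as extracting (shm_name and pool_size); the defaults and the pick_transport helper are hypothetical and only illustrate the selection rule from step 2.

```python
from dataclasses import dataclass


@dataclass
class RegisterNonGpuContextResponse:
    # Assumed fields: the two values the worker reads after registering.
    shm_name: str = ""
    pool_size: int = 0


def pick_transport(resp: RegisterNonGpuContextResponse) -> str:
    # Mirrors the factory rule: SHM only when both values are valid.
    return "shm" if resp.shm_name and resp.pool_size > 0 else "pickle"


default_choice = pick_transport(RegisterNonGpuContextResponse())
shm_choice = pick_transport(
    RegisterNonGpuContextResponse("lmcache_l1_pool_26917", 21474836480)
)
```

Defaulting both fields to empty values means a server without an SHM pool can return the same response type and the worker simply stays on the pickle path.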
Update the REGISTER_KV_CACHE_NON_GPU_CONTEXT protocol definition's response_class from None to RegisterNonGpuContextResponse.
4. Update server: lmcache/v1/multiprocess/server.py
- Change register_kv_cache_non_gpu_context return type to RegisterNonGpuContextResponse
- Use self.storage_manager.get_shm_pool_info() (if available)
- Update prepare_store to check if SHM is active; if so, resolve obj_keys, call storage_manager.reserve_write, and return slot metadata (offset, length, shape, dtype) in PrepareStoreResponse.context["slots"]
- commit_store: if cpu_data is empty bytes, treat as SHM mode (data already written), just call storage_manager.finish_write
- prepare_retrieve: if SHM active, return slot metadata in PrepareRetrieveResponse.context["slots"] instead of serialized data
- commit_retrieve: if SHM active, release read locks via storage_manager.finish_read_prefetched
5. Update lmcache/v1/multiprocess/transfer_context.py
In DataTransferContext.register:
- After future.result(), parse RegisterNonGpuContextResponse to extract shm_name and pool_size
- Call create_non_gpu_context(..., shm_name=shm_name, pool_size=pool_size)
6. SHM pool infrastructure in memory/storage managers
In lmcache/v1/distributed/memory_manager.py:
- Add _check_shm_capacity(required_bytes) that checks /dev/shm free space
- Add _unlink_stale_shm(shm_name) that removes stale lmcache_l1_pool_* segments
- In create_memory_allocator, if config.shm_name is set and not lazy mode, try to set up SHM (with graceful fallback to pickle on failure)
- Pass shm_name to MixedMemoryAllocator constructor (add as optional kwarg)
- Add get_shm_pool_info() -> dict to MemoryManager class
In lmcache/v1/distributed/config.py:
- Add shm_name: str field to L1MemoryManagerConfig with default f"lmcache_l1_pool_{os.getpid()}"
In lmcache/v1/distributed/storage_manager.py:
- Add get_shm_pool_info() -> dict delegating to l1_manager
- Add unsafe_read(keys) -> tuple[list[ObjectKey], list[MemoryObj]] for SHM retrieve without re-locking
In lmcache/v1/distributed/l1_manager.py:
- Add get_shm_pool_info() -> dict delegating to memory_manager
In lmcache/v1/memory_management.py:
- Add shm_offset -> int (returns self.meta.address) and shm_byte_length -> int (returns self.get_size()) to the MemoryObj base class
7. Tests: tests/v1/distributed/test_shm_l1_pool.py
Add unit tests verifying:
Key Design Princip...
This pull request was created from Copilot chat.