Consolidate MPCacheEngine context state into unified registry in server.py #261
Closed
Copilot wants to merge 3 commits into
…ntext map Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/85db3bcf-5cf5-4b30-80d3-0464c83ff19c Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
…tibility property Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/85db3bcf-5cf5-4b30-80d3-0464c83ff19c Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Consolidate context dictionaries in MPCacheEngine" to "Consolidate MPCacheEngine context state into unified registry in server.py" on May 15, 2026
MPCacheEngine kept GPU and non-CUDA registration state across four parallel dicts, which increased drift risk and complicated call sites. This change consolidates context tracking into a single typed registry in server.py while preserving the existing external gpu_contexts interface.

Unified context model

- RegisteredContext dataclass:
  - model_name, world_size
  - gpu_context: GPUCacheContext | None
  - non_cuda_metadata: NonGpuContextMetadata | None
  - is_gpu
  - get_layout_desc(chunk_size)

Single registry in MPCacheEngine

- Replaced: gpu_contexts (field), gpu_context_meta, non_cuda_contexts, non_cuda_context_meta
- With: self.contexts: dict[int, RegisteredContext]

Backward compatibility surface preserved

- @property gpu_contexts returning a derived GPU-only map from self.contexts, so existing getattr(engine, "gpu_contexts", None) usage remains valid without touching other files.

Usage-site refactor (server.py only)

- Call sites now using self.contexts:
  - register_kv_cache
  - register_kv_cache_non_gpu_context
  - unregister_kv_cache
  - store
  - retrieve
  - store_cpu_chunks
  - retrieve_cpu_chunks
  - _find_layout_desc
  - report_status
  - close

Original prompt
Task

In lmcache/v1/multiprocess/server.py, MPCacheEngine.__init__ currently has 4 separate dictionaries:

Consolidate these into a single self.contexts: dict[int, RegisteredContext] registry by adding a small RegisteredContext dataclass and updating all usage sites within server.py only.

Scope: ONLY server.py

Only modify lmcache/v1/multiprocess/server.py. Do NOT touch any other file. No blend_server_v2.py, no cache_api.py, no test files. Nothing else.

Important constraints

self.gpu_contexts is referenced by cache_api.py via getattr(engine, "gpu_contexts", None). To maintain backward compatibility without touching that file, add a Python @property named gpu_contexts on MPCacheEngine that returns a dict view {id: ctx.gpu_context for id, ctx in self.contexts.items() if ctx.gpu_context is not None}. This preserves the existing external interface with zero changes outside server.py.

What to do in server.py

- Add a RegisteredContext dataclass with fields: model_name: str, world_size: int, gpu_context: GPUCacheContext | None = None, non_cuda_metadata: NonGpuContextMetadata | None = None. Add an is_gpu property and a get_layout_desc(chunk_size) method.
- Replace the 4 dicts with self.contexts: dict[int, RegisteredContext] = {}.
- Add a @property gpu_contexts for backward compat (see above).
- Update register_kv_cache, register_kv_cache_non_gpu_context, unregister_kv_cache, store, retrieve, store_cpu_chunks, retrieve_cpu_chunks, _find_layout_desc, report_status, and close to use self.contexts.

That's it. Nothing else.
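A minimal sketch of what the task above describes, not the code from this PR. GPUCacheContext and NonGpuContextMetadata are stand-in stubs here (the real LMCache types are not shown in this conversation), their num_layers field is hypothetical, and the return shape of get_layout_desc is an assumption chosen only to make the example runnable.

```python
from __future__ import annotations

from dataclasses import dataclass


# Stand-in stubs: the real LMCache classes are not shown in this thread,
# and the num_layers field is a hypothetical placeholder.
@dataclass
class GPUCacheContext:
    num_layers: int


@dataclass
class NonGpuContextMetadata:
    num_layers: int


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None

    @property
    def is_gpu(self) -> bool:
        # A context counts as GPU-backed when it carries a GPU context.
        return self.gpu_context is not None

    def get_layout_desc(self, chunk_size: int) -> tuple[str, int, int]:
        # Assumed return shape (kind, num_layers, chunk_size); the real
        # layout descriptor type is not given in the source.
        if self.gpu_context is not None:
            return ("gpu", self.gpu_context.num_layers, chunk_size)
        if self.non_cuda_metadata is not None:
            return ("non_cuda", self.non_cuda_metadata.num_layers, chunk_size)
        raise ValueError("context has neither GPU nor non-CUDA state")


class MPCacheEngine:
    def __init__(self) -> None:
        # Single registry keyed by instance id, replacing the four dicts.
        self.contexts: dict[int, RegisteredContext] = {}

    @property
    def gpu_contexts(self) -> dict[int, GPUCacheContext]:
        # Derived GPU-only map, rebuilt on every access, so that
        # getattr(engine, "gpu_contexts", None) keeps working.
        return {
            instance_id: ctx.gpu_context
            for instance_id, ctx in self.contexts.items()
            if ctx.gpu_context is not None
        }
```

One consequence of this shape: the property returns a fresh dict each time, so callers that mutate the returned map do not change the registry. Registration and removal would have to go through self.contexts.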
The following is the prior conversation context from the user's chat exploration (may be truncated):
Assistant: Based on PR #252 and the comment from @maobaolong, the reviewer is suggesting that instead of maintaining two separate dictionaries (gpu_context_meta and cpu_contexts), you should reuse the existing gpu_contexts dict (or rename it to a more generic name like contexts or transfer_contexts) to hold both GPU and CPU context metadata.

The comment is on code that looks like:

What the reviewer wants:

- Merge gpu_context_meta and cpu_contexts into a single dict (e.g., rename gpu_contexts → contexts or transfer_contexts).
- Key it by instance_id.
- Align with the TransferContext polymorphic abstraction introduced in this PR: since CudaTransferContext and NonCudaTransferContext are unified behind one interface on the worker side, the server side should similarly unify the metadata storage.

Suggested approach: Rename gpu_contexts to something generic (e.g., context_registry or transfer_contexts) and store a union type or a common base dataclass that covers both GPU (IPC handles) and non-GPU (scalar metadata) registration info. This avoids proliferating per-transport dicts every time a new transport is added.

User: Reply in Chinese.
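Assistant: Based on PR #252 and the comment from @maobaolong, the reviewer's point is:

Current state: the code has two separate dictionaries:

Reviewer's suggestions:

- Reuse the existing gpu_contexts dict rather than adding cpu_contexts.
- Rename gpu_contexts to a more generic name (e.g., contexts or transfer_contexts) so that it can hold metadata for both GPU and CPU contexts.
- Stay consistent with the TransferContext polymorphic abstraction: the worker side has already unified CudaTransferContext and NonCudaTransferContext, so the server-side metadata storage should be unified as well, to avoid adding another dict for every new transport.

Suggested approach: define a common base class or union type so that registration info for both GPU (IPC handles) and non-GPU (scalar metadata) contexts can be stored in the same dict, indexed by instance_id.

User: What do you suggest?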
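Assistant: Let me first look at how gpu_contexts is actually used, and then give concrete suggestions.

A: Having read the full code, my concrete suggestions are as follows:

Core problem

Currently, MPCacheEngine in server.py has 4 dicts for managing contexts:

Suggested approach: unify into a single registry

1. Define a common dataclass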