Consolidate MPCacheEngine context state into unified registry in server.py #261

Closed
Copilot wants to merge 3 commits into ww20_PR_cpu_context_pickle from copilot/consolidate-contexts-in-server

Copilot AI commented May 15, 2026

MPCacheEngine kept GPU and non-CUDA registration state in four parallel dicts, which could drift out of sync with one another and complicated call sites. This change consolidates context tracking into a single typed registry in server.py while preserving the existing external gpu_contexts interface.

  • Unified context model

    • Added RegisteredContext dataclass:
      • model_name, world_size
      • gpu_context: GPUCacheContext | None
      • non_cuda_metadata: NonGpuContextMetadata | None
    • Added helpers:
      • is_gpu
      • get_layout_desc(chunk_size)
  • Single registry in MPCacheEngine

    • Replaced:
      • gpu_contexts (field)
      • gpu_context_meta
      • non_cuda_contexts
      • non_cuda_context_meta
    • With:
      • self.contexts: dict[int, RegisteredContext]
  • Backward compatibility surface preserved

    • Added @property gpu_contexts returning a derived GPU-only map from self.contexts, so existing getattr(engine, "gpu_contexts", None) usage remains valid without touching other files.
  • Usage-site refactor (server.py only)

    • Updated required methods to read/write through self.contexts:
      • register_kv_cache
      • register_kv_cache_non_gpu_context
      • unregister_kv_cache
      • store
      • retrieve
      • store_cpu_chunks
      • retrieve_cpu_chunks
      • _find_layout_desc
      • report_status
      • close

from dataclasses import dataclass


@dataclass
class RegisteredContext:
    # Registration info shared by GPU and non-CUDA contexts.
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None


# Defined on MPCacheEngine: derives the GPU-only map from self.contexts.
@property
def gpu_contexts(self) -> dict[int, GPUCacheContext]:
    return {
        instance_id: ctx.gpu_context
        for instance_id, ctx in self.contexts.items()
        if ctx.gpu_context is not None
    }
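
The description lists the is_gpu and get_layout_desc(chunk_size) helpers but does not show their bodies. The following is a minimal sketch of how they might look, not the PR's actual code. The two stand-in classes, their num_layers field, and the tuple returned by get_layout_desc are all assumptions made so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GPUCacheContext:
    """Hypothetical stand-in for the real GPU context class."""
    num_layers: int


@dataclass
class NonGpuContextMetadata:
    """Hypothetical stand-in for the real non-CUDA metadata class."""
    num_layers: int


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None

    @property
    def is_gpu(self) -> bool:
        # A context counts as GPU-backed when a GPU context is attached.
        return self.gpu_context is not None

    def get_layout_desc(self, chunk_size: int) -> tuple[str, int, int]:
        # Assumed return shape (kind, num_layers, chunk_size); the real
        # layout descriptor type is not shown in the PR description.
        source = self.gpu_context if self.is_gpu else self.non_cuda_metadata
        if source is None:
            raise ValueError("context has neither GPU nor non-CUDA state")
        kind = "gpu" if self.is_gpu else "non_cuda"
        return (kind, source.num_layers, chunk_size)


gpu_entry = RegisteredContext("model-a", 2, gpu_context=GPUCacheContext(num_layers=4))
cpu_entry = RegisteredContext(
    "model-a", 2, non_cuda_metadata=NonGpuContextMetadata(num_layers=4)
)
```

Putting the dispatch inside the dataclass means call sites such as _find_layout_desc would not need to branch on which transport registered the instance.
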
Original prompt

Task

In lmcache/v1/multiprocess/server.py, the MPCacheEngine.__init__ currently has 4 separate dictionaries:

self.gpu_contexts: dict[int, GPUCacheContext] = {}
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.non_cuda_contexts: dict[int, NonGpuContextMetadata] = {}
self.non_cuda_context_meta: dict[int, tuple[str, int]] = {}

Consolidate these into a single self.contexts: dict[int, RegisteredContext] registry by adding a small RegisteredContext dataclass, and updating all usage sites within server.py only.

Scope — ONLY server.py

Only modify lmcache/v1/multiprocess/server.py. Do NOT touch any other file. No blend_server_v2.py, no cache_api.py, no test files. Nothing else.

Important constraints

  • The attribute name self.gpu_contexts is referenced by cache_api.py via getattr(engine, "gpu_contexts", None). To maintain backward compatibility without touching that file, add a Python @property named gpu_contexts on MPCacheEngine that returns a dict view {id: ctx.gpu_context for id, ctx in self.contexts.items() if ctx.gpu_context is not None}. This preserves the existing external interface with zero changes outside server.py.
  • Keep all existing behavior identical — this is a pure internal refactor.
  • Do NOT change any wire protocol, message types, or public APIs.
  • Minimize diff size. Only change what's necessary for the consolidation.
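
One detail of the gpu_contexts constraint is worth checking in isolation: a dict comprehension builds a new dict on every access, so the property behaves as a snapshot and not as a live view. The sketch below is a reduced, hypothetical version of the engine (the real MPCacheEngine has many more members) that shows the getattr lookup still works and that mutating the returned dict does not touch the registry. Whether any caller outside server.py writes to gpu_contexts cannot be told from this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegisteredContext:
    # Reduced to the one field this sketch needs; `object` stands in for
    # the real GPUCacheContext type.
    gpu_context: object | None = None


class MPCacheEngine:
    """Hypothetical, reduced stand-in for the real engine class."""

    def __init__(self) -> None:
        self.contexts: dict[int, RegisteredContext] = {}

    @property
    def gpu_contexts(self) -> dict[int, object]:
        # Rebuilt on every access from the unified registry.
        return {
            instance_id: ctx.gpu_context
            for instance_id, ctx in self.contexts.items()
            if ctx.gpu_context is not None
        }


engine = MPCacheEngine()
engine.contexts[0] = RegisteredContext(gpu_context="gpu-ctx-0")
engine.contexts[1] = RegisteredContext()  # non-CUDA entry, no GPU context

# The lookup pattern attributed to cache_api.py keeps working.
found = getattr(engine, "gpu_contexts", None)

# The returned dict is a fresh copy: mutating it leaves the registry intact.
found.pop(0)
still_registered = 0 in engine.gpu_contexts
```

Read-only callers see the same mapping as before the refactor; a caller that relied on writing through gpu_contexts would need to go through self.contexts instead.
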

What to do in server.py

  1. Add a RegisteredContext dataclass with fields: model_name: str, world_size: int, gpu_context: GPUCacheContext | None = None, non_cuda_metadata: NonGpuContextMetadata | None = None. Add an is_gpu property and a get_layout_desc(chunk_size) method.

  2. Replace the 4 dicts with self.contexts: dict[int, RegisteredContext] = {}.

  3. Add a @property gpu_contexts for backward compat (see above).

  4. Update register_kv_cache, register_kv_cache_non_gpu_context, unregister_kv_cache, store, retrieve, store_cpu_chunks, retrieve_cpu_chunks, _find_layout_desc, report_status, close to use self.contexts.

That's it. Nothing else.
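
The steps above can be sketched as follows. This is an illustration only: the Engine class, the method signatures, and the shape of the status dict are hypothetical simplifications, since the real signatures in server.py are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: object | None = None        # stand-in for GPUCacheContext
    non_cuda_metadata: object | None = None  # stand-in for NonGpuContextMetadata


class Engine:
    """Hypothetical, heavily simplified stand-in for MPCacheEngine."""

    def __init__(self) -> None:
        self.contexts: dict[int, RegisteredContext] = {}

    def register_kv_cache(self, instance_id, model_name, world_size, gpu_context):
        # One write replaces the former gpu_contexts + gpu_context_meta pair.
        self.contexts[instance_id] = RegisteredContext(
            model_name, world_size, gpu_context=gpu_context
        )

    def register_kv_cache_non_gpu_context(
        self, instance_id, model_name, world_size, metadata
    ):
        self.contexts[instance_id] = RegisteredContext(
            model_name, world_size, non_cuda_metadata=metadata
        )

    def unregister_kv_cache(self, instance_id):
        # A single pop removes the context and its metadata together.
        return self.contexts.pop(instance_id, None)

    def report_status(self):
        gpu = sum(1 for c in self.contexts.values() if c.gpu_context is not None)
        return {"gpu": gpu, "non_cuda": len(self.contexts) - gpu}


engine = Engine()
engine.register_kv_cache(0, "model-a", 2, gpu_context="gpu-ctx")
engine.register_kv_cache_non_gpu_context(1, "model-a", 2, metadata="meta")
status_before = engine.report_status()
removed = engine.unregister_kv_cache(0)
status_after = engine.report_status()
```

With one registry, unregistering cannot leave a metadata entry behind for an instance whose context was already removed, which was possible when the two were stored in separate dicts.
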

The following is the prior conversation context from the user's chat exploration (may be truncated):

Assistant: Based on PR #252 and the comment from @maobaolong, the reviewer is suggesting that instead of maintaining two separate dictionaries (gpu_context_meta and cpu_contexts), you should reuse the existing gpu_contexts dict (or rename it to a more generic name like contexts or transfer_contexts) to hold both GPU and CPU context metadata.

The comment is on code that looks like:

self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContextMetadata] = {}

What the reviewer wants:

  1. Consolidate gpu_context_meta and cpu_contexts into a single dict (e.g., rename gpu_contexts to contexts or transfer_contexts).
  2. The single dict should be able to hold metadata for both CUDA IPC and non-GPU context registrations, likely keyed by instance_id.
  3. This aligns with the TransferContext polymorphic abstraction introduced in this PR — since CudaTransferContext and NonCudaTransferContext are unified behind one interface on the worker side, the server side should similarly unify the metadata storage.

Suggested approach: Rename gpu_contexts to something generic (e.g., context_registry or transfer_contexts) and store a union type or a common base dataclass that covers both GPU (IPC handles) and non-GPU (scalar metadata) registration info. This avoids proliferating per-transport dicts every time a new transport is added.
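
The union-type variant of this suggestion could look roughly like the sketch below. It is only an illustration of the idea: GpuRegistration, NonGpuRegistration, their fields, and describe are hypothetical names that do not appear in the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class GpuRegistration:
    # Hypothetical record for a CUDA IPC registration.
    model_name: str
    world_size: int
    ipc_handles: list[bytes]


@dataclass
class NonGpuRegistration:
    # Hypothetical record for a non-GPU registration (scalar metadata only).
    model_name: str
    world_size: int
    num_layers: int


# One value type for the whole registry, regardless of transport.
Registration = Union[GpuRegistration, NonGpuRegistration]


def describe(entry: Registration) -> str:
    # Callers branch on the concrete type instead of on which dict held it.
    if isinstance(entry, GpuRegistration):
        return f"{entry.model_name}: {len(entry.ipc_handles)} IPC handle(s)"
    return f"{entry.model_name}: {entry.num_layers} layer(s), no GPU"


transfer_contexts: dict[int, Registration] = {
    0: GpuRegistration("model-a", 2, [b"h0", b"h1"]),
    1: NonGpuRegistration("model-a", 2, 4),
}
summaries = [describe(e) for e in transfer_contexts.values()]
```

A new transport would then add one more record type to the union, with no extra dict on the engine.
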

User: Reply in Chinese.

A: Based on PR #252 and the comment from @maobaolong, the reviewer means the following:

Current state: the code has two separate dictionaries:

self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContextMetadata] = {}

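Reviewer's suggestion:

  1. Reuse the existing gpu_contexts dictionary instead of adding a new cpu_contexts.
  2. Rename gpu_contexts to a more generic name (for example contexts or transfer_contexts) so that it can hold metadata for both GPU and CPU contexts.
  3. This is consistent with the TransferContext polymorphic abstraction that the PR introduces on the worker side: the worker side already unifies CudaTransferContext and NonCudaTransferContext, so the server-side metadata storage should be unified as well, to avoid adding another dictionary for every new transport.

Suggested approach: define a common base class or union type so that registration info for both GPU (IPC handles) and non-GPU (scalar metadata) contexts can be stored in the same dictionary, indexed by instance_id.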

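User: What do you suggest?

A: Let me first look at how gpu_contexts is currently used, then give concrete suggestions.

A: After reading the full code, my concrete suggestions are as follows:

Core problem

Currently, MPCacheEngine in server.py uses 4 dictionaries to manage contexts: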

self.gpu_contexts: dict[int, GPUCacheContext] = {}
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.non_cuda_contexts: dict[int, NonGpuContextMetadata] = {}
self.non_cuda_context_meta: dict[int, tuple[str, int]] = {}

Proposed approach: unify into a single registry

1. Define a common dataclass

@dataclass
class RegisteredContext:
    """Unified context registration info covering both GPU and non-CUDA modes."""
    model_name: str
    world_size: int
    # Set in GPU mode; None in non-CUDA mode
    gpu_context: GPUCacheContext | None = None
    # Set in non-CUDA mode; None in GPU mode
    ...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*

Copilot AI and others added 2 commits May 15, 2026 06:04
Copilot AI changed the title from "[WIP] Consolidate context dictionaries in MPCacheEngine" to "Consolidate MPCacheEngine context state into unified registry in server.py" May 15, 2026
Copilot AI requested a review from hlin99 May 15, 2026 06:10
Copilot finished work on behalf of hlin99 May 15, 2026 06:10
hlin99 closed this May 15, 2026
hlin99 deleted the copilot/consolidate-contexts-in-server branch May 28, 2026 12:22