Consolidate MPCacheEngine context state into unified registry in `server.py` by Copilot · Pull Request #261 · hlin99/LMCache

Copilot · 2026-05-15T05:57:41Z

MPCacheEngine kept GPU and non-CUDA registration state across four parallel dicts, which increased drift risk and complicated call sites. This change consolidates context tracking into a single typed registry in server.py while preserving the existing external gpu_contexts interface.

Unified context model
- Added RegisteredContext dataclass:
  - model_name, world_size
  - gpu_context: GPUCacheContext | None
  - non_cuda_metadata: NonGpuContextMetadata | None
- Added helpers:
  - is_gpu
  - get_layout_desc(chunk_size)

Single registry in MPCacheEngine
- Replaced:
  - gpu_contexts (field)
  - gpu_context_meta
  - non_cuda_contexts
  - non_cuda_context_meta
- With:
  - self.contexts: dict[int, RegisteredContext]

Backward compatibility surface preserved
- Added @property gpu_contexts returning a derived GPU-only map from self.contexts, so existing getattr(engine, "gpu_contexts", None) usage remains valid without touching other files.

Usage-site refactor (server.py only)
- Updated required methods to read/write through self.contexts:
  - register_kv_cache
  - register_kv_cache_non_gpu_context
  - unregister_kv_cache
  - store
  - retrieve
  - store_cpu_chunks
  - retrieve_cpu_chunks
  - _find_layout_desc
  - report_status
  - close

@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None

@property
def gpu_contexts(self) -> dict[int, GPUCacheContext]:
    return {
        instance_id: ctx.gpu_context
        for instance_id, ctx in self.contexts.items()
        if ctx.gpu_context is not None
    }
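
A minimal, self-contained sketch of how the derived property behaves. `GPUCacheContext` and `NonGpuContextMetadata` are replaced by empty stand-in classes, `Engine` is a hypothetical reduced stand-in for `MPCacheEngine`, and the body of `is_gpu` is an assumption (the PR names the helper but does not show it here). One consequence that may matter to callers: the property builds a new dict on every access, so the result is a snapshot, and writes to the returned dict do not reach `self.contexts`.

```python
from __future__ import annotations

from dataclasses import dataclass


class GPUCacheContext:  # stand-in for the real LMCache class
    pass


class NonGpuContextMetadata:  # stand-in for the real LMCache class
    pass


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None

    @property
    def is_gpu(self) -> bool:
        # Assumed definition: a context counts as GPU when gpu_context is set.
        return self.gpu_context is not None


class Engine:  # hypothetical, reduced stand-in for MPCacheEngine
    def __init__(self) -> None:
        self.contexts: dict[int, RegisteredContext] = {}

    @property
    def gpu_contexts(self) -> dict[int, GPUCacheContext]:
        return {
            instance_id: ctx.gpu_context
            for instance_id, ctx in self.contexts.items()
            if ctx.gpu_context is not None
        }


engine = Engine()
engine.contexts[0] = RegisteredContext("model-a", 1, gpu_context=GPUCacheContext())
engine.contexts[1] = RegisteredContext(
    "model-a", 1, non_cuda_metadata=NonGpuContextMetadata()
)

# The getattr-based lookup used by external callers still resolves.
gpu_map = getattr(engine, "gpu_contexts", None)
print(sorted(gpu_map))  # ids of GPU-backed contexts only

# The returned dict is a snapshot: clearing it leaves the registry untouched.
gpu_map.clear()
print(len(engine.contexts), len(engine.gpu_contexts))
```

If any caller outside `server.py` previously wrote into `engine.gpu_contexts` directly, that write would now land in a throwaway dict. The PR text only mentions a read through `getattr`, so this may not apply in practice.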

Original prompt

Task

In lmcache/v1/multiprocess/server.py, the MPCacheEngine.__init__ currently has 4 separate dictionaries:

self.gpu_contexts: dict[int, GPUCacheContext] = {}
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.non_cuda_contexts: dict[int, NonGpuContextMetadata] = {}
self.non_cuda_context_meta: dict[int, tuple[str, int]] = {}

Consolidate these into a single self.contexts: dict[int, RegisteredContext] registry by adding a small RegisteredContext dataclass, and updating all usage sites within server.py only.
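
To make the mapping concrete, the sketch below folds four parallel dicts into one registry. It is illustrative only: `merge_legacy_dicts` is a hypothetical helper that does not appear in the PR, the two context classes are stand-ins, and reading each `tuple[str, int]` as `(model_name, world_size)` is an assumption based on the field types of `RegisteredContext`.

```python
from __future__ import annotations

from dataclasses import dataclass


class GPUCacheContext:  # stand-in for the real LMCache class
    pass


class NonGpuContextMetadata:  # stand-in for the real LMCache class
    pass


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None


def merge_legacy_dicts(
    gpu_contexts: dict[int, GPUCacheContext],
    gpu_context_meta: dict[int, tuple[str, int]],
    non_cuda_contexts: dict[int, NonGpuContextMetadata],
    non_cuda_context_meta: dict[int, tuple[str, int]],
) -> dict[int, RegisteredContext]:
    """Hypothetical helper: build one registry from the four old dicts."""
    contexts: dict[int, RegisteredContext] = {}
    for instance_id, gpu_ctx in gpu_contexts.items():
        # Assumption: the meta tuple is (model_name, world_size).
        model_name, world_size = gpu_context_meta[instance_id]
        contexts[instance_id] = RegisteredContext(
            model_name, world_size, gpu_context=gpu_ctx
        )
    for instance_id, metadata in non_cuda_contexts.items():
        model_name, world_size = non_cuda_context_meta[instance_id]
        contexts[instance_id] = RegisteredContext(
            model_name, world_size, non_cuda_metadata=metadata
        )
    return contexts


registry = merge_legacy_dicts(
    {0: GPUCacheContext()},
    {0: ("model-a", 2)},
    {1: NonGpuContextMetadata()},
    {1: ("model-a", 2)},
)
print(sorted(registry))
```

The two `[instance_id]` lookups raise `KeyError` whenever a context dict and its meta dict disagree on keys, which is the kind of drift a single registry rules out by construction.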

Scope — ONLY `server.py`

Only modify lmcache/v1/multiprocess/server.py. Do NOT touch any other file. No blend_server_v2.py, no cache_api.py, no test files. Nothing else.

Important constraints

The attribute name self.gpu_contexts is referenced by cache_api.py via getattr(engine, "gpu_contexts", None). To maintain backward compatibility without touching that file, add a Python @property named gpu_contexts on MPCacheEngine that returns a dict view {id: ctx.gpu_context for id, ctx in self.contexts.items() if ctx.gpu_context is not None}. This preserves the existing external interface with zero changes outside server.py.
Keep all existing behavior identical — this is a pure internal refactor.
Do NOT change any wire protocol, message types, or public APIs.
Minimize diff size. Only change what's necessary for the consolidation.

What to do in server.py

Add a RegisteredContext dataclass with fields: model_name: str, world_size: int, gpu_context: GPUCacheContext | None = None, non_cuda_metadata: NonGpuContextMetadata | None = None. Add an is_gpu property and a get_layout_desc(chunk_size) method.
Replace the 4 dicts with self.contexts: dict[int, RegisteredContext] = {}.
Add a @property gpu_contexts for backward compat (see above).
Update register_kv_cache, register_kv_cache_non_gpu_context, unregister_kv_cache, store, retrieve, store_cpu_chunks, retrieve_cpu_chunks, _find_layout_desc, report_status, close to use self.contexts.
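
A rough sketch of what reading and writing through one registry could look like. The class `RegistrySketch` and its method names and signatures are hypothetical and deliberately simpler than the real `MPCacheEngine` methods listed above; the context classes are stand-ins, and the `is_gpu` body is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


class GPUCacheContext:  # stand-in for the real LMCache class
    pass


class NonGpuContextMetadata:  # stand-in for the real LMCache class
    pass


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None
    non_cuda_metadata: NonGpuContextMetadata | None = None

    @property
    def is_gpu(self) -> bool:
        # Assumed definition, not taken from the PR diff.
        return self.gpu_context is not None


class RegistrySketch:
    """Hypothetical stand-in; names and signatures are illustrative."""

    def __init__(self) -> None:
        self.contexts: dict[int, RegisteredContext] = {}

    def register_gpu(self, instance_id, model_name, world_size, gpu_context):
        # One write replaces the former context dict + meta dict pair.
        self.contexts[instance_id] = RegisteredContext(
            model_name, world_size, gpu_context=gpu_context
        )

    def register_non_gpu(self, instance_id, model_name, world_size, metadata):
        self.contexts[instance_id] = RegisteredContext(
            model_name, world_size, non_cuda_metadata=metadata
        )

    def unregister(self, instance_id) -> bool:
        # A single pop removes everything known about the instance.
        return self.contexts.pop(instance_id, None) is not None

    def transport_of(self, instance_id) -> str:
        # Call sites branch on the entry instead of probing several dicts.
        return "gpu" if self.contexts[instance_id].is_gpu else "non_cuda"


sketch = RegistrySketch()
sketch.register_gpu(0, "model-a", 2, GPUCacheContext())
sketch.register_non_gpu(1, "model-a", 2, NonGpuContextMetadata())
print(sketch.transport_of(0), sketch.transport_of(1))
print(sketch.unregister(0), sketch.unregister(0))
```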

That's it. Nothing else.

The following is the prior conversation context from the user's chat exploration (may be truncated):

Assistant: Based on the PR #252 and the comment from @maobaolong, the reviewer is suggesting that instead of maintaining two separate dictionaries (gpu_context_meta and cpu_contexts), you should reuse the existing gpu_contexts dict (or rename it to a more generic name like contexts or transfer_contexts) to hold both GPU and CPU context metadata.

The comment is on code that looks like:

self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContextMetadata] = {}

What the reviewer wants:

Consolidate gpu_context_meta and cpu_contexts into a single dict (e.g., rename gpu_contexts → contexts or transfer_contexts).
The single dict should be able to hold metadata for both CUDA IPC and non-GPU context registrations, likely keyed by instance_id.
This aligns with the TransferContext polymorphic abstraction introduced in this PR — since CudaTransferContext and NonCudaTransferContext are unified behind one interface on the worker side, the server side should similarly unify the metadata storage.

Suggested approach: Rename gpu_contexts to something generic (e.g., context_registry or transfer_contexts) and store a union type or a common base dataclass that covers both GPU (IPC handles) and non-GPU (scalar metadata) registration info. This avoids proliferating per-transport dicts every time a new transport is added.
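
For comparison, the "union type or common base dataclass" idea could be sketched as below. All class and field names are hypothetical and the payload fields are untyped placeholders; the PR itself went with a single `RegisteredContext` dataclass carrying two optional fields rather than this shape.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class BaseRegistration:  # hypothetical common base
    model_name: str
    world_size: int


@dataclass
class GpuRegistration(BaseRegistration):
    gpu_context: object  # placeholder for a GPU (IPC handle) context


@dataclass
class NonGpuRegistration(BaseRegistration):
    metadata: object  # placeholder for non-GPU scalar metadata


Registration = Union[GpuRegistration, NonGpuRegistration]


def describe(reg: Registration) -> str:
    # Dispatch on the concrete type rather than on which field is None.
    if isinstance(reg, GpuRegistration):
        return f"{reg.model_name}: gpu"
    return f"{reg.model_name}: non-gpu"


transfer_contexts: dict[int, Registration] = {
    0: GpuRegistration("model-a", 2, gpu_context=object()),
    1: NonGpuRegistration("model-a", 2, metadata={"dtype": "float16"}),
}
for instance_id, reg in sorted(transfer_contexts.items()):
    print(instance_id, describe(reg))
```

With separate subclasses an entry cannot carry both payloads or neither, at the cost of `isinstance` checks at call sites. A single dataclass with optional fields keeps the diff smaller, which matches the constraint to minimize changes.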

User: Please speak Chinese.

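Assistant: Based on PR #252 and the comment from @maobaolong, the reviewer's point is:

Current state: the code has two separate dictionaries:

self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContextMetadata] = {}

The reviewer suggests:

Reuse the existing gpu_contexts dictionary instead of adding cpu_contexts.
Rename gpu_contexts to something more generic (for example contexts or transfer_contexts) so that it can hold metadata for both GPU and CPU contexts.
This is consistent with the TransferContext polymorphic abstraction introduced on the worker side in the PR: the worker side already unifies CudaTransferContext and NonCudaTransferContext, so the server-side metadata storage should be unified as well, which avoids adding another dictionary for every new transport.

Suggested approach: define a common base class or union type so that registration info for both GPU (IPC handles) and non-GPU (scalar metadata) can be stored in the same dictionary, indexed by instance_id.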

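User: What do you suggest?

A: Let me first look at how gpu_contexts is actually used, then give concrete suggestions.

A: Having read the full code, my concrete suggestions are as follows:

Core problem

MPCacheEngine in server.py currently uses 4 dictionaries to manage contexts: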

self.gpu_contexts: dict[int, GPUCacheContext] = {}
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.non_cuda_contexts: dict[int, NonGpuContextMetadata] = {}
self.non_cuda_context_meta: dict[int, tuple[str, int]] = {}

Proposed solution: unify into a single registry

1. Define a general-purpose dataclass

@dataclass
class RegisteredContext:
    """Unified context registration info covering both GPU and non-CUDA modes."""
    model_name: str
    world_size: int
    # Set in GPU mode; None in non-CUDA mode
    gpu_context: GPUCacheContext | None = None
    # Set in non-CUDA mode; None in GPU mode
    ...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*


Initial plan

4eb93d5

Copilot AI assigned Copilot and hlin99 May 15, 2026

Copilot started work on behalf of hlin99 May 15, 2026 05:57 View session

Copilot AI and others added 2 commits May 15, 2026 06:04

refactor(mp): consolidate server context registries into RegisteredCo…

ccbe8d3

…ntext map Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/85db3bcf-5cf5-4b30-80d3-0464c83ff19c Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>

refactor(mp): finalize server-only unified context registry and compa…

beb301b

…tibility property Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/85db3bcf-5cf5-4b30-80d3-0464c83ff19c Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Consolidate context dictionaries in MPCacheEngine~~ Consolidate MPCacheEngine context state into unified registry in server.py May 15, 2026

Copilot AI requested a review from hlin99 May 15, 2026 06:10

Copilot finished work on behalf of hlin99 May 15, 2026 06:10

hlin99 closed this May 15, 2026

Copilot AI mentioned this pull request May 15, 2026

Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload #262

Closed

hlin99 deleted the copilot/consolidate-contexts-in-server branch May 28, 2026 12:22

Copilot wants to merge 3 commits into ww20_PR_cpu_context_pickle from copilot/consolidate-contexts-in-server


2 participants
