Unify multiprocess context registries in MPCacheEngine and BlendEngineV2 by Copilot · Pull Request #260 · hlin99/LMCache

Copilot · 2026-05-15T05:31:49Z

This refactor removes per-transport context bookkeeping in the multiprocess server path by consolidating GPU and non-CUDA registrations into unified registries. It keeps external behavior/protocols intact while simplifying context lifecycle, layout lookup, and status reporting paths.

Server context model: replace 4 dicts with one typed registry
- Added RegisteredContext in lmcache/v1/multiprocess/server.py:
  - model_name, world_size
  - gpu_context: GPUCacheContext | None
  - non_cuda_metadata: NonGpuContextMetadata | None
  - is_gpu property
  - get_layout_desc(chunk_size) dispatcher
- Replaced gpu_contexts, gpu_context_meta, non_cuda_contexts, and non_cuda_context_meta with a single contexts: dict[int, RegisteredContext].
MPCacheEngine call-site migration
- Updated registration/unregistration to read/write self.contexts.
- Updated GPU paths (store, retrieve) to use local registered_context, gpu_context, model_name.
- Updated non-GPU paths (store_cpu_chunks, retrieve_cpu_chunks) to use non_cuda_metadata.
- Simplified _find_layout_desc to one pass over self.contexts.values().
- Unified report_status iteration while preserving existing status payload structure (registered_gpu_ids, gpu_context_meta, registered_non_cuda_instance_ids, non_cuda_context_meta).
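The single-pass lookup and the unified status iteration described above can be sketched roughly as follows. This is a minimal illustration, not the code from server.py: `Ctx` is a reduced, hypothetical stand-in for RegisteredContext, layouts are plain strings, matching on (model_name, world_size) is assumed from the comment quoted in the original prompt, and the shape of the values under each status key is a guess. Only the four key names come from the PR description.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Ctx:
    """Reduced stand-in for RegisteredContext; layouts are plain strings here."""

    model_name: str
    world_size: int
    gpu_layout: Optional[str] = None  # set only for GPU registrations
    non_cuda_layout: Optional[str] = None  # set only for non-CUDA registrations

    @property
    def is_gpu(self) -> bool:
        return self.gpu_layout is not None

    def get_layout_desc(self, chunk_size: int) -> Optional[str]:
        if self.is_gpu:
            return f"{self.gpu_layout}@{chunk_size}"
        return self.non_cuda_layout


def find_layout_desc(
    contexts: dict[int, Ctx], model_name: str, world_size: int, chunk_size: int
) -> Optional[str]:
    # One pass over the unified registry; the first matching entry wins.
    for ctx in contexts.values():
        if (ctx.model_name, ctx.world_size) == (model_name, world_size):
            return ctx.get_layout_desc(chunk_size)
    return None


def report_status(contexts: dict[int, Ctx]) -> dict[str, Any]:
    # The four legacy keys are kept; only the iteration is unified.
    status: dict[str, Any] = {
        "registered_gpu_ids": [],
        "gpu_context_meta": {},
        "registered_non_cuda_instance_ids": [],
        "non_cuda_context_meta": {},
    }
    for instance_id, ctx in contexts.items():
        ids_key, meta_key = (
            ("registered_gpu_ids", "gpu_context_meta")
            if ctx.is_gpu
            else ("registered_non_cuda_instance_ids", "non_cuda_context_meta")
        )
        status[ids_key].append(instance_id)
        status[meta_key][instance_id] = (ctx.model_name, ctx.world_size)
    return status


registry = {
    0: Ctx("model-a", 2, gpu_layout="gpu-layout"),
    7: Ctx("model-b", 1, non_cuda_layout="cpu-layout"),
}
status = report_status(registry)
```

Because the payload keys are unchanged, a consumer of the status report should not need to know that the engine now stores a single registry.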
Blend server consolidation
- In lmcache/v1/multiprocess/blend_server_v2.py, replaced _cb_gpu_contexts + _cb_gpu_context_meta with _cb_contexts: dict[int, _CBRegisteredContext].
- Migrated CB register/unregister, layout selection in lookup, CB store/retrieve/final paths, and CB status reporting to the unified CB registry.
HTTP API compatibility for unified registry
- In lmcache/v1/multiprocess/http_apis/cache_api.py, added fallback to derive GPU contexts from engine.contexts when gpu_contexts is absent (for unified-registry engines), with filtering for entries that do not expose a GPU context.
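A fallback of this kind might look like the sketch below. The helper name `resolve_gpu_contexts` and the use of `SimpleNamespace` objects as engines are assumptions for illustration; the actual code in cache_api.py may be structured differently.

```python
from types import SimpleNamespace
from typing import Any


def resolve_gpu_contexts(engine: Any) -> dict[int, Any]:
    """Return {instance_id: gpu_context} for legacy or unified-registry engines."""
    # Legacy engines still expose gpu_contexts directly.
    legacy = getattr(engine, "gpu_contexts", None)
    if legacy is not None:
        return dict(legacy)
    # Unified-registry engines: derive the mapping from engine.contexts and
    # skip entries that do not expose a GPU context.
    unified = getattr(engine, "contexts", None) or {}
    return {
        instance_id: ctx.gpu_context
        for instance_id, ctx in unified.items()
        if getattr(ctx, "gpu_context", None) is not None
    }


# Hypothetical engines used only to exercise the helper.
legacy_engine = SimpleNamespace(gpu_contexts={0: "gpu-ctx-0"})
unified_engine = SimpleNamespace(
    contexts={
        1: SimpleNamespace(gpu_context="gpu-ctx-1"),
        2: SimpleNamespace(gpu_context=None),  # non-CUDA entry, filtered out
    }
)
```

Using `getattr` with a default keeps the helper tolerant of both engine shapes without an explicit type check.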
Focused test updates
- Extended tests/v1/multiprocess/test_http_server.py with fallback coverage for engine.contexts:
  - accepts valid GPU-backed entries
  - ignores entries with gpu_context=None

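from dataclasses import dataclass


@dataclass
class RegisteredContext:
    # Identity recorded for every registration, regardless of transport.
    model_name: str
    world_size: int
    gpu_context: GPUCacheContext | None = None  # set for CUDA IPC registrations
    non_cuda_metadata: NonGpuContextMetadata | None = None  # set for non-GPU registrations

    @property
    def is_gpu(self) -> bool:
        return self.gpu_context is not None

    def get_layout_desc(self, chunk_size: int) -> MemoryLayoutDesc:
        # GPU entries delegate to the module-level get_layout_desc helper;
        # non-CUDA entries return the layout stored in their metadata.
        if self.gpu_context is not None:
            return get_layout_desc(self.gpu_context, chunk_size)
        return self.non_cuda_metadata.layout_desc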
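
As a usage illustration, the dataclass above can be exercised with stand-in types. Everything except `RegisteredContext` itself is hypothetical here: the real `GPUCacheContext`, `NonGpuContextMetadata`, `MemoryLayoutDesc`, and `get_layout_desc` live in LMCache and have different fields and behavior. `Optional[...]` is used in place of `X | None` only so the sketch also runs on Python 3.9.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins for the real LMCache types; their fields are hypothetical.
@dataclass
class MemoryLayoutDesc:
    label: str


@dataclass
class GPUCacheContext:
    device: str


@dataclass
class NonGpuContextMetadata:
    layout_desc: MemoryLayoutDesc


def get_layout_desc(ctx: GPUCacheContext, chunk_size: int) -> MemoryLayoutDesc:
    # Stand-in for the module-level helper that the GPU branch delegates to.
    return MemoryLayoutDesc(f"gpu:{ctx.device}:{chunk_size}")


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: Optional[GPUCacheContext] = None
    non_cuda_metadata: Optional[NonGpuContextMetadata] = None

    @property
    def is_gpu(self) -> bool:
        return self.gpu_context is not None

    def get_layout_desc(self, chunk_size: int) -> MemoryLayoutDesc:
        if self.gpu_context is not None:
            return get_layout_desc(self.gpu_context, chunk_size)
        return self.non_cuda_metadata.layout_desc


# One registry keyed by instance_id holds both kinds of registration.
contexts: dict[int, RegisteredContext] = {
    0: RegisteredContext("model-a", 1, gpu_context=GPUCacheContext("cuda:0")),
    1: RegisteredContext(
        "model-a", 1, non_cuda_metadata=NonGpuContextMetadata(MemoryLayoutDesc("cpu"))
    ),
}
gpu_layout = contexts[0].get_layout_desc(256)
cpu_layout = contexts[1].get_layout_desc(256)
```

Note that an entry with neither field set would fail in `get_layout_desc` with an `AttributeError`, so callers are expected to populate one of the two at registration time.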

Original prompt

Background

In lmcache/v1/multiprocess/server.py, the MPCacheEngine currently uses 4 separate dictionaries to track registered worker contexts:

self.gpu_contexts: dict[int, GPUCacheContext] = {}
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.non_cuda_contexts: dict[int, NonGpuContextMetadata] = {}
self.non_cuda_context_meta: dict[int, tuple[str, int]] = {}

This was flagged in a PR review comment by @maobaolong, who suggested reusing gpu_contexts (renamed to something more generic) instead of proliferating per-transport dicts.

Task

Refactor to use a single unified registry by:

Add a RegisteredContext dataclass (in server.py or a small helper) with fields:
- model_name: str
- world_size: int
- gpu_context: GPUCacheContext | None = None (set for CUDA IPC registrations)
- non_cuda_metadata: NonGpuContextMetadata | None = None (set for non-GPU context registrations)
- Property is_gpu -> bool returning self.gpu_context is not None
- Method get_layout_desc(chunk_size: int) -> MemoryLayoutDesc that dispatches to get_layout_desc(self.gpu_context, chunk_size) for GPU or self.non_cuda_metadata.layout_desc for non-CUDA.

Replace the 4 dicts with:

self.contexts: dict[int, RegisteredContext] = {}

Update all usage sites in MPCacheEngine:
- register_kv_cache: store into self.contexts[instance_id]
- register_kv_cache_non_gpu_context: store into self.contexts[instance_id]
- unregister_kv_cache: pop from self.contexts; call torch_dev.empty_cache() only if ctx.is_gpu
- store and retrieve: access via self.contexts[instance_id].gpu_context and self.contexts[instance_id].model_name (take local variables at method start for readability)
- store_cpu_chunks and retrieve_cpu_chunks: access via self.contexts[instance_id].non_cuda_metadata
- _find_layout_desc: simplify to iterate self.contexts.values() once, call ctx.get_layout_desc(self.chunk_size) on match
- report_status: unify the iteration over self.contexts, outputting GPU and non-CUDA info based on ctx.is_gpu
Also update blend_server_v2.py if it has similar _cb_gpu_contexts / _cb_gpu_context_meta patterns — apply the same consolidation.
Keep all existing behavior identical — this is a pure internal refactor with no interface or protocol changes.
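
The unregister step in the list above is the one place where behavior branches on the transport, so a small sketch may help. It assumes a hypothetical `FakeTorchDev` in place of the real `torch_dev` object, a reduced `Ctx` stand-in for RegisteredContext, and that unregistering an unknown id is tolerated; none of these are confirmed by the PR.

```python
from dataclasses import dataclass
from typing import Optional


class FakeTorchDev:
    """Hypothetical stand-in for torch_dev that counts empty_cache() calls."""

    def __init__(self) -> None:
        self.empty_cache_calls = 0

    def empty_cache(self) -> None:
        self.empty_cache_calls += 1


@dataclass
class Ctx:
    """Reduced stand-in for RegisteredContext."""

    model_name: str
    world_size: int
    gpu_context: Optional[object] = None
    non_cuda_metadata: Optional[object] = None

    @property
    def is_gpu(self) -> bool:
        return self.gpu_context is not None


def unregister_kv_cache(
    contexts: dict[int, Ctx], instance_id: int, torch_dev: FakeTorchDev
) -> bool:
    # Pop from the unified registry; returning False for unknown ids is an
    # assumption made for this sketch.
    ctx = contexts.pop(instance_id, None)
    if ctx is None:
        return False
    # Only GPU registrations hold device memory worth releasing.
    if ctx.is_gpu:
        torch_dev.empty_cache()
    return True


dev = FakeTorchDev()
registry = {
    0: Ctx("model-a", 1, gpu_context=object()),
    1: Ctx("model-a", 1, non_cuda_metadata=object()),
}
removed_cpu = unregister_kv_cache(registry, 1, dev)
calls_after_cpu = dev.empty_cache_calls
removed_gpu = unregister_kv_cache(registry, 0, dev)
calls_after_gpu = dev.empty_cache_calls
```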

Key files to modify

lmcache/v1/multiprocess/server.py (main changes)
lmcache/v1/multiprocess/blend_server_v2.py (if applicable, similar consolidation)

Notes

The base branch is ww20_PR_cpu_context_pickle.
Do NOT change any wire protocol, message types, or public APIs.
All existing tests should continue to pass without modification.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User:

```python
# We assume that if the (model name, world size) is the same, then
# the layout desc returned by the gpu context is the same.
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContextMetadata] = {}
```

Collaborator @maobaolong, 13 hours ago:
Suggest to reuse the gpu_contexts, and feel free to rename gpu_contexts to a more generic name.

hlin99:ww20_PR_cpu_context_pickle has a comment as follows.

Assistant: Based on the PR #252 and the comment from @maobaolong, the reviewer is suggesting that instead of maintaining **two separate dictionaries** (`gpu_context_meta` and `cpu_contexts`), you should **reuse the existing `gpu_contexts` dict** (or rename it to a more generic name like `contexts` or `transfer_contexts`) to hold both GPU and CPU context metadata.

The comment is on code that looks like:

```python
self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContextMetadata] = {}
```

What the reviewer wants:

- Consolidate gpu_context_meta and cpu_contexts into a single dict (e.g., rename gpu_contexts → contexts or transfer_contexts).
- The single dict should be able to hold metadata for both CUDA IPC and non-GPU context registrations, likely keyed by instance_id.
- This aligns with the TransferContext polymorphic abstraction introduced in this PR: since CudaTransferContext and NonCudaTransferContext are unified behind one interface on the worker side, the server side should similarly unify the metadata storage.

Suggested approach: Rename gpu_contexts to something generic (e.g., context_registry or transfer_contexts) and store a union type or a common base dataclass that covers both GPU (IPC handles) and non-GPU (scalar metadata) registration info. This avoids proliferating per-transport dicts every time a new transport is added.

User: Reply in Chinese.

A: Based on PR #252 and @maobaolong's comment, the reviewer means the following.

Current state: the code has two separate dictionaries:

self.gpu_context_meta: dict[int, tuple[str, int]] = {}
self.cpu_contexts: dict[int, CPUContext...

</details>




*This pull request was created from Copilot chat.*

Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/1778bac4-9474-4391-9ed4-611344f5567f
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>

Initial plan

8de6bb8

Copilot AI assigned Copilot and hlin99 May 15, 2026

Copilot started work on behalf of hlin99 May 15, 2026 05:31 View session

Copilot AI and others added 2 commits May 15, 2026 05:42

refactor: unify multiprocess context registries

57922ba


chore: address review feedback on unified context registry

59f44d3


Copilot AI changed the title ~~[WIP] Refactor to use a single unified registry for contexts~~ Unify multiprocess context registries in MPCacheEngine and BlendEngineV2 May 15, 2026

Copilot AI requested a review from hlin99 May 15, 2026 05:48

Copilot finished work on behalf of hlin99 May 15, 2026 05:48

hlin99 closed this May 15, 2026

hlin99 deleted the copilot/refactor-unified-context-registry branch May 15, 2026 13:10

Copilot wants to merge 3 commits into ww20_PR_cpu_context_pickle from copilot/refactor-unified-context-registry
