Unify multiprocess context registries in MPCacheEngine and BlendEngineV2 #260
Closed
Copilot wants to merge 3 commits into
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/1778bac4-9474-4391-9ed4-611344f5567f Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor to use a single unified registry for contexts" to "Unify multiprocess context registries in MPCacheEngine and BlendEngineV2" on May 15, 2026
This refactor removes per-transport context bookkeeping in the multiprocess server path by consolidating GPU and non-CUDA registrations into unified registries. It keeps external behavior/protocols intact while simplifying context lifecycle, layout lookup, and status reporting paths.
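As a rough illustration of how one registry can still feed the existing status payload, the sketch below walks a single dict once and fills the four payload keys this PR says it preserves. `Entry` and this standalone `report_status` are hypothetical stand-ins, not LMCache code, and the value stored under each key (a small `model_name`/`world_size` dict) is an assumption; only the key names come from the description.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one registry entry (not LMCache's class)."""

    model_name: str
    world_size: int
    gpu_context: Optional[Any] = None        # set for CUDA IPC registrations
    non_cuda_metadata: Optional[Any] = None  # set for non-GPU registrations

    @property
    def is_gpu(self) -> bool:
        return self.gpu_context is not None


def report_status(contexts: dict[int, Entry]) -> dict[str, Any]:
    """Build a legacy-shaped payload from one pass over a single registry."""
    status: dict[str, Any] = {
        "registered_gpu_ids": [],
        "gpu_context_meta": {},
        "registered_non_cuda_instance_ids": [],
        "non_cuda_context_meta": {},
    }
    for instance_id, ctx in contexts.items():
        # Assumed value shape; the PR only names the keys, not their contents.
        meta = {"model_name": ctx.model_name, "world_size": ctx.world_size}
        if ctx.is_gpu:
            status["registered_gpu_ids"].append(instance_id)
            status["gpu_context_meta"][instance_id] = meta
        else:
            status["registered_non_cuda_instance_ids"].append(instance_id)
            status["non_cuda_context_meta"][instance_id] = meta
    return status


# One GPU and one non-CUDA registration living in the same dict.
contexts = {
    0: Entry("model-a", 2, gpu_context=object()),
    7: Entry("model-a", 2, non_cuda_metadata=object()),
}
status = report_status(contexts)
```

Because callers only see the payload keys, the internal storage can change without touching consumers of the status report.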
Server context model: replace 4 dicts with one typed registry
- `RegisteredContext` in `lmcache/v1/multiprocess/server.py`:
  - `model_name`, `world_size`
  - `gpu_context: GPUCacheContext | None`
  - `non_cuda_metadata: NonGpuContextMetadata | None`
  - `is_gpu` property
  - `get_layout_desc(chunk_size)` dispatcher
- `gpu_contexts`, `gpu_context_meta`, `non_cuda_contexts`, and `non_cuda_context_meta` are replaced by `contexts: dict[int, RegisteredContext]`.

MPCacheEngine call-site migration
- Call sites migrated to `self.contexts`.
- GPU paths (`store`, `retrieve`) use local `registered_context`, `gpu_context`, `model_name`.
- Non-CUDA paths (`store_cpu_chunks`, `retrieve_cpu_chunks`) use `non_cuda_metadata`.
- `_find_layout_desc` simplified to one pass over `self.contexts.values()`.
- `report_status` iteration unified while preserving the existing status payload structure (`registered_gpu_ids`, `gpu_context_meta`, `registered_non_cuda_instance_ids`, `non_cuda_context_meta`).

Blend server consolidation
- In `lmcache/v1/multiprocess/blend_server_v2.py`, replaced `_cb_gpu_contexts` + `_cb_gpu_context_meta` with `_cb_contexts: dict[int, _CBRegisteredContext]`.

HTTP API compatibility for unified registry
- In `lmcache/v1/multiprocess/http_apis/cache_api.py`, added a fallback that derives GPU contexts from `engine.contexts` when `gpu_contexts` is absent (for unified-registry engines), filtering out entries that do not expose a GPU context.

Focused test updates
- `tests/v1/multiprocess/test_http_server.py`: fallback coverage for `engine.contexts`, including entries with `gpu_context=None`.

Original prompt
Background
In `lmcache/v1/multiprocess/server.py`, the `MPCacheEngine` currently uses 4 separate dictionaries to track registered worker contexts: `gpu_contexts`, `gpu_context_meta`, `non_cuda_contexts`, and `non_cuda_context_meta`.

This was flagged in a PR review comment by @maobaolong, who suggested reusing `gpu_contexts` (renamed to a more generic name) instead of proliferating per-transport dicts.

Task
Refactor to use a single unified registry:
1. Add a `RegisteredContext` dataclass (in `server.py` or a small helper) with fields:
   - `model_name: str`
   - `world_size: int`
   - `gpu_context: GPUCacheContext | None = None` (set for CUDA IPC registrations)
   - `non_cuda_metadata: NonGpuContextMetadata | None = None` (set for non-GPU context registrations)

   It also exposes:
   - a property `is_gpu -> bool` returning `self.gpu_context is not None`
   - a method `get_layout_desc(chunk_size: int) -> MemoryLayoutDesc` that dispatches to `get_layout_desc(self.gpu_context, chunk_size)` for GPU or `self.non_cuda_metadata.layout_desc` for non-CUDA
2. Replace the 4 dicts with `contexts: dict[int, RegisteredContext]`.
3. Update all usage sites in `MPCacheEngine`:
   - `register_kv_cache`: store into `self.contexts[instance_id]`
   - `register_kv_cache_non_gpu_context`: store into `self.contexts[instance_id]`
   - `unregister_kv_cache`: pop from `self.contexts`; call `torch_dev.empty_cache()` only if `ctx.is_gpu`
   - `store` and `retrieve`: access via `self.contexts[instance_id].gpu_context` and `self.contexts[instance_id].model_name` (take local variables at method start for readability)
   - `store_cpu_chunks` and `retrieve_cpu_chunks`: access via `self.contexts[instance_id].non_cuda_metadata`
   - `_find_layout_desc`: simplify to iterate `self.contexts.values()` once, calling `ctx.get_layout_desc(self.chunk_size)` on match
   - `report_status`: unify the iteration over `self.contexts`, outputting GPU and non-CUDA info based on `ctx.is_gpu`
4. Also update `blend_server_v2.py` if it has similar `_cb_gpu_contexts` / `_cb_gpu_context_meta` patterns, applying the same consolidation.

Keep all existing behavior identical: this is a pure internal refactor with no interface or protocol changes.
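The dataclass and registry described in the task can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `MemoryLayoutDesc`, `NonGpuContextMetadata`, and `gpu_layout_desc` are simplified placeholders for the real LMCache types and the module-level `get_layout_desc`, `Registry` is a hypothetical stand-in for the relevant part of `MPCacheEngine`, the `empty_cache` callable stands in for `torch_dev.empty_cache`, and matching on `(model_name, world_size)` in `find_layout_desc` is an assumed lookup criterion.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class MemoryLayoutDesc:
    """Placeholder for the real MemoryLayoutDesc; this field is made up."""

    label: str


@dataclass
class NonGpuContextMetadata:
    """Placeholder; only `layout_desc` is named in the task description."""

    layout_desc: MemoryLayoutDesc


def gpu_layout_desc(gpu_context: Any, chunk_size: int) -> MemoryLayoutDesc:
    """Placeholder for the module-level get_layout_desc(gpu_context, chunk_size)."""
    return MemoryLayoutDesc(f"gpu:{chunk_size}")


@dataclass
class RegisteredContext:
    model_name: str
    world_size: int
    gpu_context: Optional[Any] = None  # set for CUDA IPC registrations
    non_cuda_metadata: Optional[NonGpuContextMetadata] = None  # non-GPU case

    @property
    def is_gpu(self) -> bool:
        return self.gpu_context is not None

    def get_layout_desc(self, chunk_size: int) -> MemoryLayoutDesc:
        # Dispatch on the transport kind instead of consulting separate dicts.
        if self.is_gpu:
            return gpu_layout_desc(self.gpu_context, chunk_size)
        if self.non_cuda_metadata is None:
            raise ValueError("context has neither GPU nor non-CUDA data")
        return self.non_cuda_metadata.layout_desc


class Registry:
    """Hypothetical stand-in for the context-tracking part of MPCacheEngine."""

    def __init__(self, chunk_size: int, empty_cache: Callable[[], None]):
        self.chunk_size = chunk_size
        self.contexts: dict[int, RegisteredContext] = {}
        self._empty_cache = empty_cache  # stands in for torch_dev.empty_cache

    def unregister(self, instance_id: int) -> None:
        ctx = self.contexts.pop(instance_id, None)
        # Only GPU registrations hold device memory worth releasing.
        if ctx is not None and ctx.is_gpu:
            self._empty_cache()

    def find_layout_desc(
        self, model_name: str, world_size: int
    ) -> Optional[MemoryLayoutDesc]:
        # Single pass over one dict; the match criterion is an assumption.
        for ctx in self.contexts.values():
            if ctx.model_name == model_name and ctx.world_size == world_size:
                return ctx.get_layout_desc(self.chunk_size)
        return None
```

Keeping the GPU/non-CUDA branch inside `RegisteredContext.get_layout_desc` means a future transport would add one field and one branch rather than another pair of dicts.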
Key files to modify
- `lmcache/v1/multiprocess/server.py` (main changes)
- `lmcache/v1/multiprocess/blend_server_v2.py` (if applicable, similar consolidation)

Notes
- `ww20_PR_cpu_context_pickle`.

The following is the prior conversation context from the user's chat exploration (may be truncated):
User:
```
We assume that if the (model name, world size) is the same, then
```
@maobaolong (Collaborator), 13 hours ago:
Suggest to reuse the gpu_contexts, and feel free to rename gpu_contexts to a more generic name.

What the reviewer wants:
- Merge `gpu_context_meta` and `cpu_contexts` into a single dict (e.g., rename `gpu_contexts` → `contexts` or `transfer_contexts`).
- `instance_id`.
- The `TransferContext` polymorphic abstraction introduced in this PR: since `CudaTransferContext` and `NonCudaTransferContext` are unified behind one interface on the worker side, the server side should similarly unify the metadata storage.

Suggested approach: Rename `gpu_contexts` to something generic (e.g., `context_registry` or `transfer_contexts`) and store a union type or a common base dataclass that covers both GPU (IPC handles) and non-GPU (scalar metadata) registration info. This avoids proliferating per-transport dicts every time a new transport is added.

User: Please reply in Chinese.
Assistant: Based on PR #252 and @maobaolong's comment, what the reviewer means is:
Current state: the code has two separate dictionaries: