Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload #262
Closed
Copilot wants to merge 2 commits into
…ruct payload
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/c1dd140f-9025-4272-8d3e-c42296c0181e
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Refactor payload classes to use msgspec.Struct" to "Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload" on May 15, 2026.
`REGISTER_KV_CACHE_NON_GPU_CONTEXT` used 8 bare positional types in `payload_classes`, which made the protocol definition unreadable and sent 8 separate ZMQ frames per registration.

Changes
- `custom_types.py`: new `RegisterNonGpuContextPayload(msgspec.Struct)` with all 8 typed, named fields.
- `protocols/engine.py`: `payload_classes` reduced from `[int, str, int, int, int, int, str, bool]` to `[RegisterNonGpuContextPayload]`; the wire format collapses from 8 frames → 1.
- `server.py`: `register_kv_cache_non_gpu_context` now accepts `payload: RegisterNonGpuContextPayload` and accesses its fields directly.
- `transfer_context.py`: the client constructs the struct before sending.
- Tests: updated `test_vllm_mp_adapter` and `test_non_cuda_context` to use the struct.

Before/after protocol definition:
No backward-compatibility concerns; client and server are always co-deployed.
Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

- `download-r2.pytorch.org` (dns block), triggered by `/usr/bin/pip pip install -q msgspec torch --index-url REDACTED`

If you need me to access, download, or install something from one of these locations, you can either:
Original prompt
Task
The `REGISTER_KV_CACHE_NON_GPU_CONTEXT` protocol definition in `lmcache/v1/multiprocess/protocols/engine.py` currently uses a list of 8 bare types as `payload_classes`, which is unreadable. Refactor this to use a single dataclass payload instead, so the wire format becomes 1 serialized frame instead of 8.
Requirements
1. Define a `msgspec.Struct` (not a Python dataclass: the MQ layer uses `msgspec_encode`/`msgspec_decode` for serialization, so the type must be msgspec-compatible) with the 8 fields:
   - `instance_id: int`
   - `model_name: str`
   - `world_size: int`
   - `block_size: int`
   - `num_layers: int`
   - `hidden_dim_size: int`
   - `dtype_str: str`
   - `use_mla: bool`

   Put it in an appropriate location (e.g. `custom_types.py`, where other payload types like `IPCCacheEngineKey` live, or in `protocols/engine.py` itself).
2. Update `protocols/engine.py`: change `payload_classes` from the 8-element list to `[TheNewStruct]`.
3. Update `server.py`: change `MPCacheEngine.register_kv_cache_non_gpu_context` to accept a single struct parameter and unpack fields from it internally.
4. Update the client-side caller that sends this request: find where `REGISTER_KV_CACHE_NON_GPU_CONTEXT` is called (likely in `cache_api.py` or a non-GPU context module) and have it construct the struct before sending.
5. Verify that `msgspec_encode`/`msgspec_decode` in `mq.py` will work with the new struct type (it should, since `msgspec.Struct` is natively supported by msgspec).

Compatibility
No backward compatibility concerns — client and server are always deployed at the same version.
Scope
Keep changes minimal. Only touch what's needed for this one protocol message type.
The following is the prior conversation context from the user's chat exploration (may be truncated):
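Assistant: Only 3 changes:
1. Declare two more lists before the loop.
2. Collect the IDs inside the loop while iterating.
3. Use the local variables in the return instead of making another call.
That's it. This avoids one dict creation by the property plus one extra traversal.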
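A sketch of the pattern the three changes describe. All names are hypothetical (`Ctx`, `Registry`, `ids_by_kind`), and so is the GPU/non-GPU split used for the two lists. In the "before" version, the report is built in one loop and a property is then called that allocates a dict and walks every context a second time. In the "after" version, two local lists are filled in the same loop.

```python
from dataclasses import dataclass


@dataclass
class Ctx:
    """Hypothetical per-context record with only the fields this sketch needs."""
    instance_id: int
    is_gpu: bool


class Registry:
    def __init__(self, contexts: list[Ctx]) -> None:
        self.contexts = contexts

    @property
    def ids_by_kind(self) -> dict[str, list[int]]:
        # Builds a fresh dict and iterates over every context again on each access.
        out: dict[str, list[int]] = {"gpu": [], "non_gpu": []}
        for ctx in self.contexts:
            out["gpu" if ctx.is_gpu else "non_gpu"].append(ctx.instance_id)
        return out

    def report_before(self) -> dict:
        entries = [{"instance_id": c.instance_id} for c in self.contexts]
        ids = self.ids_by_kind  # extra dict plus a second pass
        return {"entries": entries, "gpu_ids": ids["gpu"], "non_gpu_ids": ids["non_gpu"]}

    def report_after(self) -> dict:
        entries = []
        gpu_ids: list[int] = []  # 1. two extra lists declared before the loop
        non_gpu_ids: list[int] = []
        for c in self.contexts:
            entries.append({"instance_id": c.instance_id})
            # 2. collect the IDs in the loop that already visits every context
            (gpu_ids if c.is_gpu else non_gpu_ids).append(c.instance_id)
        # 3. return the local lists: no property call, no second pass
        return {"entries": entries, "gpu_ids": gpu_ids, "non_gpu_ids": non_gpu_ids}


reg = Registry([Ctx(0, True), Ctx(1, False), Ctx(2, True)])
print(reg.report_after()["gpu_ids"])  # [0, 2]
```

Both methods return the same report. The second one simply does the work in a single pass.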
User: `"group_physical_block_sizes": ctx.group_physical_block_sizes,`
`"group_compress_ratios": ctx.group_compress_ratios,`
Did you miss these two?
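Assistant: I didn't miss them. `report_status` in the original base branch simply doesn't have these two fields (see lines 986-998). The PR #261 diff adds these two lines; they are new fields the Copilot agent added on its own, an unnecessary addition.
In the original code, `kv_cache_layout` only has: `num_layers`, `block_size`, `hidden_dim_sizes`, `dtype`, `is_mla`, `num_blocks`, `gpu_kv_format`, `gpu_kv_shape`, `gpu_kv_concrete_shape`, `attention_backend`, `cache_size_per_token`. There is another problem: the original field was `"block_size": ctx.block_size`, and PR #261 changed it to `"inference_engine_logical_block_size": ctx.lmcache_logical_chunk_size`. That changes both the key name and the value the API returns, which is also an unneeded change. All of these should be restored to their original form.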
User: First look at the base branch at https://github.com/hlin99/LMCache/tree/ww20_PR_cpu_context_pickle, then compare it with #261.
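Assistant: You're right, I misread it earlier. In the base branch, `report_status` does have these two fields (lines 1176-1177), and `"inference_engine_logical_block_size"` is also already in base (lines 1173-1175). So PR #261 dropped these 3 fields rather than adding them. This is a bug.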
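Dropped keys like these are easy to catch with a small regression check on the returned layout. A minimal sketch, assuming the three fields PR #261 dropped sit in the per-context layout dict that `report_status` returns (the exact nesting is not shown here). `REQUIRED_LAYOUT_KEYS` and `missing_layout_keys` are hypothetical names.

```python
# The three keys present in base and dropped by PR #261 (hypothetical constant name).
REQUIRED_LAYOUT_KEYS = frozenset(
    {
        "inference_engine_logical_block_size",
        "group_physical_block_sizes",
        "group_compress_ratios",
    }
)


def missing_layout_keys(layout: dict) -> set[str]:
    """Hypothetical helper: return the required keys absent from a layout dict."""
    # frozenset.difference accepts any iterable; iterating a dict yields its keys.
    return set(REQUIRED_LAYOUT_KEYS.difference(layout))


# Illustrative layouts only; the values are placeholders.
base_like = {"num_layers": 32, **dict.fromkeys(REQUIRED_LAYOUT_KEYS)}
pr_261_like = {"num_layers": 32}

print(sorted(missing_layout_keys(pr_261_like)))
# ['group_compress_ratios', 'group_physical_block_sizes', 'inference_engine_logical_block_size']
```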
Comparing against base also shows that PR #261 dropped several other things:
- `store_cpu_chunks`/`retrieve_cpu_chunks` methods: base has them (lines 357-448); PR #261 (Consolidate MPCacheEngine context state into unified registry in `server.py`) keeps them but routes them through `self.contexts`
- `register_kv_cache_non_gpu_context` method: base has it (lines 288-333)
- `_resolve_obj_keys` method: base has it (lines 335-354), yet PR #261's `store`/`retrieve` still write it inline
- `blocks_per_chunk` computation: base uses `gpu_context.kv_layer_groups_manager.inference_engine_logical_block_size`; PR #261 uses `gpu_context.block_size` (not necessarily equivalent)
- `group_lmcache_chunk_size` in `store`/`retrieve`: base uses `gpu_context.get_physical_chunk_size(group_idx)`; PR #261 passes `self.chunk_size` directly (wrong in compression scenarios)
- Handler registration in `run_cache_server`: base has `REGISTER_KV_CACHE_NON_GPU_CONTEXT`, `...

This pull request was created from Copilot chat.