Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload by Copilot · Pull Request #262 · hlin99/LMCache

Copilot · 2026-05-15T07:20:06Z

REGISTER_KV_CACHE_NON_GPU_CONTEXT used 8 bare positional types in payload_classes, making the protocol definition unreadable and sending 8 separate ZMQ frames per registration.

Changes

custom_types.py — New RegisterNonGpuContextPayload(msgspec.Struct) with all 8 typed, named fields.
protocols/engine.py — payload_classes reduced from [int, str, int, int, int, int, str, bool] to [RegisterNonGpuContextPayload]; wire format collapses from 8 frames → 1.
server.py — register_kv_cache_non_gpu_context now accepts payload: RegisterNonGpuContextPayload and accesses fields directly.
transfer_context.py — Client constructs the struct before sending.
Tests — Updated test_vllm_mp_adapter and test_non_cuda_context to use the struct.

Before/after protocol definition:

# Before — opaque, positional
"REGISTER_KV_CACHE_NON_GPU_CONTEXT": ProtocolDefinition(
    payload_classes=[int, str, int, int, int, int, str, bool],
    ...
)

# After — self-documenting, single frame
"REGISTER_KV_CACHE_NON_GPU_CONTEXT": ProtocolDefinition(
    payload_classes=[RegisterNonGpuContextPayload],
    ...
)

No backward-compatibility concerns; client and server are always co-deployed.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

download-r2.pytorch.org
- Triggering command: /usr/bin/pip pip install -q msgspec torch --index-url REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

Task

The REGISTER_KV_CACHE_NON_GPU_CONTEXT protocol definition in lmcache/v1/multiprocess/protocols/engine.py currently uses a list of 8 bare types as payload_classes, which is unreadable:

"REGISTER_KV_CACHE_NON_GPU_CONTEXT": ProtocolDefinition(
    payload_classes=[
        int,   # instance_id
        str,   # model_name
        int,   # world_size
        int,   # block_size
        int,   # num_layers
        int,   # hidden_dim_size
        str,   # dtype_str
        bool,  # use_mla
    ],
    response_class=None,
    handler_type=HandlerType.SYNC,
),

Refactor this to use a single dataclass payload instead, so the wire format becomes 1 serialized frame instead of 8.

Requirements

Define a msgspec.Struct (not a Python dataclass — the MQ layer uses msgspec_encode/msgspec_decode for serialization, so the type must be msgspec-compatible) with the 8 fields:
- instance_id: int
- model_name: str
- world_size: int
- block_size: int
- num_layers: int
- hidden_dim_size: int
- dtype_str: str
- use_mla: bool
Put it in an appropriate location (e.g. custom_types.py where other payload types like IPCCacheEngineKey live, or in protocols/engine.py itself).
Update protocols/engine.py: change payload_classes from the 8-element list to [TheNewStruct].
Update server.py: change MPCacheEngine.register_kv_cache_non_gpu_context to accept a single struct parameter and unpack fields from it internally.
Update the client-side caller that sends this request — find where REGISTER_KV_CACHE_NON_GPU_CONTEXT is called (likely in cache_api.py or a non-GPU context module) and have it construct the struct before sending.
Verify that msgspec_encode / msgspec_decode in mq.py will work with the new struct type (it should, since msgspec.Struct is natively supported by msgspec).

Compatibility

No backward compatibility concerns — client and server are always deployed at the same version.

Scope

Keep changes minimal. Only touch what's needed for this one protocol message type.

The following is the prior conversation context from the user's chat exploration (may be truncated):

Assistant: Only 3 places changed:

1. Declare two extra lists before the loop:

# Added these two lines
registered_gpu_ids: list[int] = []
registered_non_cuda_ids: list[int] = []

2. Collect the IDs inside the loop along the way:

if context.gpu_context is not None:
    registered_gpu_ids.append(instance_id)  # added this line
    ...

if context.non_cuda_metadata is not None:
    registered_non_cuda_ids.append(instance_id)  # added this line
    ...

3. Use the local variables in the return instead of making extra calls:

# Before (PR #261):
"registered_gpu_ids": list(self.gpu_contexts.keys()),  # triggers the property, which builds a new dict
"registered_non_cuda_instance_ids": [                   # iterates over self.contexts a second time
    instance_id
    for instance_id, context in self.contexts.items()
    if context.non_cuda_metadata is not None
],

# After:
"registered_gpu_ids": registered_gpu_ids,               # used directly
"registered_non_cuda_instance_ids": registered_non_cuda_ids,  # used directly

That's all: it avoids one dict built by the property plus one extra traversal.
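
The single-pass collection described in this reply can be sketched in isolation. The Context class and the collect_ids function are hypothetical stand-ins for the registry entries and the reporting method; only the gpu_context and non_cuda_metadata attribute names and the two result keys come from the snippets above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical stand-in for one registry entry."""

    gpu_context: Optional[object] = None
    non_cuda_metadata: Optional[object] = None


def collect_ids(contexts: dict[int, Context]) -> dict[str, list[int]]:
    """Collect both ID lists in the one loop that already visits every context."""
    registered_gpu_ids: list[int] = []
    registered_non_cuda_ids: list[int] = []
    for instance_id, context in contexts.items():
        if context.gpu_context is not None:
            registered_gpu_ids.append(instance_id)
        if context.non_cuda_metadata is not None:
            registered_non_cuda_ids.append(instance_id)
    return {
        "registered_gpu_ids": registered_gpu_ids,
        "registered_non_cuda_instance_ids": registered_non_cuda_ids,
    }


contexts = {
    1: Context(gpu_context=object()),
    2: Context(non_cuda_metadata=object()),
    3: Context(gpu_context=object(), non_cuda_metadata=object()),
}
result = collect_ids(contexts)
```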

User: "group_physical_block_sizes": ctx.group_physical_block_sizes,
"group_compress_ratios": ctx.group_compress_ratios,
Did you miss these two?

A: I didn't miss them. The report_status on the original base branch never had these two fields (see lines 986-998).

The PR #261 diff added these two lines; they are new fields that the Copilot agent added on its own, unnecessarily:

"group_physical_block_sizes": ctx.group_physical_block_sizes,
"group_compress_ratios": ctx.group_compress_ratios,

In the original code, kv_cache_layout contains only: num_layers, block_size, hidden_dim_sizes, dtype, is_mla, num_blocks, gpu_kv_format, gpu_kv_shape, gpu_kv_concrete_shape, attention_backend, cache_size_per_token.

There is another problem: the original field was "block_size": ctx.block_size, and PR #261 changed it to "inference_engine_logical_block_size": ctx.lmcache_logical_chunk_size. That changes both the key name and the value returned by the API, and is also an unnecessary change.

All of these should be reverted to their original form.

User: First look at the base at https://github.com/hlin99/LMCache/tree/ww20_PR_cpu_context_pickle, then compare it with 261.

A: You're right, I misread it earlier. The report_status on the base branch does have these two fields (lines 1176-1177):

"group_physical_block_sizes": ctx.group_physical_block_sizes,
"group_compress_ratios": ctx.group_compress_ratios,

"inference_engine_logical_block_size" is also already present in base (lines 1173-1175).

So PR #261 dropped these 3 fields rather than adding them. That is a bug.

Comparing against base also shows that PR #261 lost several other things:

store_cpu_chunks / retrieve_cpu_chunks methods: base has them (lines 357-448); PR #261 ("Consolidate MPCacheEngine context state into unified registry in server.py") kept them but routed them through self.contexts
register_kv_cache_non_gpu_context method: base has it (lines 288-333)
_resolve_obj_keys method: base has it (lines 335-354), yet the store/retrieve code in PR #261 still inlines that logic
blocks_per_chunk computation: base uses gpu_context.kv_layer_groups_manager.inference_engine_logical_block_size, while PR #261 uses gpu_context.block_size (not necessarily equivalent)
group_lmcache_chunk_size in store/retrieve: base uses gpu_context.get_physical_chunk_size(group_idx), while PR #261 passes self.chunk_size directly (which breaks in the compression case)
Handler registration in run_cache_server: base has REGISTER_KV_CACHE_NON_GPU_CONTEXT, `...

This pull request was created from Copilot chat.

Initial plan

fb4d8c2

Copilot AI assigned Copilot and hlin99 May 15, 2026

Copilot started work on behalf of hlin99 May 15, 2026 07:20 View session

Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload

cc0cd74

Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/c1dd140f-9025-4272-8d3e-c42296c0181e Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Refactor payload classes to use msgspec.Struct~~ Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload May 15, 2026

Copilot finished work on behalf of hlin99 May 15, 2026 07:27

Copilot AI requested a review from hlin99 May 15, 2026 07:27

hlin99 closed this May 15, 2026

hlin99 deleted the copilot/refactor-register-kv-cache-protocol branch May 15, 2026 13:09

Copilot wants to merge 2 commits into ww20_PR_cpu_context_pickle from copilot/refactor-register-kv-cache-protocol
