Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload#262

Closed
Copilot wants to merge 2 commits into ww20_PR_cpu_context_pickle from copilot/refactor-register-kv-cache-protocol

Conversation

Copilot AI commented May 15, 2026

REGISTER_KV_CACHE_NON_GPU_CONTEXT used 8 bare positional types in payload_classes, making the protocol definition unreadable and sending 8 separate ZMQ frames per registration.

Changes

  • custom_types.py: new RegisterNonGpuContextPayload(msgspec.Struct) with all 8 typed, named fields.
  • protocols/engine.py: payload_classes reduced from [int, str, int, int, int, int, str, bool] to [RegisterNonGpuContextPayload]; the wire format collapses from 8 frames to 1.
  • server.py: register_kv_cache_non_gpu_context now accepts payload: RegisterNonGpuContextPayload and accesses fields directly.
  • transfer_context.py: the client constructs the struct before sending.
  • Tests: updated test_vllm_mp_adapter and test_non_cuda_context to use the struct.

Before/after protocol definition:

# Before — opaque, positional
"REGISTER_KV_CACHE_NON_GPU_CONTEXT": ProtocolDefinition(
    payload_classes=[int, str, int, int, int, int, str, bool],
    ...
)

# After — self-documenting, single frame
"REGISTER_KV_CACHE_NON_GPU_CONTEXT": ProtocolDefinition(
    payload_classes=[RegisterNonGpuContextPayload],
    ...
)
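To make the frame-count claim concrete, here is a small stand-in that assumes the transport emits one frame per entry in payload_classes. It uses a standard-library dataclass and json instead of msgspec so it runs without extra dependencies; PayloadStandIn, encode_frames, and the field values are hypothetical and are not the LMCache implementation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PayloadStandIn:
    """Stand-in with the same eight fields as the new struct."""

    instance_id: int
    model_name: str
    world_size: int
    block_size: int
    num_layers: int
    hidden_dim_size: int
    dtype_str: str
    use_mla: bool


def encode_frames(payload_values: list) -> list[bytes]:
    """Assumed transport rule: one serialized frame per payload entry."""
    return [json.dumps(value).encode() for value in payload_values]


payload = PayloadStandIn(0, "example-model", 2, 16, 32, 4096, "bfloat16", False)

# Before: eight positional values, so eight frames on the wire.
frames_before = encode_frames(list(asdict(payload).values()))

# After: one struct-like value, so a single frame that carries field names.
frames_after = encode_frames([asdict(payload)])

# The receiver can look fields up by name instead of by position.
decoded = json.loads(frames_after[0])
```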

No backward-compatibility concerns; client and server are always co-deployed.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • download-r2.pytorch.org
    • Triggering command: /usr/bin/pip pip install -q msgspec torch --index-url REDACTED (dns block)

Original prompt

Task

The REGISTER_KV_CACHE_NON_GPU_CONTEXT protocol definition in lmcache/v1/multiprocess/protocols/engine.py currently uses a list of 8 bare types as payload_classes, which is unreadable:

"REGISTER_KV_CACHE_NON_GPU_CONTEXT": ProtocolDefinition(
    payload_classes=[
        int,   # instance_id
        str,   # model_name
        int,   # world_size
        int,   # block_size
        int,   # num_layers
        int,   # hidden_dim_size
        str,   # dtype_str
        bool,  # use_mla
    ],
    response_class=None,
    handler_type=HandlerType.SYNC,
),

Refactor this to use a single struct payload instead, so the wire format becomes 1 serialized frame instead of 8.

Requirements

  1. Define a msgspec.Struct (not a Python dataclass — the MQ layer uses msgspec_encode/msgspec_decode for serialization, so the type must be msgspec-compatible) with the 8 fields:

    • instance_id: int
    • model_name: str
    • world_size: int
    • block_size: int
    • num_layers: int
    • hidden_dim_size: int
    • dtype_str: str
    • use_mla: bool

    Put it in an appropriate location (e.g. custom_types.py where other payload types like IPCCacheEngineKey live, or in protocols/engine.py itself).

  2. Update protocols/engine.py: change payload_classes from the 8-element list to [TheNewStruct].

  3. Update server.py: change MPCacheEngine.register_kv_cache_non_gpu_context to accept a single struct parameter and unpack fields from it internally.

  4. Update the client-side caller that sends this request — find where REGISTER_KV_CACHE_NON_GPU_CONTEXT is called (likely in cache_api.py or a non-GPU context module) and have it construct the struct before sending.

  5. Verify that msgspec_encode / msgspec_decode in mq.py will work with the new struct type (it should, since msgspec.Struct is natively supported by msgspec).

Compatibility

No backward compatibility concerns — client and server are always deployed at the same version.

Scope

Keep changes minimal. Only touch what's needed for this one protocol message type.

The following is the prior conversation context from the user's chat exploration (may be truncated):

Assistant: Only 3 places changed:

1. Declare two extra lists before the loop:

# These two lines were added
registered_gpu_ids: list[int] = []
registered_non_cuda_ids: list[int] = []

2. Collect the IDs inside the loop along the way:

if context.gpu_context is not None:
    registered_gpu_ids.append(instance_id)  # added this line
    ...

if context.non_cuda_metadata is not None:
    registered_non_cuda_ids.append(instance_id)  # added this line
    ...

3. Use the local variables in the return value instead of making extra calls:

# Before (PR #261):
"registered_gpu_ids": list(self.gpu_contexts.keys()),  # triggers the property, builds a new dict
"registered_non_cuda_instance_ids": [                   # iterates over self.contexts again
    instance_id
    for instance_id, context in self.contexts.items()
    if context.non_cuda_metadata is not None
],

# After:
"registered_gpu_ids": registered_gpu_ids,               # used directly
"registered_non_cuda_instance_ids": registered_non_cuda_ids,  # used directly

That's all: it avoids one dict built by the property plus one extra traversal.
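The three changes described above amount to collecting both ID lists in a single pass. A minimal runnable sketch, assuming a hypothetical Context record and a free-standing report_status function (the real method lives on the server class and returns many more fields):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical registry entry; either side may be absent."""

    gpu_context: Optional[object] = None
    non_cuda_metadata: Optional[object] = None


def report_status(contexts: dict[int, Context]) -> dict[str, list[int]]:
    # Declared before the loop so that one traversal fills both lists.
    registered_gpu_ids: list[int] = []
    registered_non_cuda_ids: list[int] = []

    for instance_id, context in contexts.items():
        if context.gpu_context is not None:
            registered_gpu_ids.append(instance_id)
        if context.non_cuda_metadata is not None:
            registered_non_cuda_ids.append(instance_id)

    # The local lists are returned directly; nothing is re-derived here.
    return {
        "registered_gpu_ids": registered_gpu_ids,
        "registered_non_cuda_instance_ids": registered_non_cuda_ids,
    }


status = report_status(
    {
        0: Context(gpu_context=object()),
        1: Context(non_cuda_metadata=object()),
        2: Context(gpu_context=object(), non_cuda_metadata=object()),
    }
)
```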

User: "group_physical_block_sizes": ctx.group_physical_block_sizes,
"group_compress_ratios": ctx.group_compress_ratios,
Did you miss these two?

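Assistant: I didn't drop them; report_status on the original base branch never had these two fields (see lines 986-998).

The PR #261 diff adds these two lines. They are new fields that the Copilot agent added unnecessarily on its own:

"group_physical_block_sizes": ctx.group_physical_block_sizes,
"group_compress_ratios": ctx.group_compress_ratios,

In the original code, kv_cache_layout only contains: num_layers, block_size, hidden_dim_sizes, dtype, is_mla, num_blocks, gpu_kv_format, gpu_kv_shape, gpu_kv_concrete_shape, attention_backend, cache_size_per_token

There is another problem: the original field was "block_size": ctx.block_size, and PR #261 changed it to "inference_engine_logical_block_size": ctx.lmcache_logical_chunk_size. That changes both the key name and the value returned by the API, which is also an unnecessary change.

All of these should be reverted to their original form.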

User: First look at the base at https://github.com/hlin99/LMCache/tree/ww20_PR_cpu_context_pickle, then compare it with #261.

Assistant: You're right, I misread it earlier. report_status in the base branch does have these two fields (lines 1176-1177):

"group_physical_block_sizes": ctx.group_physical_block_sizes,
"group_compress_ratios": ctx.group_compress_ratios,

"inference_engine_logical_block_size" is also already present in base (lines 1173-1175).

So PR #261 dropped these 3 fields rather than adding them. That is a bug.

Comparing against base also shows that PR #261 lost several other things:

  1. store_cpu_chunks / retrieve_cpu_chunks methods: present in base (lines 357-448); PR #261 keeps them but routes them through self.contexts
  2. register_kv_cache_non_gpu_context method: present in base (lines 288-333)
  3. _resolve_obj_keys method: present in base (lines 335-354), yet store/retrieve in PR #261 still write that logic inline
  4. blocks_per_chunk computation: base uses gpu_context.kv_layer_groups_manager.inference_engine_logical_block_size, while PR #261 uses gpu_context.block_size (not necessarily equivalent)
  5. group_lmcache_chunk_size in store/retrieve: base uses gpu_context.get_physical_chunk_size(group_idx), while PR #261 passes self.chunk_size directly (which breaks in the compression case)
  6. Handler registration in run_cache_server: base has REGISTER_KV_CACHE_NON_GPU_CONTEXT, `...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Refactor payload classes to use msgspec.Struct" to "Refactor REGISTER_KV_CACHE_NON_GPU_CONTEXT to use a single msgspec.Struct payload" on May 15, 2026
Copilot finished work on behalf of hlin99 May 15, 2026 07:27
Copilot AI requested a review from hlin99 May 15, 2026 07:27
@hlin99 hlin99 closed this May 15, 2026
@hlin99 hlin99 deleted the copilot/refactor-register-kv-cache-protocol branch May 15, 2026 13:09