Pass MPServerConfig.shm_name into NonGPUTransferModule #308

Closed
Copilot wants to merge 4 commits into ww21_PR_shm from copilot/pass-mp-config-to-nongputransfermodule

Copilot AI commented May 27, 2026

--shm-name was defined on MPServerConfig but never reached the non-GPU transfer path, so --shm-name "" could not disable SHM and force pickle mode. This change keeps run_cache_server() unchanged and routes the override directly into NonGPUTransferModule, where the SHM decision is made.

  • Server wiring

    • Update _build_modules() in lmcache/v1/multiprocess/server.py to construct NonGPUTransferModule(ctx, mp_config) for transfer_mode="non_gpu".
  • Non-GPU SHM override handling

    • Extend NonGPUTransferModule.__init__ to accept optional MPServerConfig.
    • Store mp_config.shm_name as a module-local override instead of mutating shared storage config.
    • Update _compute_shm_pool_info() to prefer the override when present:
      • None → preserve existing auto-detect behavior from storage config
      • "" → disable SHM and return pickle-mode metadata
      • custom name → normalize to lmcache_l1_pool_* and return the configured pool size
  • Regression coverage

    • Add focused tests covering:
      • _build_modules() passing the MP config into NonGPUTransferModule
      • empty-string override disabling SHM
      • explicit SHM-name override being normalized and returned during non-GPU context registration
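
The first regression case above (the MP config reaching the module constructor) could be checked roughly as follows. This is only a sketch under stated assumptions: `StubMPServerConfig`, `StubNonGPUTransferModule`, and `build_modules` are hypothetical stand-ins written for illustration, not the real LMCache classes or the real `_build_modules()` signature.

```python
from dataclasses import dataclass
from typing import Optional
from unittest import mock


@dataclass
class StubMPServerConfig:
    """Hypothetical stand-in for MPServerConfig; only shm_name is modeled."""

    shm_name: Optional[str] = None


class StubNonGPUTransferModule:
    """Hypothetical stand-in that mirrors the constructor shape in this PR."""

    def __init__(self, ctx, mp_config=None):
        self._ctx = ctx
        # Keep the override module-local; None means "no override given".
        self._shm_name_override = (
            mp_config.shm_name if mp_config is not None else None
        )


def build_modules(ctx, mp_config, transfer_mode,
                  module_cls=StubNonGPUTransferModule):
    """Simplified, assumed version of the non-GPU branch of the wiring."""
    modules = []
    if transfer_mode == "non_gpu":
        modules.append(module_cls(ctx, mp_config))
    return modules


def check_config_is_forwarded() -> bool:
    """Replace the module class with a spy and assert on its call args."""
    ctx = object()
    cfg = StubMPServerConfig(shm_name="")
    spy = mock.Mock(name="module_cls")
    build_modules(ctx, cfg, "non_gpu", module_cls=spy)
    spy.assert_called_once_with(ctx, cfg)
    return True
```

Injecting the class as a parameter is a convenience of this sketch; against the real server module, `mock.patch` on the imported `NonGPUTransferModule` name would likely play the same role.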

Example of the new flow:

# server.py
else:
    modules.append(NonGPUTransferModule(ctx, mp_config))

# non_gpu_transfer.py
def __init__(
    self,
    ctx: MPCacheEngineContext,
    mp_config: MPServerConfig | None = None,
) -> None:
    self._ctx = ctx
    self._shm_name_override = (
        mp_config.shm_name if mp_config is not None else None
    )
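
The three override cases listed under "Non-GPU SHM override handling" can also be exercised in isolation. The sketch below is a standalone approximation, not the module's actual code: `FakeMemoryConfig` and `resolve_shm_pool_info` are hypothetical names, and the field names (`shm_name`, `use_lazy`, `size_in_bytes`) are taken from the prompt quoted further down.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict

POOL_PREFIX = "lmcache_l1_pool_"


class ShmPoolInfo(TypedDict):
    shm_name: str
    pool_size: int


@dataclass
class FakeMemoryConfig:
    """Hypothetical stand-in for the L1 memory config."""

    shm_name: Optional[str] = None
    use_lazy: bool = False
    size_in_bytes: int = 0


def resolve_shm_pool_info(
    override: Optional[str], mem_cfg: FakeMemoryConfig
) -> ShmPoolInfo:
    """Return SHM pool metadata, preferring the override when it is set."""
    pickle_mode: ShmPoolInfo = {"shm_name": "", "pool_size": 0}
    if override is None:
        # No override: auto-detect from the storage config.
        name = mem_cfg.shm_name or ""
        if not name or mem_cfg.use_lazy:
            return pickle_mode
    elif override == "":
        # Explicit empty string: SHM disabled, pickle mode forced.
        return pickle_mode
    else:
        # Custom name: taken as-is before normalization.
        name = override
    bare = name.lstrip("/")
    if not bare.startswith(POOL_PREFIX):
        name = f"{POOL_PREFIX}{bare}"
    return {"shm_name": name, "pool_size": mem_cfg.size_in_bytes}
```

As in the prompt's version, a non-empty override skips the `use_lazy` check, so an explicit name wins even when the storage config would otherwise fall back to pickle mode.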

Original prompt

Pass mp_config to NonGPUTransferModule so it can read shm_name directly, without changing run_cache_server() logic.

Changes needed:

1. lmcache/v1/multiprocess/server.py: _build_modules()

Change:

else:
    modules.append(NonGPUTransferModule(ctx))

To:

else:
    modules.append(NonGPUTransferModule(ctx, mp_config))

2. lmcache/v1/multiprocess/modules/non_gpu_transfer.py

Update __init__ to accept mp_config:

from lmcache.v1.multiprocess.config import MPServerConfig

def __init__(self, ctx: MPCacheEngineContext, mp_config: MPServerConfig | None = None) -> None:
    self._ctx = ctx
    self._shm_name_override = mp_config.shm_name if mp_config else None
    self._non_gpu_contexts: dict[int, NonGPUContextEntry] = {}
    self._strategies: dict[int, TransferStrategy] = {}
    self._pending_shm_writes: dict[tuple[int, IPCCacheEngineKey], list[ObjectKey]] = {}
    self._pending_shm_reads: dict[tuple[int, IPCCacheEngineKey], list[ObjectKey]] = {}
    self._pending_shm_lock = threading.Lock()

Update _compute_shm_pool_info to use the override:

def _compute_shm_pool_info(self) -> ShmPoolInfo:
    """Compute SHM pool info from storage manager config."""
    if self._shm_name_override is not None:
        shm_name = self._shm_name_override
        if not shm_name:
            return {"shm_name": "", "pool_size": 0}
        sm_config = self._ctx.storage_manager_config
        mem_cfg = sm_config.l1_manager_config.memory_config
        bare = shm_name.lstrip("/")
        if not bare.startswith("lmcache_l1_pool_"):
            shm_name = f"lmcache_l1_pool_{bare}"
        return {"shm_name": shm_name, "pool_size": mem_cfg.size_in_bytes}
    sm_config = self._ctx.storage_manager_config
    mem_cfg = sm_config.l1_manager_config.memory_config
    shm_name = mem_cfg.shm_name or ""
    if not shm_name or mem_cfg.use_lazy:
        return {"shm_name": "", "pool_size": 0}
    bare = shm_name.lstrip("/")
    if not bare.startswith("lmcache_l1_pool_"):
        shm_name = f"lmcache_l1_pool_{bare}"
    return {"shm_name": shm_name, "pool_size": mem_cfg.size_in_bytes}

3. Ensure MPServerConfig has shm_name field

Check lmcache/v1/multiprocess/config.py — it should already have a shm_name field and --shm-name CLI arg. If not, add:

shm_name: str | None = None
"""Override SHM segment name. "" to force pickle, None for auto."""

And in add_mp_server_args:

parser.add_argument("--shm-name", type=str, default=None,
                    help='SHM segment name for non-GPU transfer. "" to disable SHM.')
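
The distinction between an omitted flag and an explicit empty string is what makes the override work, so it may help to see it in isolation. The sketch below uses only the standard `argparse` module; `build_parser` and `describe_mode` are hypothetical helper names, not part of LMCache.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Parser with the same --shm-name definition as in the prompt."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--shm-name",
        type=str,
        default=None,
        help='SHM segment name for non-GPU transfer. "" to disable SHM.',
    )
    return parser


def describe_mode(argv: List[str]) -> str:
    """Classify the parsed value into the three cases the PR distinguishes."""
    shm_name = build_parser().parse_args(argv).shm_name
    if shm_name is None:
        return "auto"  # flag omitted: keep auto-detect behavior
    if shm_name == "":
        return "pickle"  # explicit empty string: disable SHM
    return f"shm:{shm_name}"  # custom segment name
```

Checking `is None` rather than truthiness matters here: a plain `if shm_name:` would treat the omitted flag and `--shm-name ""` as the same case.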

Why:

--shm-name "" should disable SHM and force pickle mode. Currently the override never reaches _compute_shm_pool_info because nothing passes it from mp_config to the non-GPU module. This fix keeps the arg in MPServerConfig and lets NonGPUTransferModule consume it directly.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Which file, and which line?

A: File: lmcache/v1/multiprocess/server.py
Insert after line 193 and before line 194:

    # Apply shm_name override from MP config.
    if mp_config.shm_name is not None:
        storage_manager_config.l1_manager_config.memory_config.shm_name = (
            mp_config.shm_name
        )

That is, between maybe_initialize_trace_recorder(...) and ctx = MPCacheEngineContext(...).

User: Can this be done without changing the server file?

A: Yes. There are two options:

Option A: change engine_context.py so that the shm_name override is passed into the context

# engine_context.py __init__
def __init__(self, storage_manager_config, chunk_size=256, hash_algorithm="blake3",
             shm_name_override: str | None = None):
    if shm_name_override is not None:
        storage_manager_config.l1_manager_config.memory_config.shm_name = shm_name_override
    ...

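The call site in server.py then needs only one extra argument: ctx = MPCacheEngineContext(..., shm_name_override=mp_config.shm_name). That still changes one line of server.py, though.

Option B (no changes to server.py at all): change parse_args_to_config in config.py

Move --shm-name from the MP server args to the storage manager args, so that parse_args_to_config() sets memory_config.shm_name directly. storage_manager_config is then built with the correct value, and server.py does not need to change at all.

This does change where --shm-name belongs, though: it goes from being an "MP server argument" to a "storage manager argument". Semantically that is defensible, since shm_name is an L1 memory setting to begin with.


The least effort is still to add those 3 lines to server.py. Why do you not want to change server.py? If the goal is to reduce the diff or avoid conflicts, Option B is actually the bigger change.

User: This SHM only matters for the non-CUDA context, and server.py is a higher-level abstraction.

A: Right. In that case NonGPUTransferModule.__init__ is the most reasonable place: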

# modules/non_gpu_transfer.py __init__
def __init__(self, ctx: MPCacheEngineContext, shm_name_override: str | None = None) -> None:
    self._ctx = ctx
    if shm_name_override is not None:
        ctx.storage_manager_config.l1_manager_config.memory_config.shm_name = shm_name_override
    ...

Then in _build_modules:

# server.py _build_modules
else:
    modules.append(NonGPUTransferModule(ctx, shm_name_override=mp_config.shm_name))

That way `run_cache_serve...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Pass mp_config to NonGPUTransferModule to access shm_name directly" to "Pass MPServerConfig.shm_name into NonGPUTransferModule" May 27, 2026
Copilot AI requested a review from hlin99 May 27, 2026 06:27
Copilot finished work on behalf of hlin99 May 27, 2026 06:27
…Manager

The allocator creates the SHM segment using the name from
L1MemoryManagerConfig.shm_name. Without this override the allocator
uses the default pid-based name while _compute_shm_pool_info returns
the user-specified name, causing workers to fail with FileNotFoundError.
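
The failure mode described in this commit message can be reproduced in miniature with Python's standard `multiprocessing.shared_memory` module. This is an illustration only and does not use LMCache's allocator; the segment names and the `demo_name_mismatch` helper are made up for the example.

```python
import uuid
from multiprocessing import shared_memory
from typing import Optional, Tuple


def attach_or_none(name: str) -> Optional[shared_memory.SharedMemory]:
    """Attach to an existing segment, or return None if it does not exist."""
    try:
        return shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return None


def demo_name_mismatch() -> Tuple[bool, bool]:
    """Create a segment under one name and look it up under two names.

    Returns (lookup_by_wrong_name_failed, lookup_by_right_name_succeeded).
    """
    suffix = uuid.uuid4().hex[:8]
    created_name = f"demo_created_{suffix}"  # what the allocator side made
    advertised_name = f"demo_advert_{suffix}"  # what the metadata claimed
    segment = shared_memory.SharedMemory(
        name=created_name, create=True, size=4096
    )
    try:
        wrong = attach_or_none(advertised_name)
        right = attach_or_none(created_name)
        result = (wrong is None, right is not None)
        for handle in (wrong, right):
            if handle is not None:
                handle.close()
        return result
    finally:
        # Release the creator's handle and remove the segment.
        segment.close()
        segment.unlink()
```

A worker that receives the advertised name has no way to find the segment unless the side that creates it is using the same name, which is why the override also has to reach the component that allocates the pool.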
hlin99 closed this May 28, 2026
hlin99 deleted the copilot/pass-mp-config-to-nongputransfermodule branch May 28, 2026 12:18