Move --shm-name from MPServerConfig to L1MemoryManagerConfig#310

Closed
Copilot wants to merge 2 commits into ww21_PR_shm from copilot/move-shm-name-to-l1memorymanagerconfig

Copilot AI commented May 27, 2026

--shm-name was parsed into MPServerConfig.shm_name but never propagated to L1MemoryManagerConfig.shm_name before StorageManager was created. As a result, the allocator created the SHM segment under the default pid-based name while _compute_shm_pool_info returned the user-specified name, so the worker tried to attach to a segment that did not exist and got FileNotFoundError.

Changes

  • lmcache/v1/multiprocess/config.py: Remove shm_name field from MPServerConfig, remove --shm-name from add_mp_server_args, remove shm_name=args.shm_name from parse_args_to_mp_server_config.

  • lmcache/v1/distributed/config.py: Add --shm-name to add_storage_manager_args (L1 Memory Manager group). In parse_args_to_config, pass it directly into L1MemoryManagerConfig:

    shm_name = getattr(args, "shm_name", None)
    memory_config = L1MemoryManagerConfig(
        size_in_bytes=int(args.l1_size_gb * (1 << 30)),
        use_lazy=args.l1_use_lazy,
        init_size_in_bytes=int(args.l1_init_size_gb * (1 << 30)),
        align_bytes=args.l1_align_bytes,
        **({"shm_name": shm_name} if shm_name is not None else {}),
    )

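    The conditional unpacking is intentional: L1MemoryManagerConfig.shm_name uses a default_factory for the pid-based name, and passing shm_name=None explicitly would bypass that factory, so the pid-based name would never be generated. An empty string "" explicitly disables SHM; any other value selects the segment with that name.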
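
The interaction between default_factory and an explicit None can be illustrated with a minimal sketch. It assumes L1MemoryManagerConfig behaves like a standard dataclass; MemoryConfigSketch, _default_shm_name, and build_config are hypothetical stand-ins, not the real LMCache classes.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


def _default_shm_name() -> str:
    # Hypothetical factory mirroring the pid-based default name.
    return f"lmcache_l1_pool_{os.getpid()}"


@dataclass
class MemoryConfigSketch:
    # Stand-in for L1MemoryManagerConfig; only the relevant fields are modeled.
    size_in_bytes: int
    shm_name: Optional[str] = field(default_factory=_default_shm_name)


def build_config(size_in_bytes: int, shm_name: Optional[str]) -> MemoryConfigSketch:
    # Forward shm_name only when the user supplied one, so that an
    # unspecified value still triggers the default_factory.
    extra = {"shm_name": shm_name} if shm_name is not None else {}
    return MemoryConfigSketch(size_in_bytes=size_in_bytes, **extra)
```

With this sketch, build_config(1024, None) yields the pid-based name, build_config(1024, "") keeps the empty string, and constructing MemoryConfigSketch(size_in_bytes=1024, shm_name=None) directly leaves the field as None because the factory is skipped.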

The CLI invocation is unchanged: --shm-name moves transparently from the MP Server arg group to the L1 Memory Manager arg group.
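
The reason the move is invisible to callers is that argparse argument groups only affect how --help is laid out, not how arguments are parsed. A small sketch of that, using only the standard library; build_parser and describe are hypothetical helpers and the parser is not the real LMCache one:

```python
import argparse
from typing import List


def build_parser(group_title: str) -> argparse.ArgumentParser:
    # The group title changes the --help layout only; parsing is identical.
    parser = argparse.ArgumentParser(prog="sketch")
    group = parser.add_argument_group(group_title)
    group.add_argument(
        "--shm-name",
        type=str,
        default=None,
        help="POSIX shared-memory segment name for L1 pool.",
    )
    return parser


def describe(argv: List[str], group_title: str = "L1 Memory Manager") -> str:
    # Map the three documented states of --shm-name to a label.
    args = build_parser(group_title).parse_args(argv)
    if args.shm_name is None:
        return "auto"  # flag omitted: pid-based default applies
    if args.shm_name == "":
        return "disabled"  # explicit empty string: SHM turned off
    return f"named:{args.shm_name}"
```

Calling describe(["--shm-name", "x"], "MP Server") and describe(["--shm-name", "x"], "L1 Memory Manager") gives the same result, which is what makes the regrouping safe for existing command lines.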

Original prompt

Move --shm-name from MPServerConfig / add_mp_server_args to L1MemoryManagerConfig / add_storage_manager_args so that the allocator directly uses the user-specified SHM name.

Current problem

--shm-name is parsed into MPServerConfig.shm_name but never written into L1MemoryManagerConfig.shm_name before the StorageManager is created. The allocator creates a segment with the default pid-based name (lmcache_l1_pool_{pid}) while _compute_shm_pool_info returns the user-specified name → worker gets FileNotFoundError.
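
The failure mode can be reproduced in isolation with Python's standard multiprocessing.shared_memory module. This is only a sketch of the name mismatch, not LMCache's allocator: the segment names, attach_or_report, and demo are made up for illustration, and the behavior shown assumes a POSIX system.

```python
import os
from multiprocessing import shared_memory
from typing import Tuple


def attach_or_report(name: str) -> str:
    # Try to attach to an existing segment, as a worker would.
    try:
        seg = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return "missing"
    seg.close()
    return "attached"


def demo() -> Tuple[str, str]:
    # The "allocator" creates the segment under a pid-based name...
    created_name = f"sketch_l1_pool_{os.getpid()}"
    # ...while the "server" advertises a different, user-specified name.
    advertised_name = f"sketch_user_name_{os.getpid()}"
    seg = shared_memory.SharedMemory(name=created_name, create=True, size=4096)
    try:
        return attach_or_report(created_name), attach_or_report(advertised_name)
    finally:
        seg.close()
        seg.unlink()  # remove the segment so the sketch leaves nothing behind
```

Attaching under the name that was actually used succeeds, while attaching under the advertised name raises FileNotFoundError, which matches the symptom described above.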

Required changes

  1. lmcache/v1/multiprocess/config.py: Remove --shm-name from add_mp_server_args and remove shm_name field from MPServerConfig. Remove shm_name=args.shm_name from parse_args_to_mp_server_config.

  2. lmcache/v1/distributed/config.py:

    • Add --shm-name argument to add_storage_manager_args in the "L1 Memory Manager" group:
      memory_group.add_argument(
          "--shm-name",
          type=str,
          default=None,
          help="POSIX shared-memory segment name for L1 pool. "
          "Default (not specified): auto-allocate pid-based name. "
          'Set to "" to disable SHM. '
          "Set to a name to use that specific SHM segment.",
      )
    • In parse_args_to_config, pass shm_name into L1MemoryManagerConfig when provided:
      shm_name = getattr(args, "shm_name", None)
      memory_config = L1MemoryManagerConfig(
          size_in_bytes=int(args.l1_size_gb * (1 << 30)),
          use_lazy=args.l1_use_lazy,
          init_size_in_bytes=int(args.l1_init_size_gb * (1 << 30)),
          align_bytes=args.l1_align_bytes,
          **({"shm_name": shm_name} if shm_name is not None else {}),
      )
  3. lmcache/v1/multiprocess/modules/non_gpu_transfer.py: Remove the _shm_name_override logic from NonGPUTransferModule.__init__ and the override branch in _compute_shm_pool_info. Revert __init__ signature to just (self, ctx: MPCacheEngineContext) (no mp_config parameter). Remove the MPServerConfig import.

  4. lmcache/v1/multiprocess/server.py:

    • Revert NonGPUTransferModule(ctx, mp_config) back to NonGPUTransferModule(ctx).
    • Remove the if mp_config.shm_name is not None: override block if present.
  5. Update tests in tests/v1/multiprocess/test_non_cuda_data_transfer.py: Remove or update tests that reference MPServerConfig(shm_name=...) in NonGPUTransferModule construction. The tests for _compute_shm_pool_info should now just set storage_manager_config.l1_manager_config.memory_config.shm_name directly.

Branch

Base branch: ww21_PR_shm

The following is the prior conversation context from the user's chat exploration (may be truncated):

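User: Why are we changing server again? Can we avoid making the change at this layer?

A: Make the change at the parse_args_to_config layer, so that --shm-name is written directly into L1MemoryManagerConfig.shm_name.

At lines 294-299 of config.py, parse_args_to_config constructs L1MemoryManagerConfig without passing shm_name. However, --shm-name is defined in mp_server_args, not in storage_manager_args.

The cleanest fix: add shm_name to add_storage_manager_args and have parse_args_to_config pass it in directly.

But if you don't want to change the storage_manager arg definitions (because they are generic), then change the caller of MPServerConfig instead, after parse_args_to_mp_server_config and before the context is created. Either way, it has to be done at the entry point in server.py / http_server.py.

Alternatively, change MPCacheEngineContext.__init__ so that it accepts a shm_name_override parameter: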

class MPCacheEngineContext:
    def __init__(
        self,
        storage_manager_config: StorageManagerConfig,
        chunk_size: int = 256,
        hash_algorithm: str = "blake3",
        shm_name_override: str | None = None,
    ) -> None:
        self._chunk_size = chunk_size
        self._storage_manager_config = storage_manager_config

        # Apply shm_name override before creating StorageManager
        if shm_name_override is not None:
            storage_manager_config.l1_manager_config.memory_config.shm_name = (
                shm_name_override
            )

        self._storage_manager = StorageManager(storage_manager_config)
        self._token_hasher = TokenHasher(
            chunk_size=chunk_size, hash_algorithm=hash_algorithm
        )
        self._session_manager = SessionManager(self._token_hasher)
        self._event_bus = get_event_bus()
        self._layout_desc_registry = LayoutDescRegistry()

Then server.py only changes the arguments it passes:

    ctx = MPCacheEngineContext(
        storage_manager_config=storage_manager_config,
        chunk_size=mp_config.chunk_size,
        hash_algorithm=mp_config.hash_algorithm,
        shm_name_override=mp_config.shm_name,
    )

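This way server.py only passes one extra argument, and all the logic lives in engine_context.py. Does that work for you?

User: @hlin99/LMCache/pull/308 Hasn't this PR already done that?

A: You're right, PR #308 did change _compute_shm_pool_info so that it returns the override name.

But the problem is that this only changes "which name the worker is told to use"; it does not change "which name the allocator actually uses to create the segment".

PR #308's approach:

  • _compute_shm_pool_info returns lmcache_l1_pool_hlin
  • When StorageManager is created, the shm_name in the config is still the default...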

This pull request was created from Copilot chat.

- Remove shm_name field and --shm-name arg from MPServerConfig/add_mp_server_args
- Add --shm-name to add_storage_manager_args (L1 Memory Manager group)
- Pass shm_name into L1MemoryManagerConfig in parse_args_to_config
- non_gpu_transfer.py and server.py were already clean (no override logic needed)
Copilot AI changed the title from "[WIP] Move --shm-name to L1MemoryManagerConfig" to "Move --shm-name from MPServerConfig to L1MemoryManagerConfig" May 27, 2026
Copilot finished work on behalf of hlin99 May 27, 2026 08:00
Copilot AI requested a review from hlin99 May 27, 2026 08:00
@hlin99 hlin99 closed this May 28, 2026
@hlin99 hlin99 deleted the copilot/move-shm-name-to-l1memorymanagerconfig branch May 28, 2026 12:19