Use multiprocessing.shared_memory for cross-platform SHM transport#291
Closed
Copilot wants to merge 6 commits into
Closed
Use multiprocessing.shared_memory for cross-platform SHM transport#291Copilot wants to merge 6 commits into
multiprocessing.shared_memory for cross-platform SHM transport#291Copilot wants to merge 6 commits into
Conversation
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/5885bfad-28e5-4d4e-8ef5-c6fe1d4b89ae Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/5885bfad-28e5-4d4e-8ef5-c6fe1d4b89ae Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Agent-Logs-Url: https://github.com/hlin99/LMCache/sessions/5885bfad-28e5-4d4e-8ef5-c6fe1d4b89ae Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Replace Linux-specific SHM paths with multiprocessing.shared_memory
Use May 21, 2026
multiprocessing.shared_memory for cross-platform SHM transport
Workers do not own the SHM segment (server does), so we must prevent Python's resource_tracker from unlinking it when the worker exits. Without this, the second worker startup fails with FileNotFoundError.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The non-GPU SHM transport assumed Linux
/dev/shmpaths, which broke attachment and stale-segment cleanup on macOS. This updates the transport and allocator paths to use Python’s cross-platform shared memory API while preserving the existing Linux behavior.Worker SHM attachment
os.open("/dev/shm/...")+mmap.mmap(...)inNonGpuContextShmwithmultiprocessing.shared_memory.SharedMemory.torch.frombuffer(...).Stale SHM cleanup
/dev/shm/<name>unlink logic in_unlink_stale_shmwithSharedMemory(...).unlink().Allocator platform guard
/dev/shmfree-space check on Linux only./dev/shmmount.Small robustness follow-ups
Original prompt
Replace Linux-specific
/dev/shmhardcoded paths with cross-platformmultiprocessing.shared_memoryAPIs so the SHM transport works on both Linux and macOS.Do NOT modify any test files.
Changes needed:
1.
lmcache/v1/multiprocess/non_gpu_context_shm.pyReplace the
os.open("/dev/shm/...", os.O_RDWR)+mmap.mmap(fd, ...)pattern withmultiprocessing.shared_memory.SharedMemory:Remove the
os.open/os.close/mmap.mmaplogic and theINVALID_SHM_FDsentinel. Updateclose()to callself._shm.close()instead ofos.close(fd)+mmap.close().Note:
torch.frombufferaccepts amemoryview(which is whatSharedMemory.bufreturns), so the_make_tensor_viewmethod should still work withself._shm.bufas the buffer — just pass it directly totorch.frombuffer(self._shm.buf, ...).2.
lmcache/v1/distributed/memory_manager.py—_unlink_stale_shmReplace
os.unlink(os.path.join("/dev/shm", normalized))with:3.
lmcache/v1/distributed/memory_manager.py—create_memory_allocatorThe
shutil.disk_usage("/dev/shm")call only makes sense on Linux. Guard it:On non-Linux platforms, skip the capacity check (macOS shm is not backed by a fixed-size tmpfs).
IMPORTANT: Do NOT modify any files under
tests/.The following is the prior conversation context from the user's chat exploration (may be truncated):
Assistant:
需要我用这个创建 issue 或者直接提 PR?
User: 看下最新的@hlin99/LMCache/pull/290。我感觉shm-name参数穿进去好像prefix的name没有指定吧?
Assistant: 你说得对。看
memory_manager.py的改动:这里只改了局部变量
shm_name,但这个加了前缀的名字传到MixedMemoryAllocator后,注册响应返回给 worker 的shm_name是从storage_manager.get_shm_pool_info()取的。如果get_shm_pool_info()返回的还是原始的config.shm_name(没有前缀),那 worker 端拿到的名字就对不上实际创建的 SHM 段名。需要确认:
MixedMemoryAllocator内部记录的 shm_name 是传入的(已加前缀)还是原始的?get_shm_pool_info()返回的是哪个?如果
MixedMemoryAllocator只是透传构造参数里的 shm_name,那这里应该没问题——因为传进去的已经是加了前缀的。但config.shm_name本身没被修改,如果有其他地方直接读config.shm_name(比如_unlink_stale_shm的调用处),可能还是用的无前缀版本。要我查一下
MixedMemoryAllocator和get_shm_pool_info的实现确认?User: 你去查啊。hlin99:ww21_PR_shm分支,你还等我给你查吗
Assistant: 找到了。看第 109-115 行: