Copilot/ww21 pr shm merge dev fix by hlin99 · Pull Request #303 · hlin99/LMCache

hlin99 · 2026-05-27T02:24:32Z

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

* docs: add recipes for phi, mistral, llama
  Signed-off-by: yeoshuheng <100367948+yeoshuheng@users.noreply.github.com>
* docs: update tool calling link
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
* docs: update jinja template fp
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
* docs: update jinja template fp
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
---------
Signed-off-by: yeoshuheng <100367948+yeoshuheng@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

… thread (LMCache#3271)
* Replace Condvar polling with eventfd+epoll for io_uring worker
  Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
* Fix rust format check
  Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
* Add RAII guard for file descriptor in rust raw block
  Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
---------
Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>

* nixl_storage: naive support for files + dynamic

  Static mode is not actually very usable with files: we hit the OS limit on open files very quickly, which limits the size of the cache. With tons of VRAM, having G3 of comparable size does not help much; we really need it to be much bigger.

  This commit adds support for files in the dynamic mode of nixl storage. Some things are done naively:

  1. No support for sharing the cache storage. We assume the worker has exclusive access to the files and that it is always safe to overwrite existing files. Note that with TP!=1 we will have several workers with the same target directory, which is why it is important to have a worker id as part of the key, so they don't affect each other.
  2. No eviction support. As before, dynamic mode has no eviction support. This kind of made sense when it only worked with OBJ storages, but now it is something to be aware of. It is not hard to add eviction though.
  3. Flat directory with all the cache files. There are going to be a lot of them, especially since we don't have eviction. Most filesystems don't optimize for a directory with millions of files, and we constantly open/close files there, which might be a source of additional latency.
     - There is an existing PR to support sharding the directory; we can merge it as a band-aid.
     - As a more advanced solution, we can use a multi-layer subdirectory structure, like the gds backend does.
  4. We switched directly from having _all_ files open at once to opening/closing _every_ single file on access. We might explore having an LRU cache of open files instead.

  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* Fix formatting
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix PR Comments
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Extract _build_descs
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix test
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Refactor to increase readability
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix another leak issue
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix release order
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Treat file not found as a miss
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Add docstring about file create mode 0o644
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Unlink files on failure
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Add missing docs
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix pre-commit issues
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix mypy error
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix test
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Make tests pass
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix formatting
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix nixl tests and stop ignoring them in CI
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix path to support GDS
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Revert path creation in NixlDynamicStorageBackend
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix code quality issue
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
---------
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Ilya Yanok <iyanok@nvidia.com>
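Several points from the nixl_storage commit message (worker id in the key, open/close on every access, file-not-found treated as a miss, unlink on failure, create mode 0o644, sharded subdirectories) can be illustrated with plain files. This is a rough sketch only: `WorkerFileStore`, its methods, and the two-level hash layout are hypothetical and do not reflect the NIXL backend's real API or on-disk layout.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


class WorkerFileStore:
    """Hypothetical per-worker file cache; not LMCache's actual backend."""

    def __init__(self, root: str, worker_id: int) -> None:
        self.root = Path(root)
        self.worker_id = worker_id

    def path_for(self, key: str) -> Path:
        digest = hashlib.sha256(key.encode()).hexdigest()
        # Two shard levels keep any single directory small; the worker id
        # in the file name keeps TP workers sharing a root from colliding.
        name = f"w{self.worker_id}_{digest}.bin"
        return self.root / digest[:2] / digest[2:4] / name

    def put(self, key: str, data: bytes) -> None:
        path = self.path_for(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Exclusive ownership is assumed, so overwriting is always safe.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
        except BaseException:
            # Do not leave a partially written entry behind.
            path.unlink(missing_ok=True)
            raise

    def get(self, key: str) -> Optional[bytes]:
        # The file is opened and closed on every access; a missing file
        # is a cache miss rather than an error.
        try:
            with open(self.path_for(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = WorkerFileStore(root, worker_id=0)
        store.put("chunk-0", b"kv bytes")
        print(store.get("chunk-0"), store.get("absent"))
```

Opening per access keeps the number of simultaneously open descriptors near zero, at the cost of an open/close pair per lookup; an LRU of open files, as the commit message suggests, would sit between these two extremes.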

* memory: huge page support (C bits)

  Add 3 pairs of alloc/free functions for huge pages. The 4th option is for SHM and it is missing, since normal SHM use cases don't support huge pages. The old API is left untouched, except for a small refactor: common mmap code is factored out. This doesn't affect the behavior.

  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* memory_management: use new functions for huge pages

  Change two allocators to support the use_huge_pages flag.

  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* memory_management: log if allocating huge pages fails
  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* non_cuda_equivalents: change memory functions to match cuda

  Adding use_hugepages to alloc and size to free. There is no actual huge page support though. It should be pretty easy to add to non-shm cases (mmap and give torch a buffer). For shm we need to look into the shared_memory module implementation.

  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* local_cpu: support for huge pages
  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* nixl_storage: add config option to alloc huge pages

  We want to be able to alloc huge pages for the NIXL buffer in DRAM.

  Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* PR comments fixes, Code cleanup
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Skip test if not enough free huge pages
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix non-cuda fallbacks
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Add docs
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Remove unused func
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix hugepage info to check for 2 MiB only
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix pre-commit issues
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Merge branch 'dev' into iyanok/huge-pages
  Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
---------
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
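The huge page commits describe an mmap-based allocation that logs on failure, plus a check for free 2 MiB pages. The real implementation is in C and is not shown here; below is a hedged Python sketch for Linux only. The helper names (`round_up`, `free_huge_pages`, `alloc_buffer`) are hypothetical, and the numeric fallback for `MAP_HUGETLB` is an assumed Linux constant used only when the `mmap` module does not expose it.

```python
import mmap
import re
from typing import Tuple

HUGE_PAGE_SIZE = 2 * 1024 * 1024  # 2 MiB, the only size checked here
# Assumption: 0x40000 is the usual Linux value when mmap lacks the name.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def round_up(size: int, align: int = HUGE_PAGE_SIZE) -> int:
    """Round size up to a multiple of align (huge page mappings need this)."""
    return -(-size // align) * align


def free_huge_pages(meminfo_text: str) -> int:
    """Number of free 2 MiB huge pages according to /proc/meminfo text."""
    size = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.M)
    free = re.search(r"^HugePages_Free:\s+(\d+)", meminfo_text, re.M)
    if not size or not free or int(size.group(1)) != 2048:
        return 0  # missing fields, or the default huge page size is not 2 MiB
    return int(free.group(1))


def alloc_buffer(size: int, use_hugepages: bool) -> Tuple[mmap.mmap, bool]:
    """Return (buffer, backed_by_huge_pages), falling back to normal pages."""
    if use_hugepages:
        flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_HUGETLB
        try:
            return mmap.mmap(-1, round_up(size), flags=flags), True
        except (OSError, ValueError) as exc:
            # Log and fall back instead of failing the whole allocation.
            print(f"huge page allocation failed ({exc}); using normal pages")
    return mmap.mmap(-1, size), False


if __name__ == "__main__":
    buf, huge = alloc_buffer(4096, use_hugepages=True)
    print(len(buf), huge)
    buf.close()
```

A test that depends on huge pages could call `free_huge_pages(open("/proc/meminfo").read())` first and skip when the count is too low, which mirrors the "Skip test if not enough free huge pages" commit in spirit.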


yeoshuheng and others added 6 commits May 26, 2026 13:48

[MP][Core] Refactor MPCacheEngine for better extendability (LMCache#3391)

bda3af1

* [Add] refactoring for the LMCache MP cache engine
  Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>

Merge dev into ww21_PR_shm and resolve server/test conflicts

8171370

hlin99 closed this May 28, 2026

hlin99 deleted the copilot/ww21-pr-shm-merge-dev-fix branch May 28, 2026 12:19

hlin99 wants to merge 6 commits into ww21_PR_shm from copilot/ww21-pr-shm-merge-dev-fix

7 participants
