
Copilot/ww21 pr shm merge dev fix #303

Closed
hlin99 wants to merge 6 commits into ww21_PR_shm from copilot/ww21-pr-shm-merge-dev-fix

Conversation

@hlin99
Owner

@hlin99 hlin99 commented May 27, 2026

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

yeoshuheng and others added 6 commits May 26, 2026 13:48
* docs: add recipes for phi, mistral, llama

Signed-off-by: yeoshuheng <100367948+yeoshuheng@users.noreply.github.com>

* docs: update tool calling link

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>

* docs: update jinja template fp

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>

* docs: update jinja template fp

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>

---------

Signed-off-by: yeoshuheng <100367948+yeoshuheng@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… thread (LMCache#3271)

* Replace Condvar polling with eventfd+epoll for io_uring worker

Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>

* Fix rust format check

Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>

* Add RAII guard for file descriptor in rust raw block

Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>

---------

Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
* nixl_storage: naive support for files + dynamic

Static mode is not really usable with files: we hit the OS limit on
open files very quickly, which limits the size of the cache.
With lots of VRAM, a G3 of comparable size does not help much; we
really need it to be much bigger.

This commit adds support for files in dynamic mode of nixl storage.
There are some naive things done:
1. No support for sharing the cache storage. We assume the worker has
   exclusive access to the files and it is always safe to overwrite
   existing files. Note that with TP!=1 we will have several workers
   with the same target directory, that's why it's important to have a
   worker id as part of the key, so they don't affect each other.
2. No eviction support. As before, dynamic mode has no eviction support.
   This kinda made sense when it was only working with OBJ storages, but
   now that is something to be aware of. It's not hard to add eviction
   though.
3. Flat directory with all the cache files. There are going to be a lot
   of them, especially provided we don't have eviction. Most filesystems
   don't optimize for the case of a directory with millions of files,
   and we constantly open/close files there, that might be a source
   of additional latency.
   - There is an existing PR to support sharding the directory; we can
     merge it as a band-aid.
   - As a more advanced solution, we can do multi-layer subdirectory
     structure, like gds backend does.
4. We switched directly from having _all_ files open at once to
   opening and closing _every_ single file on access. We might explore having an
   LRU cache of open files instead.
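
Points 1, 3 and 4 above can be sketched as follows. This is an illustration under stated assumptions, not the backend's code: `cache_path`, `FdCache`, the two-level hex sharding and the `w<worker_id>_` file-name prefix are all hypothetical names and layout choices.

```python
import hashlib
import os
from collections import OrderedDict


def cache_path(root, worker_id, key, levels=2):
    """Map (worker_id, key) to root/ab/cd/w<worker_id>_<digest> (hypothetical layout)."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    # Two hex characters per level gives 256 subdirectories per level.
    shards = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    # The worker id is part of the file name, so workers never overwrite each other.
    return os.path.join(root, *shards, f"w{worker_id}_{digest}")


class FdCache:
    """Keep at most `capacity` (>= 1) files open, closing the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._fds = OrderedDict()  # path -> fd, oldest first

    def get(self, path):
        fd = self._fds.pop(path, None)
        if fd is None:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            if len(self._fds) >= self.capacity:
                _, oldest = self._fds.popitem(last=False)
                os.close(oldest)
        self._fds[path] = fd  # (re)insert as most recently used
        return fd

    def close(self):
        while self._fds:
            _, fd = self._fds.popitem()
            os.close(fd)
```

Setting `capacity` below the process's open-file limit keeps the number of descriptors bounded while avoiding an open/close pair on every access to a hot file.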

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>

* Fix formatting

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix PR Comments

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Extract _build_descs

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix test

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Refactor to increase readability

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix another leak issue

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix release order

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Treat file not found as a miss

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Add docstring about file create mode 0o644

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Unlink files on failure

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
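
The three commits above (file not found as a miss, create mode 0o644, unlink on failure) describe behaviour that can be illustrated with a short sketch. The helper names `read_chunk` and `write_chunk` are hypothetical, and plain `os` calls stand in for the NIXL transfer path, which is not shown in this PR page.

```python
import os


def read_chunk(path):
    """Return the file contents, or None when the file is absent (a cache miss)."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


def write_chunk(path, data):
    """Create the file with mode 0o644 and remove it again if the write fails."""
    # The effective permissions are 0o644 masked by the process umask.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
    except BaseException:
        os.unlink(path)  # do not leave a partial file that could be read as a hit
        raise
    finally:
        os.close(fd)
```

Unlinking on failure matters here because a truncated file would otherwise look like a valid cache entry on the next lookup.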

* Add missing docs

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix pre-commit issues

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix mypy error

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix test

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Make tests pass

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix formatting

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix nixl tests and stop ignoring them in CI

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix path to support GDS

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Revert path creation in NixlDynamicStorageBackend

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix code quality issue

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

---------

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Ilya Yanok <iyanok@nvidia.com>
* memory: huge page support (C bits)

Add 3 pairs of alloc/free functions for huge pages.

The 4th option is for SHM and it's missing, since normal SHM use
cases don't support huge pages.

Old API is left untouched, except for the small refactor: common
mmap code is factored out. This doesn't affect the behavior.

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>

* memory_management: use new functions for huge pages

Change two allocators to support the use_huge_pages flag.

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>

* memory_management: log if allocating huge pages fails

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
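
The commits above add huge-page allocation in C and log when it fails. As a rough Python analogue, the sketch below tries an anonymous `mmap` with `MAP_HUGETLB`, logs a warning on failure, and falls back to regular pages. The names `alloc_buffer`, `round_up` and `HUGE_PAGE_SIZE` are hypothetical, the fallback policy is an assumption rather than something the commits state, and the fallback constant `0x40000` is the Linux `MAP_HUGETLB` value on x86-64 and arm64, used only if the `mmap` module does not export it.

```python
import logging
import mmap
import sys

logger = logging.getLogger(__name__)

HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB huge pages
# Assumption: Linux value on x86-64/arm64 when the module lacks the constant.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def round_up(size, align):
    """Round size up to the next multiple of align."""
    return (size + align - 1) // align * align


def alloc_buffer(size, use_hugepages=False):
    """Return (buffer, used_hugepages); falls back to regular pages on failure."""
    base_flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    if use_hugepages and sys.platform.startswith("linux"):
        try:
            # A huge-page mapping must be a multiple of the huge page size.
            length = round_up(size, HUGE_PAGE_SIZE)
            return mmap.mmap(-1, length, flags=base_flags | MAP_HUGETLB), True
        except OSError as exc:
            logger.warning("huge page allocation failed (%s), using regular pages", exc)
    return mmap.mmap(-1, size, flags=base_flags), False
```

The buffer is released with `buf.close()`. Because the huge-page mapping may be longer than the requested size, the caller has to remember the mapped length, which is one reason a free function might take a size argument.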

* non_cuda_equivalents: change memory functions to match cuda

Adding use_hugepages to alloc and size to free.

There is no actual huge page support though. It should be pretty easy
to add to non-shm cases (mmap and give torch a buffer). For shm we
need to look into shared_memory module implementation.

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>

* local_cpu: support for huge pages

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>

* nixl_storage: add config option to alloc huge pages

We want to be able to alloc huge pages for the NIXL buffer in DRAM.

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>

* PR comments fixes, Code cleanup

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Skip test if not enough free huge pages

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix non-cuda fallbacks

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Add docs

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Remove unused func

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Fix hugepage info to check for 2 MiB only

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
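
One way to check for free 2 MiB huge pages, for example to decide whether to skip a test, is to read `HugePages_Free` and `Hugepagesize` from `/proc/meminfo` on Linux. The sketch below is an assumption about how such a check could look, not the project's implementation; `free_2mib_hugepages` and `has_enough_hugepages` are hypothetical names.

```python
def free_2mib_hugepages(meminfo_text):
    """Return the free huge page count if the default huge page size is 2 MiB, else 0."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("HugePages_Free", "Hugepagesize"):
            fields[name] = int(rest.split()[0])  # Hugepagesize is reported in kB
    if fields.get("Hugepagesize") != 2048:
        return 0  # only 2 MiB pages are counted
    return fields.get("HugePages_Free", 0)


def has_enough_hugepages(needed_bytes, meminfo_path="/proc/meminfo"):
    """Return True if free 2 MiB huge pages cover needed_bytes."""
    try:
        with open(meminfo_path) as f:
            text = f.read()
    except OSError:
        return False  # not Linux, or /proc is unavailable
    return free_2mib_hugepages(text) * 2 * 1024 * 1024 >= needed_bytes
```

A test could call `has_enough_hugepages(size)` and skip itself when the result is false.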

* Fix pre-commit issues

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

* Merge branch 'dev' into iyanok/huge-pages

Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>

---------

Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>

* [Add] refactoring for the LMCache MP cache engine

Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
@hlin99 hlin99 closed this May 28, 2026
@hlin99 hlin99 deleted the copilot/ww21-pr-shm-merge-dev-fix branch May 28, 2026 12:19


7 participants