Copilot/ww21 pr shm merge dev fix #303
Closed
hlin99 wants to merge 6 commits into
* docs: add recipes for phi, mistral, llama
Signed-off-by: yeoshuheng <100367948+yeoshuheng@users.noreply.github.com>
* docs: update tool calling link
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
* docs: update jinja template fp
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
* docs: update jinja template fp
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
---------
Signed-off-by: yeoshuheng <100367948+yeoshuheng@users.noreply.github.com>
Signed-off-by: sh <100367948+yeoshuheng@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… thread (LMCache#3271)
* Replace Condvar polling with eventfd+epoll for io_uring worker
Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
* Fix rust format check
Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
* Add RAII guard for file descriptor in rust raw block
Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
---------
Signed-off-by: zhengfeihe <hezhengfei1999@gmail.com>
* nixl_storage: naive support for files + dynamic
Static mode is not very usable with files: we hit the OS limit on open
files very quickly, which limits the size of the cache. With tons of
VRAM, having G3 of comparable size does not help much; we really need
it to be much bigger.
This commit adds support for files in the dynamic mode of nixl storage.
Some things are done naively:
1. No support for sharing the cache storage. We assume the worker has
   exclusive access to the files and that it is always safe to overwrite
   existing files. Note that with TP!=1 several workers will have the
   same target directory, which is why it is important to have a worker
   id as part of the key, so they don't affect each other.
2. No eviction support. As before, dynamic mode has no eviction support.
   That made sense when it only worked with OBJ storages, but now it is
   something to be aware of. Eviction is not hard to add, though.
3. Flat directory with all the cache files. There are going to be a lot
   of them, especially since we don't have eviction. Most filesystems
   are not optimized for a directory with millions of files, and we
   constantly open/close files there, which might be a source of
   additional latency.
   - There is an existing PR to support sharding the directory; we can
     merge it as a band-aid.
   - As a more advanced solution, we can use a multi-layer subdirectory
     structure, like the gds backend does.
4. We switched directly from having _all_ files open at once to
   opening/closing _every_ single file on access. We might explore
   keeping an LRU cache of open files instead.
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
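The two follow-ups named in points 3 and 4 of the commit message above (a multi-layer subdirectory layout and an LRU cache of open files) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `cache_file_path`, `OpenFileLRU`, the two-level layout, and the SHA-256 digest are all hypothetical choices; the only details taken from the commit log are that the worker id is part of the key and that files are created with mode 0o644.

```python
import hashlib
import os
import tempfile
from collections import OrderedDict


def cache_file_path(root: str, worker_id: int, key: str, levels: int = 2) -> str:
    """Map a cache key to a nested path such as root/ab/cd/<digest>.

    Hypothetical helper: the worker id is hashed together with the key, so
    workers sharing one target directory (TP != 1) never resolve to the
    same file.
    """
    digest = hashlib.sha256(f"{worker_id}:{key}".encode()).hexdigest()
    # Two hex characters per level give 256 subdirectories at each level.
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, digest)


class OpenFileLRU:
    """Keep at most `capacity` descriptors open; close the least recently used."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._fds: "OrderedDict[str, int]" = OrderedDict()

    def get(self, path: str) -> int:
        fd = self._fds.get(path)
        if fd is not None:
            self._fds.move_to_end(path)  # mark as most recently used
            return fd
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # 0o644 mirrors the create mode mentioned in the commit list.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self._fds[path] = fd
        if len(self._fds) > self.capacity:
            _, old_fd = self._fds.popitem(last=False)  # evict the oldest entry
            os.close(old_fd)
        return fd

    def close_all(self) -> None:
        while self._fds:
            _, fd = self._fds.popitem()
            os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        lru = OpenFileLRU(capacity=2)
        for key in ("chunk-a", "chunk-b", "chunk-c"):
            os.write(lru.get(cache_file_path(root, 0, key)), b"kv")
        lru.close_all()
```

With a capacity well below the OS open-file limit, this keeps hot files open while still avoiding the limit that made static mode impractical; a real implementation would also need to handle concurrent access, which this sketch ignores.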
* Fix formatting
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix PR Comments
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Extract _build_descs
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix test
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Refactor to increase readability
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix another leak issue
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix release order
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Treat file not found as a miss
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Add docstring about file create mode 0o644
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Unlink files on failure
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Add missing docs
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix pre-commit issues
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix mypy error
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix test
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Make tests pass
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix formatting
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix nixl tests and stop ignoring them in CI
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix path to support GDS
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Revert path creation in NixlDynamicStorageBackend
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix code quality issue
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
---------
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Ilya Yanok <iyanok@nvidia.com>
* memory: huge page support (C bits)
Add 3 pairs of alloc/free functions for huge pages. The 4th option is for SHM and it's missing, since normal SHM use cases don't support huge pages.
Old API is left untouched, except for the small refactor: common mmap code is factored out. This doesn't affect the behavior.
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* memory_management: use new functions for huge pages
Change two allocators to support us_huge_pages flag.
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* memory_management: log if allocating huge pages fail
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* non_cuda_equivalents: change memory functions to match cuda
Adding use_hugepages to alloc and size to free. There is no actual huge page support though. It should be pretty easy to add to non-shm cases (mmap and give torch a buffer). For shm we need to look into shared_memory module implementation.
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* local_cpu: support for huge pages
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* nixl_storage: add config option to alloc huge pages
We want to be able to alloc huge pages for the NIXL buffer in DRAM.
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
* PR comments fixes, Code cleanup
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Skip test if not enough free huge pages
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix non-cuda fallbacks
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Add docs
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Remove unused func
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix hugepage info to check for 2 MiB only
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Fix pre-commit issues
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
* Merge branch 'dev' into iyanok/huge-pages
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
---------
Signed-off-by: Ilya Yanok <iyanok@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
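For readers wondering what "Skip test if not enough free huge pages" and "check for 2 MiB only" might look like in practice, here is a minimal sketch assuming the information comes from Linux `/proc/meminfo`. The function names are hypothetical and are not taken from this PR; only the `HugePages_Free` and `Hugepagesize` field names are standard kernel output.

```python
import os


def parse_hugepage_info(meminfo_text: str) -> dict:
    """Extract the huge page counters from /proc/meminfo-style text."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
    info = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if name in wanted and fields:
            info[name] = int(fields[0])  # Hugepagesize is reported in kB
    return info


def free_2mib_hugepage_bytes(meminfo_text: str) -> int:
    """Bytes available in free huge pages, counting 2 MiB pages only."""
    info = parse_hugepage_info(meminfo_text)
    if info.get("Hugepagesize") != 2048:  # default size is not 2 MiB
        return 0
    return info.get("HugePages_Free", 0) * 2048 * 1024


if __name__ == "__main__":
    # A test could skip itself when this is smaller than the buffer it needs.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(free_2mib_hugepage_bytes(f.read()))
```

Returning 0 for any default size other than 2 MiB keeps the check conservative: a caller would simply fall back to regular pages or skip the test.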
What this PR does / why we need it:
Special notes for your reviewers:
If applicable: