Migrate RMM usage to CCCL MR design#2996
Conversation
Remove device_memory_resource base class usage, de-template all resource and adaptor types, replace pointer-based per-device resource APIs with ref-based equivalents, and update all call sites for the new signatures. Part of rapidsai/rmm#2011.
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
I assume this can only go in after downstream library PRs are merged (cuvs/cuml/cugraph), right? |
5b787c0 to
fb41fae
Compare
| if (any_mr_) return new device_memory_resource(*any_mr_); | ||
| return new device_memory_resource(mr_); | ||
| } | ||
| auto make_resource() -> resource* override { return new device_memory_resource(any_mr_); } |
There was a problem hiding this comment.
We may want to use resources by value rather than by pointer now.
6beda83 to
725c510
Compare
This goes the other way. We must merge this PR before we can merge cuvs/cuml/cugraph. The merge order will be RMM, then this RAFT PR, then cuVS, then cuML/cuGraph. |
725c510 to
ad74660
Compare
ad74660 to
d7b1e04
Compare
divyegala
left a comment
There was a problem hiding this comment.
Approving with required fixes before merge.
| device_resources(const device_resources& handle, | ||
| std::shared_ptr<rmm::mr::device_memory_resource> workspace_resource, | ||
| std::optional<std::size_t> allocation_limit = std::nullopt) | ||
| : resources{handle} | ||
| { | ||
| // replace the resource factory for the workspace_resources | ||
| resource::set_workspace_resource(*this, workspace_resource, allocation_limit); | ||
| } |
| @@ -50,31 +45,6 @@ TEST(DeviceResourcesManager, ObeysSetters) | |||
|
|
|||
| // Provide lock for counting unique objects | |||
| auto mtx = std::mutex{}; | |||
| auto workspace_mrs = | |||
| } | ||
| return result; | ||
| }()}, | ||
| workspace_mr_{[¶ms, this]() { |
| // TODO: reinstate the dynamic_cast<cuda_memory_resource*> guard that | ||
| // skipped pool creation when a non-default resource was already set, | ||
| // once CCCL exposes resource_cast or an equivalent type-query. |
There was a problem hiding this comment.
I have marked this as a post-task for me to complete. There is a similar need in cuOpt. CCCL recently added the resource_cast feature to help with this, but I haven't had a chance to update RAPIDS to that version of CCCL yet. It should be possible to fix this in 1-2 weeks.
| auto* mr = dynamic_cast<rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>*>( | ||
| rmm::mr::get_current_device_resource()); |
There was a problem hiding this comment.
This depends on the CCCL resource_cast feature. We can re-add this test later, as noted above.
Restore workspace_resource constructor parameters and copy constructors on device_resources and handle_t using raft::mr::device_resource instead of shared_ptr<device_memory_resource>. Re-add workspace MR plumbing in device_resources_manager (storage, setter, getter, pass-through). Add TODO for reinstating the dynamic_cast guard once CCCL exposes resource_cast.
4bdd570 to
f05df96
Compare
|
/merge |
## Summary - Remove dependency on `rmm::mr::device_memory_resource` base class; resources now satisfy the `cuda::mr::resource` concept directly - Replace `shared_ptr<device_memory_resource>` with value types and `cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased storage - Replace `set_current_device_resource(ptr)` / `set_per_device_resource(id, ptr)` with `set_current_device_resource_ref` / `set_per_device_resource_ref` - Remove `make_owning_wrapper` usage and `dynamic_cast` on memory resources (no common base class) - Add missing `thrust/iterator/transform_output_iterator.h` include (no longer transitively included via CCCL) Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996.
## Summary - Migrate all RMM usage to the new CCCL memory resource design (de-templated resources, `device_async_resource_ref` instead of `device_memory_resource*`, value semantics) - Replace `get_workspace_resource()` / `get_large_workspace_resource()` with `_ref()` variants across 65 call sites - Rewrite `cuda_huge_page_resource` to satisfy CCCL `resource` concept directly - Remove `owning_wrapper` / `dynamic_cast` patterns in C API and benchmarks Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. ## Changes - **33 files changed** (~208 insertions, ~221 deletions) - `device_memory_resource*` params → `device_async_resource_ref` (ivf_common, ivf_pq, naive_knn) - `get_current_device_resource()` → `get_current_device_resource_ref()` - `set_current_device_resource()` → `set_current_device_resource_ref()` - De-templated `pool_memory_resource`, `failure_callback_resource_adaptor` in bench utils - Removed `&resource` pointer patterns (resources are now copyable value types) - Removed spurious `mr` arg from `select_k` calls (previously compiled due to implicit pointer→bool conversion) - C API pool resource management rewritten without `owning_wrapper` --------- Co-authored-by: gpuCI <38199262+GPUtester@users.noreply.github.com>
## Summary - Migrate all raw RMM `allocate`/`deallocate` calls to the new CCCL 3-argument API that requires explicit alignment - Replace removed `rmm.librmm.per_device_resource` Cython import with `rmm.pylibrmm.memory_resource` and use `make_any_device_resource` to obtain the resource for `device_buffer` construction Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. Depends on rapidsai/cuvs#1990. ## Changes - **`cpp/src/genetic/genetic.cu`**: Add explicit `alignof(node)` / `alignof(program)` to all `allocate` and `deallocate` calls in `parallel_evolve` and `symFit`; fix deallocation bug in `parallel_evolve` where `h_nextprogs[i].len` was incorrectly used instead of `tmp.len` to compute the buffer size being freed - **`cpp/examples/symreg/symreg_example.cpp`**: Use `params.population_size * sizeof(cg::program)` and `alignof(cg::program)` for `allocate`/`deallocate` calls, fixing incorrect byte-size computation; remove unused `<rmm/aligned.hpp>` include - **`cpp/tests/sg/genetic/evolution_test.cu`**: Add alignment arguments to allocate/deallocate in `SymReg` test - **`cpp/tests/sg/genetic/program_test.cu`**: Add alignment arguments to `SetUp`/`TearDown` allocate/deallocate calls - **`python/cuml/cuml/manifold/umap/umap.pyx`**: Replace `get_current_device_resource()` with `make_any_device_resource(get_current_device_resource().get_mr())` for `device_buffer` construction Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Simon Adorf (https://github.com/csadorf) - Divye Gala (https://github.com/divyegala) - Victor Lafargue (https://github.com/viclafargue) URL: #7951
## Summary - Replace removed `rmm::mr::device_memory_resource` base class, `owning_wrapper`, `shared_ptr`-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types - Use `cuda::mr::any_resource<cuda::mr::device_accessible>` for owning type-erased storage, `rmm::device_async_resource_ref` for non-owning references, and value-typed resources (`cuda_memory_resource`, `pinned_host_memory_resource`) - Pass the memory resource to `raft::handle_t` as the `workspace_resource` (3rd) constructor argument, matching the new raft API (`stream_view`, `stream_pool`, `std::optional<raft::mr::device_resource>`) Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. Depends on rapidsai/cuvs#1990. ## Files changed **Headers:** - `algorithms.hpp`, `dendrogram.hpp`, `legacy/graph.hpp`, `legacy/functions.hpp`: `get_current_device_resource()` → `get_current_device_resource_ref()` in default argument expressions - `host_staging_buffer_manager.hpp`: Remove `owning_wrapper`, store `pool_memory_resource` by value in a `std::optional`, accept `pinned_host_memory_resource` by value in `init()` - `large_buffer_manager.hpp`: Store `pinned_host_memory_resource` by value (not `shared_ptr`), return `device_async_resource_ref` from `get()`, `std::move` the resource into storage - `mtmg/resource_manager.hpp`: Use `cuda::mr::any_resource<device_accessible>` instead of `shared_ptr<device_memory_resource>` for `per_device_rmm_resources_`, use non-deprecated `set_per_device_resource`, pass resource as `workspace_resource` to `raft::handle_t` **Tests:** - `base_fixture.hpp`: Return `any_resource<device_accessible>` from `create_memory_resource()`, use value-typed MR factory helpers (`make_cuda`, `make_managed`, `make_pool`, `make_binning`), switch to non-deprecated `set_current_device_resource` / `get_current_device_resource_ref` - `multi_node_threaded_test.cpp`: Switch to non-deprecated `set_current_device_resource(resource)` - `mg_graph500_bfs_test.cu`, `mg_graph500_sssp_test.cu`: Store `pinned_mr_` as `optional<pinned_host_memory_resource>` by value, prefer `.value()` over `operator*` for optional access **Examples:** - All 4 example files (`sg_graph_algorithms.cpp`, `mg_graph_algorithms.cpp`, `vertex_and_edge_partition.cu`, `graph_operations.cu`): Use value-typed `cuda_memory_resource`, non-deprecated `set_current_device_resource`, pass the resource to `raft::handle_t` as the `workspace_resource` (3rd positional arg, with `nullptr` for the unused `stream_pool`) Authors: - Bradley Dice (https://github.com/bdice) - Chuck Hastings (https://github.com/ChuckHastings) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) - Vyas Ramasubramani (https://github.com/vyasr) URL: #5483
PR #2996 migrated RAFT to the CCCL/RMM memory-resource model and temporarily removed the guard that prevented `device_resources_manager` from overwriting a user-installed RMM resource. Now that CCCL provides `cuda::mr::resource_cast` support for RMM resources, restore that behavior using `rmm::mr::get_current_device_resource_ref()`. With this change, `device_resources_manager` only creates and installs its pool when the current resource is the default `rmm::mr::cuda_memory_resource`. If another resource is already installed, RAFT preserves it and emits the existing warning. The test coverage removed during #2996 is restored with `resource_cast` against the current resource ref. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #3020
Summary
device_memory_resourcebase class usage, de-template all resource and adaptor types, replace pointer-based per-device resource APIs with ref-based equivalentsDepends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Changes
Core resource infrastructure
device_memory_resource.hpp: Removeany_resource_bridge(which inherited fromrmm::mr::device_memory_resource), remove allshared_ptr<device_memory_resource>constructor overloads, consolidate toany_resource-only pathdevice_resources.hpp: Remove deprecated constructor takingshared_ptr<device_memory_resource>, updateget_workspace_resource()return type (de-templatedlimiting_resource_adaptor)device_resources_snmg.hpp: Remove stale include, de-templatepool_memory_resourcehandle.hpp: Remove deprecated constructors takingshared_ptr<device_memory_resource>device_resources_manager.hpp: Retypeworkspace_mrsvector fromshared_ptr<device_memory_resource>toraft::mr::device_resource, updateset_workspace_memory_resource()signature accordingly, de-templatepool_mr_tooptional<pool_memory_resource>, removedynamic_castfor upstream type detection, replaceget/set_current_device_resource()with_refvariantsMemory tracking
memory_tracking_resources.hpp: Removedevice_tracking_bridge(inherited fromdevice_memory_resource), useset_current_device_resource_ref()directlyCall sites using
get_workspace_resource()→get_workspace_resource_ref()select_k-inl.cuh,select_radix.cuh,select_warpsort.cuh,sparse/select_k-inl.cuh,bitmap_to_csr.cuh,bitset_to_csr.cuhBenchmarks
benchmark.hpp: De-templatepool_memory_resource, useany_resourcefor RAII restoregather.cu,subsample.cu: Same patternTests
handle.cpp: Dereferencelimiting_resource_adaptor*fordevice_bufferconstructordevice_resources_manager.cpp: Remove workspace-related test code for removed APIsmdarray.cu: Removetest_device_resource_bridge_unwrap(bridge no longer exists)multi_variable_gaussian.cu:get_current_device_resource()→get_current_device_resource_ref()