Skip to content

Migrate RMM usage to CCCL MR design#2996

Merged
rapids-bot[bot] merged 5 commits into
rapidsai:mainfrom
bdice:rmm-cccl-migration
Apr 21, 2026
Merged

Migrate RMM usage to CCCL MR design#2996
rapids-bot[bot] merged 5 commits into
rapidsai:mainfrom
bdice:rmm-cccl-migration

Conversation

@bdice
Copy link
Copy Markdown
Contributor

@bdice bdice commented Apr 3, 2026

Summary

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.

Changes

Core resource infrastructure

  • device_memory_resource.hpp: Remove any_resource_bridge (which inherited from rmm::mr::device_memory_resource), remove all shared_ptr<device_memory_resource> constructor overloads, consolidate to any_resource-only path
  • device_resources.hpp: Remove deprecated constructor taking shared_ptr<device_memory_resource>, update get_workspace_resource() return type (de-templated limiting_resource_adaptor)
  • device_resources_snmg.hpp: Remove stale include, de-template pool_memory_resource
  • handle.hpp: Remove deprecated constructors taking shared_ptr<device_memory_resource>
  • device_resources_manager.hpp: Retype workspace_mrs vector from shared_ptr<device_memory_resource> to raft::mr::device_resource, update set_workspace_memory_resource() signature accordingly, de-template pool_mr_ to optional<pool_memory_resource>, remove dynamic_cast for upstream type detection, replace get/set_current_device_resource() with _ref variants

Memory tracking

  • memory_tracking_resources.hpp: Remove device_tracking_bridge (inherited from device_memory_resource), use set_current_device_resource_ref() directly

Call sites using get_workspace_resource()get_workspace_resource_ref()

  • select_k-inl.cuh, select_radix.cuh, select_warpsort.cuh, sparse/select_k-inl.cuh, bitmap_to_csr.cuh, bitset_to_csr.cuh

Benchmarks

  • benchmark.hpp: De-template pool_memory_resource, use any_resource for RAII restore
  • gather.cu, subsample.cu: Same pattern

Tests

  • handle.cpp: Dereference limiting_resource_adaptor* for device_buffer constructor
  • device_resources_manager.cpp: Remove workspace-related test code for removed APIs
  • mdarray.cu: Remove test_device_resource_bridge_unwrap (bridge no longer exists)
  • multi_variable_gaussian.cu: get_current_device_resource()get_current_device_resource_ref()

Remove device_memory_resource base class usage, de-template all resource
and adaptor types, replace pointer-based per-device resource APIs with
ref-based equivalents, and update all call sites for the new signatures.

Part of rapidsai/rmm#2011.
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@achirkin
Copy link
Copy Markdown
Contributor

achirkin commented Apr 8, 2026

I assume this can only go in after downstream library PRs are merged (cuvs/cuml/cugraph), right?

@bdice bdice force-pushed the rmm-cccl-migration branch 5 times, most recently from 5b787c0 to fb41fae Compare April 17, 2026 07:29
if (any_mr_) return new device_memory_resource(*any_mr_);
return new device_memory_resource(mr_);
}
auto make_resource() -> resource* override { return new device_memory_resource(any_mr_); }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to use resources by value rather than by pointer now.

@bdice bdice marked this pull request as ready for review April 17, 2026 07:30
@bdice bdice requested review from a team as code owners April 17, 2026 07:30
@bdice bdice requested review from AyodeAwe and removed request for AyodeAwe April 17, 2026 07:30
@bdice bdice force-pushed the rmm-cccl-migration branch 3 times, most recently from 6beda83 to 725c510 Compare April 17, 2026 20:51
@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Apr 17, 2026

I assume this can only go in after downstream library PRs are merged (cuvs/cuml/cugraph), right?

This goes the other way. We must merge this PR before we can merge cuvs/cuml/cugraph.

The merge order will be RMM, then this RAFT PR, then cuVS, then cuML/cuGraph.

@bdice bdice force-pushed the rmm-cccl-migration branch from 725c510 to ad74660 Compare April 17, 2026 21:07
Copy link
Copy Markdown
Member

@divyegala divyegala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with required fixes before merge.

Comment thread cpp/include/raft/core/device_resources.hpp
Comment on lines -54 to -61
device_resources(const device_resources& handle,
std::shared_ptr<rmm::mr::device_memory_resource> workspace_resource,
std::optional<std::size_t> allocation_limit = std::nullopt)
: resources{handle}
{
// replace the resource factory for the workspace_resources
resource::set_workspace_resource(*this, workspace_resource, allocation_limit);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-add constructor.

@@ -50,31 +45,6 @@ TEST(DeviceResourcesManager, ObeysSetters)

// Provide lock for counting unique objects
auto mtx = std::mutex{};
auto workspace_mrs =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-add test

Comment thread cpp/include/raft/core/handle.hpp
}
return result;
}()},
workspace_mr_{[&params, this]() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-add

Comment on lines +158 to +160
// TODO: reinstate the dynamic_cast<cuda_memory_resource*> guard that
// skipped pool creation when a non-default resource was already set,
// once CCCL exposes resource_cast or an equivalent type-query.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have marked this as a post-task for me to complete. There is a similar need in cuOpt. CCCL recently added the resource_cast feature to help with this, but I haven't had a chance to update RAPIDS to that version of CCCL yet. It should be possible to fix this in 1-2 weeks.

Comment on lines -106 to -107
auto* mr = dynamic_cast<rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>*>(
rmm::mr::get_current_device_resource());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on the CCCL resource_cast feature. We can re-add this test later, as noted above.

Restore workspace_resource constructor parameters and copy constructors
on device_resources and handle_t using raft::mr::device_resource instead
of shared_ptr<device_memory_resource>. Re-add workspace MR plumbing in
device_resources_manager (storage, setter, getter, pass-through). Add
TODO for reinstating the dynamic_cast guard once CCCL exposes
resource_cast.
@bdice bdice force-pushed the rmm-cccl-migration branch from 4bdd570 to f05df96 Compare April 21, 2026 11:57
@bdice bdice removed the request for review from a team April 21, 2026 13:43
@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Apr 21, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 1a5f667 into rapidsai:main Apr 21, 2026
157 of 175 checks passed
rgsl888prabhu pushed a commit to NVIDIA/cuopt that referenced this pull request Apr 21, 2026
## Summary

- Remove dependency on `rmm::mr::device_memory_resource` base class;
resources now satisfy the `cuda::mr::resource` concept directly
- Replace `shared_ptr<device_memory_resource>` with value types and
`cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased
storage
- Replace `set_current_device_resource(ptr)` /
`set_per_device_resource(id, ptr)` with
`set_current_device_resource_ref` / `set_per_device_resource_ref`
- Remove `make_owning_wrapper` usage and `dynamic_cast` on memory
resources (no common base class)
- Add missing `thrust/iterator/transform_output_iterator.h` include (no
longer transitively included via CCCL)

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
gforsyth pushed a commit to rapidsai/cuvs that referenced this pull request Apr 21, 2026
## Summary
- Migrate all RMM usage to the new CCCL memory resource design
(de-templated resources, `device_async_resource_ref` instead of
`device_memory_resource*`, value semantics)
- Replace `get_workspace_resource()` / `get_large_workspace_resource()`
with `_ref()` variants across 65 call sites
- Rewrite `cuda_huge_page_resource` to satisfy CCCL `resource` concept
directly
- Remove `owning_wrapper` / `dynamic_cast` patterns in C API and
benchmarks

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.

## Changes
- **33 files changed** (~208 insertions, ~221 deletions)
- `device_memory_resource*` params → `device_async_resource_ref`
(ivf_common, ivf_pq, naive_knn)
- `get_current_device_resource()` → `get_current_device_resource_ref()`
- `set_current_device_resource()` → `set_current_device_resource_ref()`
- De-templated `pool_memory_resource`,
`failure_callback_resource_adaptor` in bench utils
- Removed `&resource` pointer patterns (resources are now copyable value
types)
- Removed spurious `mr` arg from `select_k` calls (previously compiled
due to implicit pointer→bool conversion)
- C API pool resource management rewritten without `owning_wrapper`

---------

Co-authored-by: gpuCI <38199262+GPUtester@users.noreply.github.com>
rapids-bot Bot pushed a commit to rapidsai/cuml that referenced this pull request Apr 21, 2026
## Summary
- Migrate all raw RMM `allocate`/`deallocate` calls to the new CCCL 3-argument API that requires explicit alignment
- Replace removed `rmm.librmm.per_device_resource` Cython import with `rmm.pylibrmm.memory_resource` and use `make_any_device_resource` to obtain the resource for `device_buffer` construction

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.

## Changes
- **`cpp/src/genetic/genetic.cu`**: Add explicit `alignof(node)` / `alignof(program)` to all `allocate` and `deallocate` calls in `parallel_evolve` and `symFit`; fix deallocation bug in `parallel_evolve` where `h_nextprogs[i].len` was incorrectly used instead of `tmp.len` to compute the buffer size being freed
- **`cpp/examples/symreg/symreg_example.cpp`**: Use `params.population_size * sizeof(cg::program)` and `alignof(cg::program)` for `allocate`/`deallocate` calls, fixing incorrect byte-size computation; remove unused `<rmm/aligned.hpp>` include
- **`cpp/tests/sg/genetic/evolution_test.cu`**: Add alignment arguments to allocate/deallocate in `SymReg` test
- **`cpp/tests/sg/genetic/program_test.cu`**: Add alignment arguments to `SetUp`/`TearDown` allocate/deallocate calls
- **`python/cuml/cuml/manifold/umap/umap.pyx`**: Replace `get_current_device_resource()` with `make_any_device_resource(get_current_device_resource().get_mr())` for `device_buffer` construction

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Divye Gala (https://github.com/divyegala)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #7951
rapids-bot Bot pushed a commit to rapidsai/cugraph that referenced this pull request Apr 23, 2026
## Summary

- Replace removed `rmm::mr::device_memory_resource` base class, `owning_wrapper`, `shared_ptr`-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types
- Use `cuda::mr::any_resource<cuda::mr::device_accessible>` for owning type-erased storage, `rmm::device_async_resource_ref` for non-owning references, and value-typed resources (`cuda_memory_resource`, `pinned_host_memory_resource`)
- Pass the memory resource to `raft::handle_t` as the `workspace_resource` (3rd) constructor argument, matching the new raft API (`stream_view`, `stream_pool`, `std::optional<raft::mr::device_resource>`)

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.

## Files changed

**Headers:**
- `algorithms.hpp`, `dendrogram.hpp`, `legacy/graph.hpp`, `legacy/functions.hpp`: `get_current_device_resource()` → `get_current_device_resource_ref()` in default argument expressions
- `host_staging_buffer_manager.hpp`: Remove `owning_wrapper`, store `pool_memory_resource` by value in a `std::optional`, accept `pinned_host_memory_resource` by value in `init()`
- `large_buffer_manager.hpp`: Store `pinned_host_memory_resource` by value (not `shared_ptr`), return `device_async_resource_ref` from `get()`, `std::move` the resource into storage
- `mtmg/resource_manager.hpp`: Use `cuda::mr::any_resource<device_accessible>` instead of `shared_ptr<device_memory_resource>` for `per_device_rmm_resources_`, use non-deprecated `set_per_device_resource`, pass resource as `workspace_resource` to `raft::handle_t`

**Tests:**
- `base_fixture.hpp`: Return `any_resource<device_accessible>` from `create_memory_resource()`, use value-typed MR factory helpers (`make_cuda`, `make_managed`, `make_pool`, `make_binning`), switch to non-deprecated `set_current_device_resource` / `get_current_device_resource_ref`
- `multi_node_threaded_test.cpp`: Switch to non-deprecated `set_current_device_resource(resource)`
- `mg_graph500_bfs_test.cu`, `mg_graph500_sssp_test.cu`: Store `pinned_mr_` as `optional<pinned_host_memory_resource>` by value, prefer `.value()` over `operator*` for optional access

**Examples:**
- All 4 example files (`sg_graph_algorithms.cpp`, `mg_graph_algorithms.cpp`, `vertex_and_edge_partition.cu`, `graph_operations.cu`): Use value-typed `cuda_memory_resource`, non-deprecated `set_current_device_resource`, pass the resource to `raft::handle_t` as the `workspace_resource` (3rd positional arg, with `nullptr` for the unused `stream_pool`)

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #5483
rapids-bot Bot pushed a commit that referenced this pull request May 20, 2026
PR #2996 migrated RAFT to the CCCL/RMM memory-resource model and temporarily removed the guard that prevented `device_resources_manager` from overwriting a user-installed RMM resource. Now that CCCL provides `cuda::mr::resource_cast` support for RMM resources, restore that behavior using `rmm::mr::get_current_device_resource_ref()`.

With this change, `device_resources_manager` only creates and installs its pool when the current resource is the default `rmm::mr::cuda_memory_resource`. If another resource is already installed, RAFT preserves it and emits the existing warning.

The test coverage removed during #2996 is restored with `resource_cast` against the current resource ref.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #3020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking change improvement Improvement / enhancement to an existing function

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants