Convert logging_resource_adaptor from a templated, header-only class to a non-templated class using cuda::mr::shared_resource for reference-counted ownership. Implementation is now compiled into librmm.so. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) - David Wendt (https://github.com/davidwendt) URL: #2246
…ared CCCL MR design (#2264) ## Summary Converts `fixed_size_memory_resource` and `binning_memory_resource` from header-only class templates to non-template classes backed by a `detail::*_impl` held via `cuda::mr::shared_resource`. Follows the pattern established by `logging_resource_adaptor` (#2246) and `pool_memory_resource` (#2258). Part of #2011. ## Changes **New files (6)** - `cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp` — impl class declaration; inherits `stream_ordered_memory_resource<impl, fixed_size_free_list>` (same CRTP pattern as pool) - `cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp` — impl member definitions - `cpp/src/mr/fixed_size_memory_resource.cpp` — outer class constructor and delegating methods - `cpp/include/rmm/mr/detail/binning_memory_resource_impl.hpp` — impl class declaration; includes `fixed_size_memory_resource.hpp` for `unique_ptr` member - `cpp/src/mr/detail/binning_memory_resource_impl.cpp` — impl member definitions - `cpp/src/mr/binning_memory_resource.cpp` — outer class constructor and delegating methods **Modified files (11)** - `cpp/include/rmm/mr/fixed_size_memory_resource.hpp` — de-templated; `shared_resource` inheritance; `device_async_resource_ref` constructor only; `static_assert` for concept - `cpp/include/rmm/mr/binning_memory_resource.hpp` — de-templated; same pattern; `Upstream*` constructors removed - `cpp/CMakeLists.txt` — four new `.cpp` files added to library sources - `python/rmm/rmm/librmm/memory_resource.pxd` — template parameters removed from both declarations - `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` — template instantiation syntax removed from `new` expressions - `cpp/tests/mr/mr_ref_fixed_size_tests.cpp` — replaced string-factory suite with typed `FixedSizeMRFixture` + `CcclMrRefTest` - `cpp/tests/mr/mr_ref_binning_tests.cpp` — replaced string-factory suites with typed `BinningMRFixture` + all three `CcclMrRef*` suites - `cpp/tests/mr/mr_ref_test.hpp` — 
`make_fixed_size`/`make_binning` factory helpers rewritten without `owning_wrapper`; type aliases de-templated - `cpp/tests/mr/binning_mr_tests.cpp` — removed explicit template instantiation and `ThrowOnNullUpstream` (null pointer constructor no longer exists); updated `ExplicitBinMR` to use `device_async_resource_ref` - `cpp/tests/mr/cccl_adaptor_tests.cpp` — added `fixed_size_memory_resource` and `binning_memory_resource` to the shared-ownership typed test suite - `cpp/tests/mr/thrust_allocator_tests.cu` — removed `"Binning"` from the string-dispatch parameterization (coverage moved to `BINNING_MR_REF_*` typed suites; the old dispatch path caused a dangling ref crash — the exact bug this PR fixes) ## Breaking changes - `fixed_size_memory_resource<Upstream>` → `fixed_size_memory_resource` (template parameter removed) - `binning_memory_resource<Upstream>` → `binning_memory_resource` (template parameter removed) - `Upstream*` constructor overloads removed; use `device_async_resource_ref` - Both classes become copyable with shared ownership semantics ## Testing `build-rmm-cpp -j0 && test-rmm-cpp`: 89/89 tests pass.
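The "non-template outer class backed by a shared impl" pattern described above can be modelled with the standard library alone. The sketch below is an illustration, not RMM's code: `std::shared_ptr` stands in for `cuda::mr::shared_resource`, and the names `fixed_size_impl`, `fixed_size_resource`, `record_allocation`, and `owners` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

namespace sketch {

// Hypothetical impl class: all mutable state lives here, once.
class fixed_size_impl {
 public:
  explicit fixed_size_impl(std::size_t block_size) : block_size_{block_size} {}
  std::size_t block_size() const noexcept { return block_size_; }
  void record_allocation() noexcept { ++allocations_; }
  std::size_t allocations() const noexcept { return allocations_; }

 private:
  std::size_t block_size_;
  std::size_t allocations_{0};
};

// Non-template outer class. ASSUMPTION: std::shared_ptr is used here as a
// stand-in for cuda::mr::shared_resource; only the ownership model is shown.
class fixed_size_resource {
 public:
  explicit fixed_size_resource(std::size_t block_size)
    : impl_{std::make_shared<fixed_size_impl>(block_size)} {}

  // Delegating methods: the outer class forwards to the shared impl.
  std::size_t block_size() const noexcept { return impl_->block_size(); }
  void record_allocation() noexcept { impl_->record_allocation(); }
  std::size_t allocations() const noexcept { return impl_->allocations(); }
  long owners() const noexcept { return impl_.use_count(); }

  // Two handles compare equal when they share the same impl object.
  friend bool operator==(fixed_size_resource const& a,
                         fixed_size_resource const& b) noexcept {
    return a.impl_ == b.impl_;
  }

 private:
  std::shared_ptr<fixed_size_impl> impl_;
};

}  // namespace sketch
```

Copying a `fixed_size_resource` copies the handle, not the state, so an allocation recorded through one copy is visible through every other copy. That is the "copyable with shared ownership semantics" behaviour listed under the breaking changes.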
…d CCCL MR design (#2265) ## Summary Converts `tracking_resource_adaptor`, `statistics_resource_adaptor`, and `aligned_resource_adaptor` from header-only class templates to non-template classes backed by a `detail::*_impl` held via `cuda::mr::shared_resource`, following the pattern established by `logging_resource_adaptor` (#2246). Part of #2011. ## Changes ### New files - `cpp/include/rmm/mr/detail/tracking_resource_adaptor_impl.hpp` - `cpp/include/rmm/mr/detail/statistics_resource_adaptor_impl.hpp` - `cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp` - `cpp/src/mr/detail/tracking_resource_adaptor_impl.cpp` - `cpp/src/mr/detail/statistics_resource_adaptor_impl.cpp` - `cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp` - `cpp/src/mr/tracking_resource_adaptor.cpp` - `cpp/src/mr/statistics_resource_adaptor.cpp` - `cpp/src/mr/aligned_resource_adaptor.cpp` - `cpp/tests/mr/mr_ref_tracking_tests.cpp` — `CcclMrRefTest` / `CcclMrRefAllocationTest` / `CcclMrRefTestMT` instantiations - `cpp/tests/mr/mr_ref_statistics_tests.cpp` — same - `cpp/tests/mr/mr_ref_aligned_tests.cpp` — `CcclMrRefTest` / `CcclMrRefAllocationTest` ### Modified files - Public headers de-templated, private `shared_resource` inheritance, `get_property` friend, `static_assert` concept check - `cpp/CMakeLists.txt` — new `.cpp` sources added - `cpp/tests/CMakeLists.txt` — `TRACKING_MR_REF_TEST`, `STATISTICS_MR_REF_TEST`, `ALIGNED_MR_REF_TEST` targets - `cpp/tests/mr/adaptor_tests.cpp` — removed template/pointer-based aligned tests, updated `owning_wrapper` to use `limiting_resource_adaptor` - `cpp/tests/mr/tracking_mr_tests.cpp`, `statistics_mr_tests.cpp`, `aligned_mr_tests.cpp` — template aliases removed, null/pointer constructions replaced, stacked-adaptor tests use `device_async_resource_ref{mr}` to avoid copy-construction - `cpp/tests/mr/cccl_adaptor_tests.cpp` — all three new adaptors added to the typed shared-ownership suite - `python/rmm/rmm/librmm/memory_resource.pxd` — template 
parameters removed from `statistics_resource_adaptor` and `tracking_resource_adaptor` - `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` — template instantiation syntax removed ## Checklist - [x] Create `detail/*_impl.hpp` (class declaration) - [x] Create `src/mr/detail/*_impl.cpp` (member definitions) - [x] Create `src/mr/*_adaptor.cpp` (outer class definitions) - [x] Modify public headers (de-template, private inheritance, `get_property`, `static_assert`) - [x] Update `CMakeLists.txt` - [x] Update Cython `.pxd` and `.pyx` - [x] Update tests (remove template instantiation, add non-template fixtures) - [x] `pre-commit run --all-files` - [x] `build-rmm-cpp -j0 && test-rmm-cpp`
## Description

The set/reset functions returned `device_async_resource_ref` (non-owning) to the previous resource, but the underlying `any_resource` in the map was immediately overwritten, leaving the returned ref dangling. This was UB that happened to be masked by small buffer optimization for small resource types like `cuda_memory_resource`. This PR returns `cuda::mr::any_resource<cuda::mr::device_accessible>` (owning) instead, using `std::exchange` to swap the old and new values in a single expression.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
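The fix can be illustrated with a stdlib-only model of a per-device map whose `set` hands back the previous owner by value. This is a sketch under stated assumptions: `std::shared_ptr<resource>` stands in for the owning `any_resource`, and `per_device_map`, `resource`, and `owning_resource` are hypothetical names. Note that `std::exchange` provides no thread synchronization by itself, so any locking is assumed to happen around the call.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

struct resource {
  std::string name;
};

// ASSUMPTION: a shared_ptr models an owning, type-erased resource handle.
using owning_resource = std::shared_ptr<resource>;

class per_device_map {
 public:
  // Stores `next` for `device` and returns the previous owner by value.
  // Because the return value owns the old resource, it stays alive after
  // the map slot has been overwritten.
  owning_resource set(int device, owning_resource next) {
    return std::exchange(resources_[device], std::move(next));
  }

  // Non-owning view of the current resource, or nullptr if none is set.
  resource const* get(int device) const {
    auto it = resources_.find(device);
    return it == resources_.end() ? nullptr : it->second.get();
  }

 private:
  std::map<int, owning_resource> resources_;
};

}  // namespace sketch
```

Had `set` returned a non-owning pointer or reference into the map slot, the caller would be left pointing at a value that the same call had just replaced, which is the dangling-ref situation the description refers to.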
## Summary - De-template `arena_memory_resource` by removing the `Upstream` template parameter - Split implementation into `detail::arena_memory_resource_impl` held via `cuda::mr::shared_resource` for reference-counted, copyable ownership - Retain the `device_memory_resource` legacy compatibility layer (`do_allocate`, `do_deallocate`, `do_is_equal`) - Update benchmarks, C++ tests, and Python (Cython) bindings to use the non-template `arena_memory_resource` This follows the same pattern established in #2246, #2258, #2264, and #2265 for the other memory resources. ### New files | File | Contents | |------|----------| | `cpp/include/rmm/mr/detail/arena_memory_resource_impl.hpp` | `detail::arena_memory_resource_impl` class declaration | | `cpp/src/mr/detail/arena_memory_resource_impl.cpp` | Impl member function definitions | | `cpp/src/mr/arena_memory_resource.cpp` | Outer class constructor + delegating method definitions | ### Modified files | File | Change | |------|--------| | `cpp/include/rmm/mr/arena_memory_resource.hpp` | De-template, `shared_resource` wrapping | | `cpp/CMakeLists.txt` | Add new `.cpp` source files | | `cpp/tests/mr/arena_mr_tests.cpp` | `arena_memory_resource<device_memory_resource>` → `arena_memory_resource` | | `cpp/tests/mr/mr_ref_arena_tests.cpp` | Add `ArenaMRFixture` + `CcclMrRefTest`/`CcclMrRefAllocationTest`/`CcclMrRefTestMT` instantiations | | `cpp/tests/mr/mr_ref_test.hpp` | Update `make_arena()` and `arena_mr` type alias | | `cpp/tests/mr/cccl_adaptor_tests.cpp` | Add arena `static_assert` + `ArenaMRAdaptorTest` | | `cpp/benchmarks/multi_stream_allocations/multi_stream_allocations_bench.cu` | Update `make_arena()` | | `cpp/benchmarks/random_allocations/random_allocations.cpp` | Update `make_arena()` | | `cpp/benchmarks/replay/replay.cpp` | Update `make_arena()` | | `python/rmm/rmm/librmm/memory_resource.pxd` | Remove `[Upstream]` template from `arena_memory_resource` | | `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` | 
Remove template instantiation syntax for arena |
## Summary

- Split `callback_memory_resource` implementation into `detail/callback_memory_resource_impl.hpp` + `src/mr/detail/callback_memory_resource_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_callback_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++ signature
## Summary

- Split `prefetch_resource_adaptor` implementation into `detail/prefetch_resource_adaptor_impl.hpp` + `src/mr/detail/prefetch_resource_adaptor_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_prefetch_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and `Equality` tests for non-template type
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++ signature
## Summary

- Split `thread_safe_resource_adaptor` implementation into `detail/thread_safe_resource_adaptor_impl.hpp` + `src/mr/detail/thread_safe_resource_adaptor_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_thread_safe_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and `Equality` tests for non-template type
## Description This merges the following changes into the `staging` branch: - Update Cython lower bound pin to 3.2.2 (#2266) - Remove pytest upper bound pin (#2268) - Reduce default pool sizes in Python tests to speed up suite (#2273) ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes. --------- Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>
## Summary

- Split `limiting_resource_adaptor` implementation into `detail/limiting_resource_adaptor_impl.hpp` + `src/mr/detail/limiting_resource_adaptor_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_limiting_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, `Equality`, and `owning_wrapper` Equality tests for non-template type with `shared_resource` semantics
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++ signature
…2278) ## Summary - Split `failure_callback_resource_adaptor` implementation into `detail/failure_callback_resource_adaptor_impl.hpp` (header-only, since impl is templated on `ExceptionType`) - Accept `device_async_resource_ref` upstream; template parameter changes from `<Upstream, ExceptionType>` to `<ExceptionType>` only - Add `mr_ref_failure_callback_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite - Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and `Equality` tests - Update Cython `.pxd`/`.pyx` bindings to use `[ExceptionType]` template parameter with `out_of_memory` forward decl
## Description Closes #2285 Removes `owning_wrapper` and `make_owning_wrapper`, which are no longer necessary after the `cuda::shared_resource` adaptor conversion in #2011. All adaptors now manage upstream lifetime directly via `any_resource`, so the problem `owning_wrapper` solved no longer exists. - Delete `cpp/include/rmm/mr/owning_wrapper.hpp` - Remove stale `#include`s from 4 files (3 benchmarks + `mr_ref_test.hpp`) - Remove `owning_wrapper` from the typed test suite in `adaptor_tests.cpp` ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes. --------- Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Merge main into staging
## Description Closes #2287 Migrates all 8 base (non-adaptor) memory resources to natively satisfy the CCCL `cuda::mr::resource` concept, so that concrete types (e.g. `cuda_memory_resource&`) satisfy the concept without virtual dispatch through `device_memory_resource`. **Stateless resources** get `allocate`/`deallocate`/`allocate_sync`/`deallocate_sync` accepting `cuda::stream_ref` directly on the class: - `cuda_memory_resource` — `device_accessible` - `managed_memory_resource` — `device_accessible` + `host_accessible` - `pinned_host_memory_resource` — `device_accessible` + `host_accessible` - `cuda_async_view_memory_resource` — `device_accessible` - `system_memory_resource` — `device_accessible` + `host_accessible` **Stateful, non-copyable resources** use `cuda::mr::shared_resource<Impl>` with `_impl` classes extracted to `detail/` headers and `.cpp` source files, matching the adaptor convention from prior PRs (e.g. `limiting_resource_adaptor`): - `cuda_async_memory_resource` — `device_accessible` - `cuda_async_managed_memory_resource` — `device_accessible` + `host_accessible` - `sam_headroom_memory_resource` — `device_accessible` + `host_accessible` `device_memory_resource` inheritance is kept for backward compatibility. `do_allocate`/`do_deallocate` delegate to the new CCCL methods. Default alignment is `rmm::CUDA_ALLOCATION_ALIGNMENT` (256 bytes). ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes. --------- Co-authored-by: Lawrence Mitchell <wence@gmx.li>
Merge main into staging
…async_resource_ref (#2300)

## Summary

Replaces `shared_ptr[device_memory_resource]` with per-subclass `unique_ptr[ConcreteType]` (owning) and `optional[device_async_resource_ref]` (non-owning reference) across all Python/Cython bindings. This is a part of #2011.

There are **significant** opportunities to make this Cython code better over time but I have to get something that removes `device_memory_resource` from the Python/Cython side before I can finish migration on the C++ side (#2296). I welcome critique of this design, and ideas for how it can be improved, particularly from @vyasr @wence-. I would like to address any suggested improvements in follow-up PRs, because this changeset is necessary to unblock #2301.

The changes in `cdef class DeviceMemoryResource` are perhaps the most significant changes here from a design perspective. The solution I'm going with for now is to keep the `DeviceMemoryResource` class around, as a base class for the Cython MRs, and let it handle allocate/deallocate. It owns an `optional[device_async_resource_ref]` which is used for allocation/deallocation. It's `optional` so that the class can be default-constructed (Cython requires nullary constructors), but it should never be `nullopt` except during initialization.

Then, each MR class owns a `c_obj` like `unique_ptr[cuda_memory_resource]`. This is `unique_ptr` so it can be default-constructed for Cython's requirements. I chose `unique_ptr` over `optional` here to emphasize that this member is the thing that actually owns the resource. As with the `c_ref`, this should never be `nullptr` except during initialization.

When an MR class is created, it initializes its `c_obj` and then constructs a `c_ref` (a member inherited from the `DeviceMemoryResource` base class). "Special" methods for an MR like getting the statistics counts go through `deref(self.c_obj)`, and "common" methods like allocate/deallocate go through `self.c_ref.value()`.
### Changes - **`.pxd` declarations**: Remove `device_memory_resource` class. Declare `device_async_resource_ref` and a `make_device_async_resource_ref()` inline C++ template that returns `optional` to work around Cython generating default-constructed temporaries for non-default-constructible types. All adaptor constructors take `device_async_resource_ref` instead of `device_memory_resource*`. - **`.pxd` class definitions**: `DeviceMemoryResource` base holds `optional[device_async_resource_ref] c_ref`; each concrete subclass holds `unique_ptr[ConcreteType] c_obj`. - **`.pyx` implementations**: All `__cinit__` methods construct via `unique_ptr` then set `c_ref` via `make_device_async_resource_ref`. Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`. Per-device functions use `set_per_device_resource_ref`. - **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of `self.mr.get_mr()`. Closes #2294
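The `c_obj` / `c_ref` split described above can be mirrored in plain C++ to make the ownership roles explicit. This is only a model of the layout, not the Cython code: `std::reference_wrapper` stands in for `device_async_resource_ref`, and `counting_resource`, `base_mr`, and `counting_mr` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>

namespace sketch {

// Hypothetical concrete resource with one "special" accessor.
struct counting_resource {
  std::size_t bytes_allocated{0};
  void allocate(std::size_t bytes) { bytes_allocated += bytes; }
};

// ASSUMPTION: a reference_wrapper models the non-owning resource ref.
using resource_ref = std::reference_wrapper<counting_resource>;

// Base class: default-constructible, so the ref must be optional.
// "Common" operations go through c_ref.value().
class base_mr {
 public:
  void allocate(std::size_t bytes) { c_ref.value().get().allocate(bytes); }

 protected:
  std::optional<resource_ref> c_ref;  // nullopt only during initialization
};

// Concrete class: c_obj owns the resource; c_ref is bound to it afterwards.
class counting_mr : public base_mr {
 public:
  counting_mr() : c_obj{std::make_unique<counting_resource>()} {
    c_ref = std::ref(*c_obj);
  }

  // "Special" accessor goes through the owning member directly.
  std::size_t bytes_allocated() const { return c_obj->bytes_allocated; }

 private:
  std::unique_ptr<counting_resource> c_obj;  // null only during initialization
};

}  // namespace sketch
```

Keeping the owner in a `unique_ptr` means the resource's address is stable for the lifetime of the object, so the reference held by the base class stays valid as long as the subclass instance does.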
# Conflicts: # cpp/include/rmm/mr/aligned_resource_adaptor.hpp # cpp/include/rmm/mr/tracking_resource_adaptor.hpp # cpp/tests/mr/aligned_mr_tests.cpp # cpp/tests/mr/statistics_mr_tests.cpp
Merge main into staging
…tors (#2301)

## Summary

- Remove `device_memory_resource` inheritance from all memory resources (stateless, stateful, and adaptors)
- Remove `do_allocate` / `do_deallocate` / `do_is_equal` virtual overrides from all resources
- Rewrite benchmark factory functions from `shared_ptr<device_memory_resource>` to `any_device_resource`
- Convert `simulated_memory_resource` from DMR inheritance to CCCL concepts
- Change copy/move from `= delete` to `= default` on `cuda_async_memory_resource`, `cuda_async_managed_memory_resource`, `sam_headroom_memory_resource`, and `simulated_memory_resource` (required for CCCL `resource_ref` copyability via `shared_resource` base)
- Remove NullUpstream tests and DEVICE_MEMORY_RESOURCE_VIEW_TEST (no longer needed without DMR)

Closes #2295
Part of #2011
## Summary

- Delete `device_memory_resource.hpp` and `device_memory_resource_view.hpp`
- Remove pointer-based `per_device_resource` APIs and bridge helpers
- Simplify `cccl_adaptors.hpp` (remove DMR bridge code, retain wrapper for deletion in a follow-up)
- Rewrite test mock resources (`mock_resource.hpp`, `device_check_resource_adaptor.hpp`) to use CCCL concepts directly
- Update `callback_memory_resource`, aligned, arena, and failure_callback tests

Closes #2296
Part of #2011
…, and device_check_resource_adaptor (#2340) ## Description Replace `device_async_resource_ref` members with `cuda::mr::any_resource<device_accessible>` in `polymorphic_allocator`, `thrust_allocator`, and `device_check_resource_adaptor`. This eliminates the CCCL [#8037](NVIDIA/cccl#8037) recursive constraint cycle for these classes. Constructor signatures are unchanged; they still accept `device_async_resource_ref`, which implicitly converts to `any_resource`. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes.
Merge main into staging
## Description Delete `cccl_adaptors.hpp` and replace RMM's wrapper types (`cccl_resource_ref`, `cccl_async_resource_ref`) with direct aliases to CCCL's `resource_ref` and `synchronous_resource_ref`. This eliminates the 480-line adaptor layer that was originally needed to work around the CCCL [#8037](NVIDIA/cccl#8037) recursive constraint satisfaction issue, which has since been fixed upstream in CCCL [#8121](NVIDIA/cccl#8121). Additional changes: - `per_device_resource`: `static_cast<any_device_resource>(ref)` replaced with `any_device_resource{ref}` (wrapper had `operator any_resource`) - Add missing `cuda_stream_view.hpp` include to three impl headers that previously got it transitively through `cccl_adaptors.hpp` ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes.
This was referenced Apr 20, 2026
rapids-bot Bot pushed a commit to rapidsai/ucxx that referenced this pull request on Apr 21, 2026
Migrate to RMM's CCCL-based memory resources. Part of rapidsai/rmm#2011. Depends on rapidsai/rmm#2361. ## Notes The final commit in this PR (`TEMP: Use CI artifacts from RMM PR #2361`) will be reverted before merging. It exists solely to pull CI artifacts from the RMM PR for testing. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #636
rapids-bot Bot pushed a commit to rapidsai/raft that referenced this pull request on Apr 21, 2026
## Summary - Remove `device_memory_resource` base class usage, de-template all resource and adaptor types, replace pointer-based per-device resource APIs with ref-based equivalents - Part of rapidsai/rmm#2011. Migration guide: rapidsai/rmm#2344. - Supersedes #2917 and #2920 Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. ## Changes ### Core resource infrastructure - **`device_memory_resource.hpp`**: Remove `any_resource_bridge` (which inherited from `rmm::mr::device_memory_resource`), remove all `shared_ptr<device_memory_resource>` constructor overloads, consolidate to `any_resource`-only path - **`device_resources.hpp`**: Remove deprecated constructor taking `shared_ptr<device_memory_resource>`, update `get_workspace_resource()` return type (de-templated `limiting_resource_adaptor`) - **`device_resources_snmg.hpp`**: Remove stale include, de-template `pool_memory_resource` - **`handle.hpp`**: Remove deprecated constructors taking `shared_ptr<device_memory_resource>` - **`device_resources_manager.hpp`**: Retype `workspace_mrs` vector from `shared_ptr<device_memory_resource>` to `raft::mr::device_resource`, update `set_workspace_memory_resource()` signature accordingly, de-template `pool_mr_` to `optional<pool_memory_resource>`, remove `dynamic_cast` for upstream type detection, replace `get/set_current_device_resource()` with `_ref` variants ### Memory tracking - **`memory_tracking_resources.hpp`**: Remove `device_tracking_bridge` (inherited from `device_memory_resource`), use `set_current_device_resource_ref()` directly ### Call sites using `get_workspace_resource()` → `get_workspace_resource_ref()` - `select_k-inl.cuh`, `select_radix.cuh`, `select_warpsort.cuh`, `sparse/select_k-inl.cuh`, `bitmap_to_csr.cuh`, `bitset_to_csr.cuh` ### Benchmarks - **`benchmark.hpp`**: De-template `pool_memory_resource`, use `any_resource` for RAII restore - **`gather.cu`**, **`subsample.cu`**: Same pattern ### Tests - **`handle.cpp`**: Dereference 
`limiting_resource_adaptor*` for `device_buffer` constructor - **`device_resources_manager.cpp`**: Remove workspace-related test code for removed APIs - **`mdarray.cu`**: Remove `test_device_resource_bridge_unwrap` (bridge no longer exists) - **`multi_variable_gaussian.cu`**: `get_current_device_resource()` → `get_current_device_resource_ref()` Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Divye Gala (https://github.com/divyegala) URL: #2996
gforsyth added a commit to rapidsai/cudf that referenced this pull request on Apr 21, 2026
## Summary - Replace `device_memory_resource*` with `device_async_resource_ref` across all C++ headers, sources, benchmarks, examples, and tests. - In Cython `.pxd` declarations, change `device_memory_resource *mr` parameters to `device_async_resource_ref mr` (value type). In `.pyx` files, replace `mr.get_mr()` calls with `mr.c_ref.value()`. - Remove `cudf::set_current_device_resource` (pointer-based) wrapper, keeping only the ref-based `set_current_device_resource_ref`. Update return types of `set/reset_current_device_resource_ref` to `cuda::mr::any_resource<cuda::mr::device_accessible>`. - In `host_memory.cpp`, remove `device_memory_resource` inheritance from `pinned_pool_with_fallback_memory_resource`, remove the forward declaration workaround for `rmm::mr::pool_memory_resource` (no longer templated), and wrap non-copyable state in `shared_ptr` to satisfy the `any_resource` copyability requirement. Part of rapidsai/rmm#2011. Depends on rapidsai/rmm#2361. --------- Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
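The last bullet, wrapping non-copyable state in `shared_ptr` so the resource satisfies a copyability requirement, can be shown with standard type traits. This is a sketch, not cudf's `host_memory.cpp`: `fallback_state`, `fallback_resource`, and `note_fallback` are hypothetical names, and a `std::mutex` is assumed as the non-copyable member.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <type_traits>

namespace sketch {

// State that cannot be copied because std::mutex is non-copyable.
struct fallback_state {
  std::mutex mtx;
  std::size_t fallback_count{0};
};
static_assert(!std::is_copy_constructible<fallback_state>::value,
              "state holding a mutex is not copyable");

// Holding the state through a shared_ptr makes the outer type copyable;
// every copy refers to the same mutex and counter.
class fallback_resource {
 public:
  fallback_resource() : state_{std::make_shared<fallback_state>()} {}

  void note_fallback() {
    std::lock_guard<std::mutex> lock{state_->mtx};
    ++state_->fallback_count;
  }

  std::size_t fallback_count() const {
    std::lock_guard<std::mutex> lock{state_->mtx};
    return state_->fallback_count;
  }

 private:
  std::shared_ptr<fallback_state> state_;
};
static_assert(std::is_copy_constructible<fallback_resource>::value,
              "outer type is copyable via shared state");

}  // namespace sketch
```

The trade-off is that copies are aliases rather than independent objects, which matches the shared-ownership semantics the rest of the migration adopts.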
rgsl888prabhu pushed a commit to NVIDIA/cuopt that referenced this pull request on Apr 21, 2026
## Summary - Remove dependency on `rmm::mr::device_memory_resource` base class; resources now satisfy the `cuda::mr::resource` concept directly - Replace `shared_ptr<device_memory_resource>` with value types and `cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased storage - Replace `set_current_device_resource(ptr)` / `set_per_device_resource(id, ptr)` with `set_current_device_resource_ref` / `set_per_device_resource_ref` - Remove `make_owning_wrapper` usage and `dynamic_cast` on memory resources (no common base class) - Add missing `thrust/iterator/transform_output_iterator.h` include (no longer transitively included via CCCL) Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996.
gforsyth pushed a commit to rapidsai/cuvs that referenced this pull request on Apr 21, 2026
## Summary - Migrate all RMM usage to the new CCCL memory resource design (de-templated resources, `device_async_resource_ref` instead of `device_memory_resource*`, value semantics) - Replace `get_workspace_resource()` / `get_large_workspace_resource()` with `_ref()` variants across 65 call sites - Rewrite `cuda_huge_page_resource` to satisfy CCCL `resource` concept directly - Remove `owning_wrapper` / `dynamic_cast` patterns in C API and benchmarks Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. ## Changes - **33 files changed** (~208 insertions, ~221 deletions) - `device_memory_resource*` params → `device_async_resource_ref` (ivf_common, ivf_pq, naive_knn) - `get_current_device_resource()` → `get_current_device_resource_ref()` - `set_current_device_resource()` → `set_current_device_resource_ref()` - De-templated `pool_memory_resource`, `failure_callback_resource_adaptor` in bench utils - Removed `&resource` pointer patterns (resources are now copyable value types) - Removed spurious `mr` arg from `select_k` calls (previously compiled due to implicit pointer→bool conversion) - C API pool resource management rewritten without `owning_wrapper` --------- Co-authored-by: gpuCI <38199262+GPUtester@users.noreply.github.com>
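The note about spurious `mr` arguments that "previously compiled due to implicit pointer→bool conversion" describes a general C++ hazard. The sketch below reproduces it with hypothetical stand-ins: this `select_k` is not the cuVS function and its signature is an assumption made only to show the conversion, and `resource_ref` is a minimal class type with no conversion to `bool`.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

struct memory_resource {};

// Hypothetical function whose trailing parameter is a bool flag.
inline int select_k(int k, bool sorted = true) { return sorted ? k : -k; }

// Minimal stand-in for a reference-like class type.
struct resource_ref {
  memory_resource* ptr;
};

// A raw pointer converts to bool implicitly, so passing one where a bool
// is expected compiles silently (a non-null pointer becomes `true`).
static_assert(std::is_convertible<memory_resource*, bool>::value,
              "pointer to bool is an implicit conversion");

// A class type without operator bool does not convert, so the same
// mistake becomes a compile error once pointers are replaced.
static_assert(!std::is_convertible<resource_ref, bool>::value,
              "class type without operator bool does not convert");

}  // namespace sketch
```

Under these assumptions, `select_k(3, mr)` with `mr` a non-null `memory_resource*` is accepted and behaves like `select_k(3, true)`, while `select_k(3, resource_ref{mr})` is rejected by the compiler. That is how moving from pointers to ref types can surface stray arguments.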
rapids-bot Bot pushed a commit to rapidsai/rapidsmpf that referenced this pull request on Apr 21, 2026
## Summary - Rewrite `RmmResourceAdaptor` as a thin shell inheriting `cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>`, with all mutable state in the impl class for copyable shared ownership. - Replace `device_memory_resource*` with `rmm::device_async_resource_ref` for non-owning references and `cuda::mr::any_resource` for owning storage. - Remove `rmm::mr::owning_wrapper` usage (removed in RMM 26.06). - Update `pool_memory_resource` usage (no longer a template; upstream passed as `device_async_resource_ref`). - Replace `set_current_device_resource(ptr)` with `set_current_device_resource_ref(ref)`. - Update test resources to satisfy CCCL resource concept (`allocate_sync`, `deallocate_sync`, `operator==`, `get_property`). - Update Cython bindings (`.pxd`/`.pyx`) to use `device_async_resource_ref` instead of `device_memory_resource`. - Point `get_cucascade.cmake` to local cuCascade source directory (companion cuCascade PR: NVIDIA/cuCascade#98). Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/cudf#22008. ## Notes - Depends on cuCascade migration: NVIDIA/cuCascade#98 - The `get_cucascade.cmake` change to use `SOURCE_DIR` is a development convenience and should be updated to point to the merged cuCascade commit before this PR is finalized. Authors: - Bradley Dice (https://github.com/bdice) - Niranda Perera (https://github.com/nirandaperera) Approvers: - Niranda Perera (https://github.com/nirandaperera) - Peter Andreas Entschev (https://github.com/pentschev) URL: #940
rapids-bot Bot pushed a commit to rapidsai/cuml that referenced this pull request on Apr 21, 2026
## Summary - Migrate all raw RMM `allocate`/`deallocate` calls to the new CCCL 3-argument API that requires explicit alignment - Replace removed `rmm.librmm.per_device_resource` Cython import with `rmm.pylibrmm.memory_resource` and use `make_any_device_resource` to obtain the resource for `device_buffer` construction Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. Depends on rapidsai/cuvs#1990. ## Changes - **`cpp/src/genetic/genetic.cu`**: Add explicit `alignof(node)` / `alignof(program)` to all `allocate` and `deallocate` calls in `parallel_evolve` and `symFit`; fix deallocation bug in `parallel_evolve` where `h_nextprogs[i].len` was incorrectly used instead of `tmp.len` to compute the buffer size being freed - **`cpp/examples/symreg/symreg_example.cpp`**: Use `params.population_size * sizeof(cg::program)` and `alignof(cg::program)` for `allocate`/`deallocate` calls, fixing incorrect byte-size computation; remove unused `<rmm/aligned.hpp>` include - **`cpp/tests/sg/genetic/evolution_test.cu`**: Add alignment arguments to allocate/deallocate in `SymReg` test - **`cpp/tests/sg/genetic/program_test.cu`**: Add alignment arguments to `SetUp`/`TearDown` allocate/deallocate calls - **`python/cuml/cuml/manifold/umap/umap.pyx`**: Replace `get_current_device_resource()` with `make_any_device_resource(get_current_device_resource().get_mr())` for `device_buffer` construction Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Simon Adorf (https://github.com/csadorf) - Divye Gala (https://github.com/divyegala) - Victor Lafargue (https://github.com/viclafargue) URL: #7951
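The size and alignment fixes listed above follow one rule: compute the byte count as `count * sizeof(T)`, pass `alignof(T)` explicitly, and deallocate with exactly the same values. The sketch below uses `std::pmr::memory_resource` as a host-side analogue, since its `allocate(bytes, alignment)` and `deallocate(p, bytes, alignment)` have a similar shape; it is not RMM's API, and `program`, `buffer_bytes`, `allocate_programs`, and `deallocate_programs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>

namespace sketch {

// Hypothetical element type standing in for the genetic-programming structs.
struct program {
  double weights[4];
  int len;
};

// Single source of truth for the buffer size, in bytes.
inline std::size_t buffer_bytes(std::size_t count) {
  return count * sizeof(program);
}

// Allocates raw storage for `count` programs with explicit alignment.
inline program* allocate_programs(std::pmr::memory_resource& mr,
                                  std::size_t count) {
  return static_cast<program*>(
    mr.allocate(buffer_bytes(count), alignof(program)));
}

// Deallocates with the same byte count and alignment used to allocate.
inline void deallocate_programs(std::pmr::memory_resource& mr, program* p,
                                std::size_t count) {
  mr.deallocate(p, buffer_bytes(count), alignof(program));
}

}  // namespace sketch
```

Routing both calls through one `buffer_bytes` helper is one way to avoid the class of bug mentioned for `parallel_evolve`, where the size used for freeing was computed from a different length than the size used for allocating.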
rapids-bot Bot pushed a commit to rapidsai/cugraph that referenced this pull request on Apr 23, 2026
## Summary

- Replace removed `rmm::mr::device_memory_resource` base class, `owning_wrapper`, `shared_ptr`-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types
- Use `cuda::mr::any_resource<cuda::mr::device_accessible>` for owning type-erased storage, `rmm::device_async_resource_ref` for non-owning references, and value-typed resources (`cuda_memory_resource`, `pinned_host_memory_resource`)
- Pass the memory resource to `raft::handle_t` as the `workspace_resource` (3rd) constructor argument, matching the new raft API (`stream_view`, `stream_pool`, `std::optional<raft::mr::device_resource>`)

Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. Depends on rapidsai/cuvs#1990.

## Files changed

**Headers:**

- `algorithms.hpp`, `dendrogram.hpp`, `legacy/graph.hpp`, `legacy/functions.hpp`: `get_current_device_resource()` → `get_current_device_resource_ref()` in default argument expressions
- `host_staging_buffer_manager.hpp`: Remove `owning_wrapper`, store `pool_memory_resource` by value in a `std::optional`, accept `pinned_host_memory_resource` by value in `init()`
- `large_buffer_manager.hpp`: Store `pinned_host_memory_resource` by value (not `shared_ptr`), return `device_async_resource_ref` from `get()`, `std::move` the resource into storage
- `mtmg/resource_manager.hpp`: Use `cuda::mr::any_resource<device_accessible>` instead of `shared_ptr<device_memory_resource>` for `per_device_rmm_resources_`, use non-deprecated `set_per_device_resource`, pass resource as `workspace_resource` to `raft::handle_t`

**Tests:**

- `base_fixture.hpp`: Return `any_resource<device_accessible>` from `create_memory_resource()`, use value-typed MR factory helpers (`make_cuda`, `make_managed`, `make_pool`, `make_binning`), switch to non-deprecated `set_current_device_resource` / `get_current_device_resource_ref`
- `multi_node_threaded_test.cpp`: Switch to non-deprecated `set_current_device_resource(resource)`
- `mg_graph500_bfs_test.cu`, `mg_graph500_sssp_test.cu`: Store `pinned_mr_` as `optional<pinned_host_memory_resource>` by value, prefer `.value()` over `operator*` for optional access

**Examples:**

- All 4 example files (`sg_graph_algorithms.cpp`, `mg_graph_algorithms.cpp`, `vertex_and_edge_partition.cu`, `graph_operations.cu`): Use value-typed `cuda_memory_resource`, non-deprecated `set_current_device_resource`, pass the resource to `raft::handle_t` as the `workspace_resource` (3rd positional arg, with `nullptr` for the unused `stream_pool`)

Authors:
- Bradley Dice (https://github.com/bdice)
- Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #5483
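The commit message above describes storing a resource by value in a `std::optional`, handing out a non-owning reference from `get()`, and preferring `.value()` over `operator*`. The following is a minimal standard-library sketch of that ownership shape. All names here (`toy_pinned_resource`, `toy_resource_ref`, `toy_buffer_manager`) are hypothetical stand-ins and are not the RMM or cuGraph classes; the real types have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical stand-in for a value-typed resource such as
// pinned_host_memory_resource. This is NOT the RMM class.
struct toy_pinned_resource {
  std::size_t alignment{256};
};

// Hypothetical stand-in for a non-owning reference type such as
// device_async_resource_ref: it stores only a pointer to the resource.
class toy_resource_ref {
 public:
  explicit toy_resource_ref(toy_pinned_resource& r) : ptr_{&r} {}
  std::size_t alignment() const { return ptr_->alignment; }

 private:
  toy_pinned_resource* ptr_;
};

// Manager that owns the resource by value instead of through a shared_ptr.
class toy_buffer_manager {
 public:
  // Accept the resource by value and move it into storage.
  void init(toy_pinned_resource mr) { mr_ = std::move(mr); }

  bool initialized() const { return mr_.has_value(); }

  // .value() throws std::bad_optional_access when init() was never called;
  // operator* on an empty optional would be undefined behavior instead.
  toy_resource_ref get() { return toy_resource_ref{mr_.value()}; }

 private:
  std::optional<toy_pinned_resource> mr_;
};
```

Because the returned reference does not own anything, it is only valid while the manager (and the resource it stores) is alive, which is the usual trade-off of the by-value plus non-owning-ref design.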
shrshi pushed a commit to shrshi/cudf that referenced this pull request (May 12, 2026)
## Summary

- Replace `device_memory_resource*` with `device_async_resource_ref` across all C++ headers, sources, benchmarks, examples, and tests.
- In Cython `.pxd` declarations, change `device_memory_resource *mr` parameters to `device_async_resource_ref mr` (value type). In `.pyx` files, replace `mr.get_mr()` calls with `mr.c_ref.value()`.
- Remove `cudf::set_current_device_resource` (pointer-based) wrapper, keeping only the ref-based `set_current_device_resource_ref`. Update return types of `set/reset_current_device_resource_ref` to `cuda::mr::any_resource<cuda::mr::device_accessible>`.
- In `host_memory.cpp`, remove `device_memory_resource` inheritance from `pinned_pool_with_fallback_memory_resource`, remove the forward declaration workaround for `rmm::mr::pool_memory_resource` (no longer templated), and wrap non-copyable state in `shared_ptr` to satisfy the `any_resource` copyability requirement.

Part of rapidsai/rmm#2011. Depends on rapidsai/rmm#2361.

---------

Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
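The last bullet mentions wrapping non-copyable state in a `shared_ptr` so the resource satisfies a copyability requirement. A minimal sketch of that idea, using only the standard library, might look like the following. The names `toy_state` and `toy_fallback_resource` are hypothetical and do not reflect the actual cuDF implementation; the sketch only shows why the indirection makes the outer type copyable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <type_traits>

// Non-copyable state: std::mutex has a deleted copy constructor, so any
// struct holding one by value is not copy constructible either.
struct toy_state {
  std::mutex mtx;
  std::size_t bytes_allocated{0};
};

static_assert(!std::is_copy_constructible<toy_state>::value,
              "state holding a mutex by value cannot be copied");

// Hypothetical resource (not the cuDF class). Holding the state through a
// shared_ptr makes the resource copyable; all copies share one state.
class toy_fallback_resource {
 public:
  toy_fallback_resource() : state_{std::make_shared<toy_state>()} {}

  void record_allocation(std::size_t bytes) {
    std::lock_guard<std::mutex> lock{state_->mtx};
    state_->bytes_allocated += bytes;
  }

  std::size_t bytes_allocated() const {
    std::lock_guard<std::mutex> lock{state_->mtx};
    return state_->bytes_allocated;
  }

  // Two handles compare equal when they refer to the same shared state.
  bool operator==(toy_fallback_resource const& other) const {
    return state_ == other.state_;
  }

 private:
  std::shared_ptr<toy_state> state_;
};

static_assert(std::is_copy_constructible<toy_fallback_resource>::value,
              "the shared_ptr indirection restores copyability");
```

Copies made this way are handles to the same bookkeeping rather than independent resources, so an update through one copy is visible through every other copy.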
Description
Merges all breaking changes from the `staging` branch into `main`, completing the CCCL memory resource migration tracked in #2011.

Summary of changes

Adaptor refactors to `cuda::shared_resource` design:

Remove legacy infrastructure:

Post-cleanup:
Breaking changes
- `device_memory_resource` base class has been removed. All memory resources now implement the CCCL resource concept directly.
- `owning_wrapper` has been removed. Adaptors are now used with `cuda::shared_resource`.
- Resource adaptors are no longer class templates (no `Upstream` template parameter). They accept any upstream resource via `device_async_resource_ref`.
- APIs take `device_async_resource_ref` instead of `device_memory_resource*`.
- `cccl_adaptors.hpp` has been deleted; raw CCCL `resource_ref` types are used directly.

See #2344 for the migration guide and #2345 for downstream consumer documentation.
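The breaking changes above combine three ideas: no common base class, non-template adaptors that take any upstream through a type-erased non-owning reference, and reference-counted shared state. The sketch below models that combination with the standard library only. Every name in it (`toy_resource_ref`, `toy_malloc_resource`, `toy_counting_adaptor`) is hypothetical; it is not the RMM or CCCL API, and the real `device_async_resource_ref` and `cuda::shared_resource` are stream-aware and considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Hypothetical non-owning, type-erased reference: it accepts any type with
// allocate/deallocate members, so no common base class is required.
class toy_resource_ref {
 public:
  // Disabled for toy_resource_ref itself so that copies use the copy
  // constructor instead of wrapping a reference around a reference.
  template <typename Resource,
            typename = std::enable_if_t<
              !std::is_same<std::remove_cv_t<Resource>, toy_resource_ref>::value>>
  toy_resource_ref(Resource& r)
    : obj_{&r},
      alloc_{[](void* o, std::size_t n) {
        return static_cast<Resource*>(o)->allocate(n);
      }},
      dealloc_{[](void* o, void* p, std::size_t n) {
        static_cast<Resource*>(o)->deallocate(p, n);
      }} {}

  void* allocate(std::size_t n) { return alloc_(obj_, n); }
  void deallocate(void* p, std::size_t n) { dealloc_(obj_, p, n); }

 private:
  void* obj_;
  void* (*alloc_)(void*, std::size_t);
  void (*dealloc_)(void*, void*, std::size_t);
};

// Hypothetical upstream resource with no base class.
struct toy_malloc_resource {
  std::size_t live_bytes{0};
  void* allocate(std::size_t n) { live_bytes += n; return std::malloc(n); }
  void deallocate(void* p, std::size_t n) { live_bytes -= n; std::free(p); }
};

// Non-template adaptor: the upstream type is erased by toy_resource_ref and
// the adaptor state lives in a reference-counted impl shared by all copies.
class toy_counting_adaptor {
 public:
  explicit toy_counting_adaptor(toy_resource_ref upstream)
    : impl_{std::make_shared<impl>(upstream)} {}

  void* allocate(std::size_t n) {
    ++impl_->count;
    return impl_->upstream.allocate(n);
  }
  void deallocate(void* p, std::size_t n) { impl_->upstream.deallocate(p, n); }
  std::size_t allocation_count() const { return impl_->count; }

 private:
  struct impl {
    explicit impl(toy_resource_ref up) : upstream{up} {}
    toy_resource_ref upstream;
    std::size_t count{0};
  };
  std::shared_ptr<impl> impl_;
};

static_assert(std::is_copy_constructible<toy_counting_adaptor>::value,
              "copies share one impl through the shared_ptr");
```

In this model the adaptor does not own its upstream, so the upstream object must outlive every copy of the adaptor; only the adaptor's own state is reference counted.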
Downstream library updates
All downstream RAPIDS libraries have draft PRs to adopt these changes (tracked in #2011 under "Update RAPIDS libraries").
Checklist