Merge staging into main: CCCL memory resource migration by bdice · Pull Request #2361 · rapidsai/rmm

bdice · 2026-04-15T00:10:59Z

Description

Merges all breaking changes from the staging branch into main, completing the CCCL memory resource migration tracked in #2011.

Summary of changes

Adaptor refactors to cuda::shared_resource design:

Refactor logging_resource_adaptor (#2246)
Refactor pool_memory_resource (#2258)
Refactor fixed_size_memory_resource and binning_memory_resource (#2264)
Refactor tracking, statistics, and aligned resource adaptors (#2265)
Return owning any_resource from set_per_device_resource_ref (#2271)
Refactor arena_memory_resource (#2272)
Refactor callback_memory_resource (#2274)
Refactor prefetch_resource_adaptor (#2275)
Refactor thread_safe_resource_adaptor (#2276)
Refactor limiting_resource_adaptor (#2277)
Refactor failure_callback_resource_adaptor (#2278)

Remove legacy infrastructure:

Remove owning_wrapper (#2286)
Migrate base memory resources to native CCCL resource concept (#2289)
Migrate Python/Cython bindings from device_memory_resource to device_async_resource_ref (#2300)
Remove device_memory_resource inheritance from all resources and adaptors (#2301)
Remove bridge infrastructure and device_memory_resource (#2324)
Delete cccl_adaptors.hpp and use raw CCCL resource_ref types (#2325)
Store any_resource members in polymorphic_allocator, thrust_allocator, and device_check_resource_adaptor (#2340)

Post-cleanup:

Disable over-alignment tests that fail after device_memory_resource removal, because leaf MRs no longer enforce alignment limits (#2342)
Fix deprecations of CCCL allocations without specified alignment (#2351)
Use any_resource<device_accessible> for upstream constructor parameters (#2354)
Add set_per_device_resource and set_current_device_resource taking any_resource by value (#2356)

Breaking changes

device_memory_resource base class has been removed. All memory resources now implement the CCCL resource concept directly.
owning_wrapper has been removed. Adaptors are now used with cuda::shared_resource.
All resource adaptors drop the Upstream template parameter and accept any upstream resource via device_async_resource_ref. failure_callback_resource_adaptor keeps a single ExceptionType template parameter.
Python/Cython bindings now use device_async_resource_ref instead of device_memory_resource*.
cccl_adaptors.hpp has been deleted; raw CCCL resource_ref types are used directly.

See #2344 for the migration guide and #2345 for downstream consumer documentation.

Downstream library updates

All downstream RAPIDS libraries have draft PRs to adopt these changes (tracked in #2011 under "Update RAPIDS libraries").

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Convert logging_resource_adaptor from a templated, header-only class to a non-templated class using cuda::mr::shared_resource for reference-counted ownership. Implementation is now compiled into librmm.so. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) - David Wendt (https://github.com/davidwendt) URL: #2246
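The shape of this conversion (a thin, copyable outer class that shares one reference-counted impl object) can be illustrated with a standard-library analogy. The sketch below is not RMM code: `std::shared_ptr` stands in for `cuda::mr::shared_resource`, and `counting_impl` and `counting_handle` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical impl class: owns all mutable state and is not copyable.
class counting_impl {
 public:
  counting_impl() = default;
  counting_impl(counting_impl const&)            = delete;
  counting_impl& operator=(counting_impl const&) = delete;

  void record(std::size_t bytes) { total_bytes_ += bytes; }
  std::size_t total_bytes() const { return total_bytes_; }

 private:
  std::size_t total_bytes_{0};
};

// Hypothetical outer class: a cheap, copyable, non-template handle.
// std::shared_ptr is an assumption standing in for cuda::mr::shared_resource.
class counting_handle {
 public:
  counting_handle() : impl_{std::make_shared<counting_impl>()} {}

  // Every call delegates to the shared impl.
  void record(std::size_t bytes) { impl_->record(bytes); }
  std::size_t total_bytes() const { return impl_->total_bytes(); }

  // Two handles compare equal when they share the same impl object.
  friend bool operator==(counting_handle const& lhs, counting_handle const& rhs)
  {
    return lhs.impl_ == rhs.impl_;
  }

 private:
  std::shared_ptr<counting_impl> impl_;
};
```

Copying a `counting_handle` yields a second handle to the same state, so a bookkeeping call through either copy is visible through both. In a real library the impl member definitions could live in a compiled source file rather than a header, which is the split this commit describes.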

Part of #2011.

…ared CCCL MR design (#2264)

## Summary

Converts `fixed_size_memory_resource` and `binning_memory_resource` from header-only class templates to non-template classes backed by a `detail::*_impl` held via `cuda::mr::shared_resource`. Follows the pattern established by `logging_resource_adaptor` (#2246) and `pool_memory_resource` (#2258). Part of #2011.

## Changes

**New files (6)**

- `cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp`: impl class declaration; inherits `stream_ordered_memory_resource<impl, fixed_size_free_list>` (same CRTP pattern as pool)
- `cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp`: impl member definitions
- `cpp/src/mr/fixed_size_memory_resource.cpp`: outer class constructor and delegating methods
- `cpp/include/rmm/mr/detail/binning_memory_resource_impl.hpp`: impl class declaration; includes `fixed_size_memory_resource.hpp` for `unique_ptr` member
- `cpp/src/mr/detail/binning_memory_resource_impl.cpp`: impl member definitions
- `cpp/src/mr/binning_memory_resource.cpp`: outer class constructor and delegating methods

**Modified files (11)**

- `cpp/include/rmm/mr/fixed_size_memory_resource.hpp`: de-templated; `shared_resource` inheritance; `device_async_resource_ref` constructor only; `static_assert` for concept
- `cpp/include/rmm/mr/binning_memory_resource.hpp`: de-templated; same pattern; `Upstream*` constructors removed
- `cpp/CMakeLists.txt`: four new `.cpp` files added to library sources
- `python/rmm/rmm/librmm/memory_resource.pxd`: template parameters removed from both declarations
- `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx`: template instantiation syntax removed from `new` expressions
- `cpp/tests/mr/mr_ref_fixed_size_tests.cpp`: replaced string-factory suite with typed `FixedSizeMRFixture` + `CcclMrRefTest`
- `cpp/tests/mr/mr_ref_binning_tests.cpp`: replaced string-factory suites with typed `BinningMRFixture` + all three `CcclMrRef*` suites
- `cpp/tests/mr/mr_ref_test.hpp`: `make_fixed_size`/`make_binning` factory helpers rewritten without `owning_wrapper`; type aliases de-templated
- `cpp/tests/mr/binning_mr_tests.cpp`: removed explicit template instantiation and `ThrowOnNullUpstream` (null pointer constructor no longer exists); updated `ExplicitBinMR` to use `device_async_resource_ref`
- `cpp/tests/mr/cccl_adaptor_tests.cpp`: added `fixed_size_memory_resource` and `binning_memory_resource` to the shared-ownership typed test suite
- `cpp/tests/mr/thrust_allocator_tests.cu`: removed `"Binning"` from the string-dispatch parameterization (coverage moved to `BINNING_MR_REF_*` typed suites; the old dispatch path caused a dangling ref crash, the exact bug this PR fixes)

## Breaking changes

- `fixed_size_memory_resource<Upstream>` → `fixed_size_memory_resource` (template parameter removed)
- `binning_memory_resource<Upstream>` → `binning_memory_resource` (template parameter removed)
- `Upstream*` constructor overloads removed; use `device_async_resource_ref`
- Both classes become copyable with shared ownership semantics

## Testing

`build-rmm-cpp -j0 && test-rmm-cpp`: 89/89 tests pass.

…d CCCL MR design (#2265)

## Summary

Converts `tracking_resource_adaptor`, `statistics_resource_adaptor`, and `aligned_resource_adaptor` from header-only class templates to non-template classes backed by a `detail::*_impl` held via `cuda::mr::shared_resource`, following the pattern established by `logging_resource_adaptor` (#2246). Part of #2011.

## Changes

### New files

- `cpp/include/rmm/mr/detail/tracking_resource_adaptor_impl.hpp`
- `cpp/include/rmm/mr/detail/statistics_resource_adaptor_impl.hpp`
- `cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp`
- `cpp/src/mr/detail/tracking_resource_adaptor_impl.cpp`
- `cpp/src/mr/detail/statistics_resource_adaptor_impl.cpp`
- `cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp`
- `cpp/src/mr/tracking_resource_adaptor.cpp`
- `cpp/src/mr/statistics_resource_adaptor.cpp`
- `cpp/src/mr/aligned_resource_adaptor.cpp`
- `cpp/tests/mr/mr_ref_tracking_tests.cpp`: `CcclMrRefTest` / `CcclMrRefAllocationTest` / `CcclMrRefTestMT` instantiations
- `cpp/tests/mr/mr_ref_statistics_tests.cpp`: same
- `cpp/tests/mr/mr_ref_aligned_tests.cpp`: `CcclMrRefTest` / `CcclMrRefAllocationTest`

### Modified files

- Public headers de-templated, private `shared_resource` inheritance, `get_property` friend, `static_assert` concept check
- `cpp/CMakeLists.txt`: new `.cpp` sources added
- `cpp/tests/CMakeLists.txt`: `TRACKING_MR_REF_TEST`, `STATISTICS_MR_REF_TEST`, `ALIGNED_MR_REF_TEST` targets
- `cpp/tests/mr/adaptor_tests.cpp`: removed template/pointer-based aligned tests, updated `owning_wrapper` to use `limiting_resource_adaptor`
- `cpp/tests/mr/tracking_mr_tests.cpp`, `statistics_mr_tests.cpp`, `aligned_mr_tests.cpp`: template aliases removed, null/pointer constructions replaced, stacked-adaptor tests use `device_async_resource_ref{mr}` to avoid copy-construction
- `cpp/tests/mr/cccl_adaptor_tests.cpp`: all three new adaptors added to the typed shared-ownership suite
- `python/rmm/rmm/librmm/memory_resource.pxd`: template parameters removed from `statistics_resource_adaptor` and `tracking_resource_adaptor`
- `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx`: template instantiation syntax removed

## Checklist

- [x] Create `detail/*_impl.hpp` (class declaration)
- [x] Create `src/mr/detail/*_impl.cpp` (member definitions)
- [x] Create `src/mr/*_adaptor.cpp` (outer class definitions)
- [x] Modify public headers (de-template, private inheritance, `get_property`, `static_assert`)
- [x] Update `CMakeLists.txt`
- [x] Update Cython `.pxd` and `.pyx`
- [x] Update tests (remove template instantiation, add non-template fixtures)
- [x] `pre-commit run --all-files`
- [x] `build-rmm-cpp -j0 && test-rmm-cpp`

## Description

The set/reset functions returned a non-owning `device_async_resource_ref` to the previous resource, but the underlying `any_resource` in the map was immediately overwritten, leaving the returned ref dangling. This was undefined behavior that happened to be masked by the small buffer optimization for small resource types like `cuda_memory_resource`.

This PR returns an owning `cuda::mr::any_resource<cuda::mr::device_accessible>` instead, using `std::exchange` to store the new value and hand back the old one in a single expression.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
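The fix described above can be sketched with standard-library types only. In this illustration `resource_registry` and `owning_resource` are hypothetical stand-ins (a `std::shared_ptr<std::string>` plays the role of an owning type-erased resource), and the mutex is an assumption of the sketch, since `std::exchange` on its own does not synchronize anything.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an owning, type-erased resource.
using owning_resource = std::shared_ptr<std::string>;

// Hypothetical per-device registry.
class resource_registry {
 public:
  // Stores the new resource and returns the previous one BY VALUE.
  // The caller receives an owner, so nothing dangles when the map slot
  // is overwritten.
  owning_resource set(int device_id, owning_resource new_resource)
  {
    std::lock_guard<std::mutex> lock{mutex_};
    return std::exchange(resources_[device_id], std::move(new_resource));
  }

  // Returns a copy of the current owner, or an empty owner if none is set.
  owning_resource get(int device_id)
  {
    std::lock_guard<std::mutex> lock{mutex_};
    auto const it = resources_.find(device_id);
    if (it == resources_.end()) { return owning_resource{}; }
    return it->second;
  }

 private:
  std::mutex mutex_;
  std::map<int, owning_resource> resources_;
};
```

Had `set` returned a reference or pointer into the map slot, the caller would be left looking at the new value (or at destroyed storage) as soon as the slot was overwritten. Returning the exchanged value keeps the old resource alive for as long as the caller holds it.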

## Summary

- De-template `arena_memory_resource` by removing the `Upstream` template parameter
- Split implementation into `detail::arena_memory_resource_impl` held via `cuda::mr::shared_resource` for reference-counted, copyable ownership
- Retain the `device_memory_resource` legacy compatibility layer (`do_allocate`, `do_deallocate`, `do_is_equal`)
- Update benchmarks, C++ tests, and Python (Cython) bindings to use the non-template `arena_memory_resource`

This follows the same pattern established in #2246, #2258, #2264, and #2265 for the other memory resources.

### New files

| File | Contents |
|------|----------|
| `cpp/include/rmm/mr/detail/arena_memory_resource_impl.hpp` | `detail::arena_memory_resource_impl` class declaration |
| `cpp/src/mr/detail/arena_memory_resource_impl.cpp` | Impl member function definitions |
| `cpp/src/mr/arena_memory_resource.cpp` | Outer class constructor + delegating method definitions |

### Modified files

| File | Change |
|------|--------|
| `cpp/include/rmm/mr/arena_memory_resource.hpp` | De-template, `shared_resource` wrapping |
| `cpp/CMakeLists.txt` | Add new `.cpp` source files |
| `cpp/tests/mr/arena_mr_tests.cpp` | `arena_memory_resource<device_memory_resource>` → `arena_memory_resource` |
| `cpp/tests/mr/mr_ref_arena_tests.cpp` | Add `ArenaMRFixture` + `CcclMrRefTest`/`CcclMrRefAllocationTest`/`CcclMrRefTestMT` instantiations |
| `cpp/tests/mr/mr_ref_test.hpp` | Update `make_arena()` and `arena_mr` type alias |
| `cpp/tests/mr/cccl_adaptor_tests.cpp` | Add arena `static_assert` + `ArenaMRAdaptorTest` |
| `cpp/benchmarks/multi_stream_allocations/multi_stream_allocations_bench.cu` | Update `make_arena()` |
| `cpp/benchmarks/random_allocations/random_allocations.cpp` | Update `make_arena()` |
| `cpp/benchmarks/replay/replay.cpp` | Update `make_arena()` |
| `python/rmm/rmm/librmm/memory_resource.pxd` | Remove `[Upstream]` template from `arena_memory_resource` |
| `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` | Remove template instantiation syntax for arena |

## Summary

- Split `callback_memory_resource` implementation into `detail/callback_memory_resource_impl.hpp` + `src/mr/detail/callback_memory_resource_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_callback_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++ signature

## Summary

- Split `prefetch_resource_adaptor` implementation into `detail/prefetch_resource_adaptor_impl.hpp` + `src/mr/detail/prefetch_resource_adaptor_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_prefetch_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and `Equality` tests for non-template type
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++ signature

## Summary

- Split `thread_safe_resource_adaptor` implementation into `detail/thread_safe_resource_adaptor_impl.hpp` + `src/mr/detail/thread_safe_resource_adaptor_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_thread_safe_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and `Equality` tests for non-template type

## Description

This merges the following changes into the `staging` branch:

- Update Cython lower bound pin to 3.2.2 (#2266)
- Remove pytest upper bound pin (#2268)
- Reduce default pool sizes in Python tests to speed up suite (#2273)

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

---------

Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>

## Summary

- Split `limiting_resource_adaptor` implementation into `detail/limiting_resource_adaptor_impl.hpp` + `src/mr/detail/limiting_resource_adaptor_impl.cpp`, using `cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_limiting_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, `Equality`, and `owning_wrapper` Equality tests for non-template type with `shared_resource` semantics
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++ signature

…2278)

## Summary

- Split `failure_callback_resource_adaptor` implementation into `detail/failure_callback_resource_adaptor_impl.hpp` (header-only, since impl is templated on `ExceptionType`)
- Accept `device_async_resource_ref` upstream; template parameter changes from `<Upstream, ExceptionType>` to `<ExceptionType>` only
- Add `mr_ref_failure_callback_tests.cpp` and integrate into `cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and `Equality` tests
- Update Cython `.pxd`/`.pyx` bindings to use `[ExceptionType]` template parameter with `out_of_memory` forward decl
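To illustrate why this adaptor keeps one template parameter while its upstream becomes type-erased, here is a minimal standard-library sketch. It is not the RMM class: `retrying_allocator` is a hypothetical name, `std::function` stands in for a type-erased upstream reference, and the retry-while-the-callback-returns-true behavior is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <utility>

// Hypothetical adaptor: only the exception type is a template parameter.
// The upstream is type-erased, so it does not appear in the class type.
template <typename ExceptionType = std::bad_alloc>
class retrying_allocator {
 public:
  using allocate_fn      = std::function<void*(std::size_t)>;
  using failure_callback = std::function<bool(std::size_t)>;

  retrying_allocator(allocate_fn upstream, failure_callback on_failure)
    : upstream_{std::move(upstream)}, on_failure_{std::move(on_failure)}
  {
  }

  // Tries the upstream; on ExceptionType, asks the callback whether to retry.
  // If the callback declines, the original exception is rethrown.
  void* allocate(std::size_t bytes)
  {
    while (true) {
      try {
        return upstream_(bytes);
      } catch (ExceptionType const&) {
        if (!on_failure_(bytes)) { throw; }
      }
    }
  }

 private:
  allocate_fn upstream_;
  failure_callback on_failure_;
};
```

Because the caught type is named in a `catch` clause, it has to be known at compile time, which is consistent with the impl staying header-only. Any callable with the right signature can serve as the upstream without changing the adaptor's type.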

## Description

Closes #2285

Removes `owning_wrapper` and `make_owning_wrapper`, which are no longer necessary after the `cuda::shared_resource` adaptor conversion in #2011. All adaptors now manage upstream lifetime directly via `any_resource`, so the problem `owning_wrapper` solved no longer exists.

- Delete `cpp/include/rmm/mr/owning_wrapper.hpp`
- Remove stale `#include`s from 4 files (3 benchmarks + `mr_ref_test.hpp`)
- Remove `owning_wrapper` from the typed test suite in `adaptor_tests.cpp`

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

---------

Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>

Merge main into staging

## Description

Closes #2287

Migrates all 8 base (non-adaptor) memory resources to natively satisfy the CCCL `cuda::mr::resource` concept, so that concrete types (e.g. `cuda_memory_resource&`) satisfy the concept without virtual dispatch through `device_memory_resource`.

**Stateless resources** get `allocate`/`deallocate`/`allocate_sync`/`deallocate_sync` accepting `cuda::stream_ref` directly on the class:

- `cuda_memory_resource`: `device_accessible`
- `managed_memory_resource`: `device_accessible` + `host_accessible`
- `pinned_host_memory_resource`: `device_accessible` + `host_accessible`
- `cuda_async_view_memory_resource`: `device_accessible`
- `system_memory_resource`: `device_accessible` + `host_accessible`

**Stateful, non-copyable resources** use `cuda::mr::shared_resource<Impl>` with `_impl` classes extracted to `detail/` headers and `.cpp` source files, matching the adaptor convention from prior PRs (e.g. `limiting_resource_adaptor`):

- `cuda_async_memory_resource`: `device_accessible`
- `cuda_async_managed_memory_resource`: `device_accessible` + `host_accessible`
- `sam_headroom_memory_resource`: `device_accessible` + `host_accessible`

`device_memory_resource` inheritance is kept for backward compatibility. `do_allocate`/`do_deallocate` delegate to the new CCCL methods. Default alignment is `rmm::CUDA_ALLOCATION_ALIGNMENT` (256 bytes).

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

---------

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

Merge main into staging

@vyasr

…async_resource_ref (#2300)

## Summary

Replaces `shared_ptr[device_memory_resource]` with per-subclass `unique_ptr[ConcreteType]` (owning) and `optional[device_async_resource_ref]` (non-owning reference) across all Python/Cython bindings. This is a part of #2011.

There are **significant** opportunities to make this Cython code better over time, but I have to get something that removes `device_memory_resource` from the Python/Cython side before I can finish migration on the C++ side (#2296). I welcome critique of this design, and ideas for how it can be improved, particularly from @vyasr @wence-. I would like to address any suggested improvements in follow-up PRs, because this changeset is necessary to unblock #2301.

The changes in `cdef class DeviceMemoryResource` are perhaps the most significant changes here from a design perspective. The solution I'm going with for now is to keep the `DeviceMemoryResource` class around as a base class for the Cython MRs and let it handle allocate/deallocate. It holds an `optional[device_async_resource_ref]`, which is used for allocation/deallocation. It's `optional` so that the class can be default-constructed (Cython requires nullary constructors), but it should never be `nullopt` except during initialization.

Then, each MR class owns a `c_obj` like `unique_ptr[cuda_memory_resource]`. This is a `unique_ptr` so it can be default-constructed for Cython's requirements. I chose `unique_ptr` over `optional` here to emphasize that this member is the thing that actually owns the resource. As with the `c_ref`, this should never be `nullptr` except during initialization.

When an MR class is created, it initializes its `c_obj` and then constructs a `c_ref` (a member inherited from the `DeviceMemoryResource` base class). "Special" methods for an MR, like getting the statistics counts, go through `deref(self.c_obj)`, and "common" methods like allocate/deallocate go through `self.c_ref.value()`.

### Changes

- **`.pxd` declarations**: Remove `device_memory_resource` class. Declare `device_async_resource_ref` and a `make_device_async_resource_ref()` inline C++ template that returns `optional` to work around Cython generating default-constructed temporaries for non-default-constructible types. All adaptor constructors take `device_async_resource_ref` instead of `device_memory_resource*`.
- **`.pxd` class definitions**: `DeviceMemoryResource` base holds `optional[device_async_resource_ref] c_ref`; each concrete subclass holds `unique_ptr[ConcreteType] c_obj`.
- **`.pyx` implementations**: All `__cinit__` methods construct via `unique_ptr` then set `c_ref` via `make_device_async_resource_ref`. Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`. Per-device functions use `set_per_device_resource_ref`.
- **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of `self.mr.get_mr()`.

Closes #2294
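The owner-plus-reference layout described for the Cython classes can be modeled in plain C++ to make the two access paths concrete. Everything in this sketch is hypothetical and standard-library only: `resource_ref` is a hand-rolled non-owning type-erased reference standing in for `device_async_resource_ref`, `make_resource_ref` mirrors the role of the `make_device_async_resource_ref()` helper, and `tracking_resource` is an invented concrete type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <optional>

// Hypothetical concrete resource with one "special" method (bytes_in_use).
class tracking_resource {
 public:
  void* allocate(std::size_t bytes) { in_use_ += bytes; return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t bytes) { in_use_ -= bytes; std::free(ptr); }
  std::size_t bytes_in_use() const { return in_use_; }

 private:
  std::size_t in_use_{0};
};

// Hypothetical non-owning, type-erased reference. It has no default
// constructor, which is why the holder below wraps it in std::optional.
class resource_ref {
 public:
  using allocate_fn   = void* (*)(void*, std::size_t);
  using deallocate_fn = void (*)(void*, void*, std::size_t);

  resource_ref(void* object, allocate_fn alloc, deallocate_fn dealloc)
    : object_{object}, allocate_{alloc}, deallocate_{dealloc} {}

  void* allocate(std::size_t bytes) const { return allocate_(object_, bytes); }
  void deallocate(void* ptr, std::size_t bytes) const { deallocate_(object_, ptr, bytes); }

 private:
  void* object_;
  allocate_fn allocate_;
  deallocate_fn deallocate_;
};

// Builds a reference to any type exposing allocate/deallocate.
template <typename Resource>
std::optional<resource_ref> make_resource_ref(Resource& res)
{
  return resource_ref{
    &res,
    [](void* obj, std::size_t bytes) -> void* {
      return static_cast<Resource*>(obj)->allocate(bytes);
    },
    [](void* obj, void* ptr, std::size_t bytes) {
      static_cast<Resource*>(obj)->deallocate(ptr, bytes);
    }};
}

// Base class: the "common" path goes through the optional reference.
class memory_resource_base {
 public:
  void* allocate(std::size_t bytes) { return c_ref.value().allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) { c_ref.value().deallocate(ptr, bytes); }

 protected:
  std::optional<resource_ref> c_ref;  // empty only during initialization
};

// Subclass: owns the concrete object; "special" methods go through c_obj.
class tracking_wrapper : public memory_resource_base {
 public:
  tracking_wrapper() : c_obj{std::make_unique<tracking_resource>()}
  {
    c_ref = make_resource_ref(*c_obj);
  }
  std::size_t bytes_in_use() const { return c_obj->bytes_in_use(); }

 private:
  std::unique_ptr<tracking_resource> c_obj;  // the actual owner
};
```

Holding the concrete object behind a `unique_ptr` also keeps its address stable, so the reference stays valid for the wrapper's lifetime in this sketch.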

# Conflicts: # cpp/include/rmm/mr/aligned_resource_adaptor.hpp # cpp/include/rmm/mr/tracking_resource_adaptor.hpp # cpp/tests/mr/aligned_mr_tests.cpp # cpp/tests/mr/statistics_mr_tests.cpp

Merge main into staging

…tors (#2301)

## Summary

- Remove `device_memory_resource` inheritance from all memory resources (stateless, stateful, and adaptors)
- Remove `do_allocate` / `do_deallocate` / `do_is_equal` virtual overrides from all resources
- Rewrite benchmark factory functions from `shared_ptr<device_memory_resource>` to `any_device_resource`
- Convert `simulated_memory_resource` from DMR inheritance to CCCL concepts
- Change copy/move from `= delete` to `= default` on `cuda_async_memory_resource`, `cuda_async_managed_memory_resource`, `sam_headroom_memory_resource`, and `simulated_memory_resource` (required for CCCL `resource_ref` copyability via `shared_resource` base)
- Remove NullUpstream tests and DEVICE_MEMORY_RESOURCE_VIEW_TEST (no longer needed without DMR)

Closes #2295

Part of #2011

## Summary

- Delete `device_memory_resource.hpp` and `device_memory_resource_view.hpp`
- Remove pointer-based `per_device_resource` APIs and bridge helpers
- Simplify `cccl_adaptors.hpp` (remove DMR bridge code, retain wrapper for deletion in a follow-up)
- Rewrite test mock resources (`mock_resource.hpp`, `device_check_resource_adaptor.hpp`) to use CCCL concepts directly
- Update `callback_memory_resource`, aligned, arena, and failure_callback tests

Closes #2296

Part of #2011

…, and device_check_resource_adaptor (#2340)

## Description

Replace `device_async_resource_ref` members with `cuda::mr::any_resource<device_accessible>` in `polymorphic_allocator`, `thrust_allocator`, and `device_check_resource_adaptor`. This eliminates the CCCL recursive constraint cycle (NVIDIA/cccl#8037) for these classes.

Constructor signatures are unchanged; they still accept `device_async_resource_ref`, which implicitly converts to `any_resource`.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

…emoval (#2342) Leaf MRs no longer enforce alignment limits after the bridge infrastructure was removed in #2324. Disable the four tests that expect bad_alloc for alignment > CUDA_ALLOCATION_ALIGNMENT until alignment enforcement is restored.

Merge main into staging

## Description

Delete `cccl_adaptors.hpp` and replace RMM's wrapper types (`cccl_resource_ref`, `cccl_async_resource_ref`) with direct aliases to CCCL's `resource_ref` and `synchronous_resource_ref`. This eliminates the 480-line adaptor layer that was originally needed to work around the CCCL recursive constraint satisfaction issue (NVIDIA/cccl#8037), which has since been fixed upstream in NVIDIA/cccl#8121.

Additional changes:

- `per_device_resource`: `static_cast<any_device_resource>(ref)` replaced with `any_device_resource{ref}` (wrapper had `operator any_resource`)
- Add missing `cuda_stream_view.hpp` include to three impl headers that previously got it transitively through `cccl_adaptors.hpp`

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Migrate to RMM's CCCL-based memory resources. Part of rapidsai/rmm#2011. Depends on rapidsai/rmm#2361. ## Notes The final commit in this PR (`TEMP: Use CI artifacts from RMM PR #2361`) will be reverted before merging. It exists solely to pull CI artifacts from the RMM PR for testing. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #636

## Summary - Remove `device_memory_resource` base class usage, de-template all resource and adaptor types, replace pointer-based per-device resource APIs with ref-based equivalents - Part of rapidsai/rmm#2011. Migration guide: rapidsai/rmm#2344. - Supersedes #2917 and #2920 Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. ## Changes ### Core resource infrastructure - **`device_memory_resource.hpp`**: Remove `any_resource_bridge` (which inherited from `rmm::mr::device_memory_resource`), remove all `shared_ptr<device_memory_resource>` constructor overloads, consolidate to `any_resource`-only path - **`device_resources.hpp`**: Remove deprecated constructor taking `shared_ptr<device_memory_resource>`, update `get_workspace_resource()` return type (de-templated `limiting_resource_adaptor`) - **`device_resources_snmg.hpp`**: Remove stale include, de-template `pool_memory_resource` - **`handle.hpp`**: Remove deprecated constructors taking `shared_ptr<device_memory_resource>` - **`device_resources_manager.hpp`**: Retype `workspace_mrs` vector from `shared_ptr<device_memory_resource>` to `raft::mr::device_resource`, update `set_workspace_memory_resource()` signature accordingly, de-template `pool_mr_` to `optional<pool_memory_resource>`, remove `dynamic_cast` for upstream type detection, replace `get/set_current_device_resource()` with `_ref` variants ### Memory tracking - **`memory_tracking_resources.hpp`**: Remove `device_tracking_bridge` (inherited from `device_memory_resource`), use `set_current_device_resource_ref()` directly ### Call sites using `get_workspace_resource()` → `get_workspace_resource_ref()` - `select_k-inl.cuh`, `select_radix.cuh`, `select_warpsort.cuh`, `sparse/select_k-inl.cuh`, `bitmap_to_csr.cuh`, `bitset_to_csr.cuh` ### Benchmarks - **`benchmark.hpp`**: De-template `pool_memory_resource`, use `any_resource` for RAII restore - **`gather.cu`**, **`subsample.cu`**: Same pattern ### Tests - **`handle.cpp`**: Dereference 
`limiting_resource_adaptor*` for `device_buffer` constructor - **`device_resources_manager.cpp`**: Remove workspace-related test code for removed APIs - **`mdarray.cu`**: Remove `test_device_resource_bridge_unwrap` (bridge no longer exists) - **`multi_variable_gaussian.cu`**: `get_current_device_resource()` → `get_current_device_resource_ref()` Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Divye Gala (https://github.com/divyegala) URL: #2996

## Summary - Replace `device_memory_resource*` with `device_async_resource_ref` across all C++ headers, sources, benchmarks, examples, and tests. - In Cython `.pxd` declarations, change `device_memory_resource *mr` parameters to `device_async_resource_ref mr` (value type). In `.pyx` files, replace `mr.get_mr()` calls with `mr.c_ref.value()`. - Remove `cudf::set_current_device_resource` (pointer-based) wrapper, keeping only the ref-based `set_current_device_resource_ref`. Update return types of `set/reset_current_device_resource_ref` to `cuda::mr::any_resource<cuda::mr::device_accessible>`. - In `host_memory.cpp`, remove `device_memory_resource` inheritance from `pinned_pool_with_fallback_memory_resource`, remove the forward declaration workaround for `rmm::mr::pool_memory_resource` (no longer templated), and wrap non-copyable state in `shared_ptr` to satisfy the `any_resource` copyability requirement. Part of rapidsai/rmm#2011. Depends on rapidsai/rmm#2361. --------- Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>

## Summary - Remove dependency on `rmm::mr::device_memory_resource` base class; resources now satisfy the `cuda::mr::resource` concept directly - Replace `shared_ptr<device_memory_resource>` with value types and `cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased storage - Replace `set_current_device_resource(ptr)` / `set_per_device_resource(id, ptr)` with `set_current_device_resource_ref` / `set_per_device_resource_ref` - Remove `make_owning_wrapper` usage and `dynamic_cast` on memory resources (no common base class) - Add missing `thrust/iterator/transform_output_iterator.h` include (no longer transitively included via CCCL) Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996.

## Summary - Migrate all RMM usage to the new CCCL memory resource design (de-templated resources, `device_async_resource_ref` instead of `device_memory_resource*`, value semantics) - Replace `get_workspace_resource()` / `get_large_workspace_resource()` with `_ref()` variants across 65 call sites - Rewrite `cuda_huge_page_resource` to satisfy CCCL `resource` concept directly - Remove `owning_wrapper` / `dynamic_cast` patterns in C API and benchmarks Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. ## Changes - **33 files changed** (~208 insertions, ~221 deletions) - `device_memory_resource*` params → `device_async_resource_ref` (ivf_common, ivf_pq, naive_knn) - `get_current_device_resource()` → `get_current_device_resource_ref()` - `set_current_device_resource()` → `set_current_device_resource_ref()` - De-templated `pool_memory_resource`, `failure_callback_resource_adaptor` in bench utils - Removed `&resource` pointer patterns (resources are now copyable value types) - Removed spurious `mr` arg from `select_k` calls (previously compiled due to implicit pointer→bool conversion) - C API pool resource management rewritten without `owning_wrapper` --------- Co-authored-by: gpuCI <38199262+GPUtester@users.noreply.github.com>
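The note about a spurious `mr` argument compiling through an implicit pointer-to-bool conversion is easy to reproduce in isolation. The sketch below uses hypothetical names (`memory_resource`, `select_with_flag`) and is not the cuVS `select_k` signature; it only shows why a stray pointer argument can bind to a `bool` parameter while a value-typed resource cannot.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a memory resource type.
struct memory_resource {};

// Hypothetical function whose last parameter is a flag, not a resource.
inline bool select_with_flag(int k, bool select_min = true)
{
  (void)k;  // unused in this sketch
  return select_min;
}

// A pointer converts implicitly to bool, so a stray pointer argument binds
// to the flag parameter without any diagnostic being required.
static_assert(std::is_convertible<memory_resource*, bool>::value,
              "pointers convert to bool");

// A value-typed resource has no such conversion, so the same mistake
// becomes a compile error once resources are passed by value.
static_assert(!std::is_convertible<memory_resource, bool>::value,
              "value types do not convert to bool");
```

Calling `select_with_flag(8, ptr)` with a non-null `memory_resource*` silently passes `true` as the flag. Moving to value-typed resources is one way such latent call-site mistakes surface at compile time.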

## Summary - Rewrite `RmmResourceAdaptor` as a thin shell inheriting `cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>`, with all mutable state in the impl class for copyable shared ownership. - Replace `device_memory_resource*` with `rmm::device_async_resource_ref` for non-owning references and `cuda::mr::any_resource` for owning storage. - Remove `rmm::mr::owning_wrapper` usage (removed in RMM 26.06). - Update `pool_memory_resource` usage (no longer a template; upstream passed as `device_async_resource_ref`). - Replace `set_current_device_resource(ptr)` with `set_current_device_resource_ref(ref)`. - Update test resources to satisfy CCCL resource concept (`allocate_sync`, `deallocate_sync`, `operator==`, `get_property`). - Update Cython bindings (`.pxd`/`.pyx`) to use `device_async_resource_ref` instead of `device_memory_resource`. - Point `get_cucascade.cmake` to local cuCascade source directory (companion cuCascade PR: NVIDIA/cuCascade#98). Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/cudf#22008. ## Notes - Depends on cuCascade migration: NVIDIA/cuCascade#98 - The `get_cucascade.cmake` change to use `SOURCE_DIR` is a development convenience and should be updated to point to the merged cuCascade commit before this PR is finalized. Authors: - Bradley Dice (https://github.com/bdice) - Niranda Perera (https://github.com/nirandaperera) Approvers: - Niranda Perera (https://github.com/nirandaperera) - Peter Andreas Entschev (https://github.com/pentschev) URL: #940

## Summary

- Migrate all raw RMM `allocate`/`deallocate` calls to the new CCCL 3-argument API that requires explicit alignment
- Replace removed `rmm.librmm.per_device_resource` Cython import with `rmm.pylibrmm.memory_resource` and use `make_any_device_resource` to obtain the resource for `device_buffer` construction

Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. Depends on rapidsai/cuvs#1990.

## Changes

- **`cpp/src/genetic/genetic.cu`**: Add explicit `alignof(node)` / `alignof(program)` to all `allocate` and `deallocate` calls in `parallel_evolve` and `symFit`; fix deallocation bug in `parallel_evolve` where `h_nextprogs[i].len` was incorrectly used instead of `tmp.len` to compute the buffer size being freed
- **`cpp/examples/symreg/symreg_example.cpp`**: Use `params.population_size * sizeof(cg::program)` and `alignof(cg::program)` for `allocate`/`deallocate` calls, fixing incorrect byte-size computation; remove unused `<rmm/aligned.hpp>` include
- **`cpp/tests/sg/genetic/evolution_test.cu`**: Add alignment arguments to allocate/deallocate in `SymReg` test
- **`cpp/tests/sg/genetic/program_test.cu`**: Add alignment arguments to `SetUp`/`TearDown` allocate/deallocate calls
- **`python/cuml/cuml/manifold/umap/umap.pyx`**: Replace `get_current_device_resource()` with `make_any_device_resource(get_current_device_resource().get_mr())` for `device_buffer` construction

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Simon Adorf (https://github.com/csadorf)
- Divye Gala (https://github.com/divyegala)
- Victor Lafargue (https://github.com/viclafargue)

URL: #7951
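The two size-related fixes above share one rule: the byte count and alignment passed to `deallocate` must match what was passed to `allocate`. The following host-only sketch illustrates that rule with a hypothetical `checked_resource` that rejects mismatched frees. It is not RMM code, it omits the stream argument, and `program` here is an invented stand-in for `cg::program`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_map>

// Hypothetical stand-in for cg::program; only its size and alignment matter.
struct program {
  int len;
  double raw_fitness;
};

// Hypothetical checking resource. It remembers the byte size and alignment
// of every allocation and refuses a free whose arguments do not match.
class checked_resource {
 public:
  void* allocate(std::size_t bytes, std::size_t alignment)
  {
    void* ptr  = ::operator new(bytes, static_cast<std::align_val_t>(alignment));
    live_[ptr] = request{bytes, alignment};
    return ptr;
  }

  // Returns false, and frees nothing, on a size or alignment mismatch.
  bool deallocate(void* ptr, std::size_t bytes, std::size_t alignment)
  {
    auto it = live_.find(ptr);
    if (it == live_.end() || it->second.bytes != bytes || it->second.alignment != alignment) {
      return false;
    }
    ::operator delete(ptr, static_cast<std::align_val_t>(alignment));
    live_.erase(it);
    return true;
  }

  std::size_t live_allocations() const { return live_.size(); }

 private:
  struct request {
    std::size_t bytes;
    std::size_t alignment;
  };
  std::unordered_map<void*, request> live_;
};

// Typed helpers that derive bytes and alignment from T, so the element count
// is the only thing a caller can get wrong.
template <typename T>
T* allocate_array(checked_resource& mr, std::size_t count)
{
  return static_cast<T*>(mr.allocate(count * sizeof(T), alignof(T)));
}

template <typename T>
bool deallocate_array(checked_resource& mr, T* ptr, std::size_t count)
{
  return mr.deallocate(ptr, count * sizeof(T), alignof(T));
}
```

Freeing with a different element count than was allocated, which is the shape of the `h_nextprogs[i].len` versus `tmp.len` bug, is reported by this sketch instead of passing unnoticed.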

## Summary

- Replace removed `rmm::mr::device_memory_resource` base class, `owning_wrapper`, `shared_ptr`-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types
- Use `cuda::mr::any_resource<cuda::mr::device_accessible>` for owning type-erased storage, `rmm::device_async_resource_ref` for non-owning references, and value-typed resources (`cuda_memory_resource`, `pinned_host_memory_resource`)
- Pass the memory resource to `raft::handle_t` as the `workspace_resource` (3rd) constructor argument, matching the new raft API (`stream_view`, `stream_pool`, `std::optional<raft::mr::device_resource>`)

Depends on rapidsai/rmm#2361. Depends on rapidsai/ucxx#636. Depends on rapidsai/raft#2996. Depends on rapidsai/cuvs#1990.

## Files changed

**Headers:**
- `algorithms.hpp`, `dendrogram.hpp`, `legacy/graph.hpp`, `legacy/functions.hpp`: `get_current_device_resource()` → `get_current_device_resource_ref()` in default argument expressions
- `host_staging_buffer_manager.hpp`: Remove `owning_wrapper`, store `pool_memory_resource` by value in a `std::optional`, accept `pinned_host_memory_resource` by value in `init()`
- `large_buffer_manager.hpp`: Store `pinned_host_memory_resource` by value (not `shared_ptr`), return `device_async_resource_ref` from `get()`, `std::move` the resource into storage
- `mtmg/resource_manager.hpp`: Use `cuda::mr::any_resource<device_accessible>` instead of `shared_ptr<device_memory_resource>` for `per_device_rmm_resources_`, use non-deprecated `set_per_device_resource`, pass resource as `workspace_resource` to `raft::handle_t`

**Tests:**
- `base_fixture.hpp`: Return `any_resource<device_accessible>` from `create_memory_resource()`, use value-typed MR factory helpers (`make_cuda`, `make_managed`, `make_pool`, `make_binning`), switch to non-deprecated `set_current_device_resource` / `get_current_device_resource_ref`
- `multi_node_threaded_test.cpp`: Switch to non-deprecated `set_current_device_resource(resource)`
- `mg_graph500_bfs_test.cu`, `mg_graph500_sssp_test.cu`: Store `pinned_mr_` as `optional<pinned_host_memory_resource>` by value, prefer `.value()` over `operator*` for optional access

**Examples:**
- All 4 example files (`sg_graph_algorithms.cpp`, `mg_graph_algorithms.cpp`, `vertex_and_edge_partition.cu`, `graph_operations.cu`): Use value-typed `cuda_memory_resource`, non-deprecated `set_current_device_resource`, pass the resource to `raft::handle_t` as the `workspace_resource` (3rd positional arg, with `nullptr` for the unused `stream_pool`)

Authors:
- Bradley Dice (https://github.com/bdice)
- Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #5483
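The "store by value in a `std::optional`" and "prefer `.value()` over `operator*`" items above can be sketched without any RMM types. In this hypothetical example, `fake_pinned_resource` and `buffer_manager_sketch` are invented names that only mirror the shape of the pattern described: the resource is accepted by value, moved into optional storage, and read back through a checked accessor.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical value-typed resource; stands in for a pinned host resource.
struct fake_pinned_resource {
  int id;
};

// Hypothetical manager that owns its resource by value inside an optional,
// so it can exist in an "uninitialized" state without heap allocation.
class buffer_manager_sketch {
 public:
  // Accept by value and move into storage, as the header changes describe.
  void init(fake_pinned_resource mr) { mr_ = std::move(mr); }

  bool initialized() const noexcept { return mr_.has_value(); }

  // .value() throws std::bad_optional_access when init() was never called;
  // operator* on an empty optional would be undefined behavior instead.
  fake_pinned_resource& get() { return mr_.value(); }

 private:
  std::optional<fake_pinned_resource> mr_;
};
```

The design trade-off is a checked access on every read in exchange for a clear exception when the manager is used before initialization.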

## Summary

- Replace `device_memory_resource*` with `device_async_resource_ref` across all C++ headers, sources, benchmarks, examples, and tests.
- In Cython `.pxd` declarations, change `device_memory_resource *mr` parameters to `device_async_resource_ref mr` (value type). In `.pyx` files, replace `mr.get_mr()` calls with `mr.c_ref.value()`.
- Remove `cudf::set_current_device_resource` (pointer-based) wrapper, keeping only the ref-based `set_current_device_resource_ref`. Update return types of `set/reset_current_device_resource_ref` to `cuda::mr::any_resource<cuda::mr::device_accessible>`.
- In `host_memory.cpp`, remove `device_memory_resource` inheritance from `pinned_pool_with_fallback_memory_resource`, remove the forward declaration workaround for `rmm::mr::pool_memory_resource` (no longer templated), and wrap non-copyable state in `shared_ptr` to satisfy the `any_resource` copyability requirement.

Part of rapidsai/rmm#2011. Depends on rapidsai/rmm#2361.

---------

Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
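The last bullet, wrapping non-copyable state in `shared_ptr` so the resource itself stays copyable, is a general C++ pattern that can be shown without RMM or CCCL. In the sketch below, `pool_state` and `shared_state_resource` are hypothetical names, and the `std::mutex` member is an assumed example of non-copyable state rather than what `host_memory.cpp` actually holds.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <type_traits>

// Non-copyable state: std::mutex deletes its copy constructor, so any struct
// holding one by value is non-copyable too.
struct pool_state {
  std::mutex mtx;
  std::size_t bytes_in_use{0};
};
static_assert(!std::is_copy_constructible_v<pool_state>);

// Hypothetical resource shell. Copies share one pool_state through the
// shared_ptr, so the shell is copyable even though the state is not.
class shared_state_resource {
 public:
  shared_state_resource() : state_{std::make_shared<pool_state>()} {}

  void record_allocation(std::size_t bytes)
  {
    std::lock_guard<std::mutex> lock{state_->mtx};
    state_->bytes_in_use += bytes;
  }

  std::size_t bytes_in_use() const
  {
    std::lock_guard<std::mutex> lock{state_->mtx};
    return state_->bytes_in_use;
  }

  // Copies compare equal because they point at the same shared state.
  friend bool operator==(shared_state_resource const& a, shared_state_resource const& b) noexcept
  {
    return a.state_ == b.state_;
  }

 private:
  std::shared_ptr<pool_state> state_;
};
static_assert(std::is_copy_constructible_v<shared_state_resource>);
```

With this layout a copy observes allocations recorded through the original, which is the behavior a type-erased owning wrapper needs when it copies the resource it holds.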

bdice and others added 30 commits February 25, 2026 13:34

Refactor pool_memory_resource to shared CCCL MR design (#2258)

f7a043a

Part of #2011.

Merge remote-tracking branch 'upstream/main' into staging

fd8d55e

Merge remote-tracking branch 'upstream/main' into staging-merge-main

f103d99

Merge pull request #2288 from bdice/staging-merge-main

b196702

Merge main into staging

Merge remote-tracking branch 'upstream/main' into staging-merge-main

28ba832

Merge pull request #2311 from bdice/staging-merge-main

e4f4106

Merge main into staging

Merge remote-tracking branch 'upstream/main' into staging-merge-main

ad77533

# Conflicts: # cpp/include/rmm/mr/aligned_resource_adaptor.hpp # cpp/include/rmm/mr/tracking_resource_adaptor.hpp # cpp/tests/mr/aligned_mr_tests.cpp # cpp/tests/mr/statistics_mr_tests.cpp

Merge pull request #2320 from bdice/staging-merge-main

091a079

Merge main into staging

Merge branch 'main' into staging-merge-main

a723c62

Merge pull request #2341 from bdice/staging-merge-main

41fd57a

Merge main into staging

Merge branch 'main' into staging-merge-main

706bb50

This was referenced Apr 20, 2026

Migrate RMM usage to CCCL MR design rapidsai/rapidsmpf#940

Merged

Migrate RMM usage to CCCL MR design NVIDIA/cuopt#1035

Merged

Migrate RMM usage to CCCL MR design dmlc/xgboost#12141

Merged

CCCL Memory Resource Migration — Merge Train #2364

Closed

bdice self-assigned this Apr 21, 2026

bdice moved this to In Progress in RMM Project Board Apr 21, 2026

bdice mentioned this pull request Apr 21, 2026

[FEA] Support memory resources from CCCL 3.2 #2011

Open

51 tasks

bdice merged commit 386f76d into main Apr 21, 2026
85 checks passed

github-project-automation Bot moved this from In Progress to Done in RMM Project Board Apr 21, 2026

bdice mentioned this pull request Apr 23, 2026

Intermittent SIGSEGV at process exit when pool_memory_resource is held in the static per-device resource map #2366

Closed

coderabbitai Bot mentioned this pull request Apr 24, 2026

feat: add multiple_blocks_allocation RAII handle for fixed_size_memory_resource #2368

Open

3 tasks

coderabbitai Bot mentioned this pull request May 1, 2026

Use cuda::stream_ref for stream usage #2372

Draft

This was referenced May 16, 2026

Remove deprecated resource APIs #2387

Merged

Use explicit resources in memory resource tests #2389

Merged

Enforce base resource alignment limits #2393

Merged

This was referenced May 22, 2026

Move public memory resource definitions to source files #2416

Open

Enable -Wshadow (Google-style member-underscore renames) #2417

Open

bdice merged 38 commits into main from staging
