
Merge staging into main: CCCL memory resource migration #2361

Merged
bdice merged 38 commits into main from staging on Apr 21, 2026

bdice commented Apr 15, 2026

Description

Merges all breaking changes from the staging branch into main, completing the CCCL memory resource migration tracked in #2011.

Summary of changes

Adaptor refactors to cuda::shared_resource design:

Remove legacy infrastructure:

Post-cleanup:

Breaking changes

  • device_memory_resource base class has been removed. All memory resources now implement the CCCL resource concept directly.
  • owning_wrapper has been removed. Adaptors now hold their state via cuda::mr::shared_resource, which provides reference-counted ownership.
  • All resource adaptors are de-templated (no more Upstream template parameter). They accept any upstream resource via device_async_resource_ref.
  • Python/Cython bindings now use device_async_resource_ref instead of device_memory_resource*.
  • cccl_adaptors.hpp has been deleted; raw CCCL resource_ref types are used directly.

See #2344 for the migration guide and #2345 for downstream consumer documentation.

Downstream library updates

All downstream RAPIDS libraries have draft PRs to adopt these changes (tracked in #2011 under "Update RAPIDS libraries").

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

bdice and others added 30 commits February 25, 2026 13:34
Convert logging_resource_adaptor from a templated, header-only class to a non-templated class using cuda::mr::shared_resource for reference-counted ownership. Implementation is now compiled into librmm.so.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Lawrence Mitchell (https://github.com/wence-)
  - David Wendt (https://github.com/davidwendt)

URL: #2246
…ared CCCL MR design (#2264)

## Summary

Converts `fixed_size_memory_resource` and `binning_memory_resource` from
header-only class templates to non-template classes backed by a
`detail::*_impl` held via `cuda::mr::shared_resource`. Follows the
pattern established by `logging_resource_adaptor` (#2246) and
`pool_memory_resource` (#2258). Part of #2011.

## Changes

**New files (6)**
- `cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp` — impl
class declaration; inherits `stream_ordered_memory_resource<impl,
fixed_size_free_list>` (same CRTP pattern as pool)
- `cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp` — impl member
definitions
- `cpp/src/mr/fixed_size_memory_resource.cpp` — outer class constructor
and delegating methods
- `cpp/include/rmm/mr/detail/binning_memory_resource_impl.hpp` — impl
class declaration; includes `fixed_size_memory_resource.hpp` for
`unique_ptr` member
- `cpp/src/mr/detail/binning_memory_resource_impl.cpp` — impl member
definitions
- `cpp/src/mr/binning_memory_resource.cpp` — outer class constructor and
delegating methods

**Modified files (11)**
- `cpp/include/rmm/mr/fixed_size_memory_resource.hpp` — de-templated;
`shared_resource` inheritance; `device_async_resource_ref` constructor
only; `static_assert` for concept
- `cpp/include/rmm/mr/binning_memory_resource.hpp` — de-templated; same
pattern; `Upstream*` constructors removed
- `cpp/CMakeLists.txt` — four new `.cpp` files added to library sources
- `python/rmm/rmm/librmm/memory_resource.pxd` — template parameters
removed from both declarations
- `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` —
template instantiation syntax removed from `new` expressions
- `cpp/tests/mr/mr_ref_fixed_size_tests.cpp` — replaced string-factory
suite with typed `FixedSizeMRFixture` + `CcclMrRefTest`
- `cpp/tests/mr/mr_ref_binning_tests.cpp` — replaced string-factory
suites with typed `BinningMRFixture` + all three `CcclMrRef*` suites
- `cpp/tests/mr/mr_ref_test.hpp` — `make_fixed_size`/`make_binning`
factory helpers rewritten without `owning_wrapper`; type aliases
de-templated
- `cpp/tests/mr/binning_mr_tests.cpp` — removed explicit template
instantiation and `ThrowOnNullUpstream` (null pointer constructor no
longer exists); updated `ExplicitBinMR` to use
`device_async_resource_ref`
- `cpp/tests/mr/cccl_adaptor_tests.cpp` — added
`fixed_size_memory_resource` and `binning_memory_resource` to the
shared-ownership typed test suite
- `cpp/tests/mr/thrust_allocator_tests.cu` — removed `"Binning"` from
the string-dispatch parameterization (coverage moved to
`BINNING_MR_REF_*` typed suites; the old dispatch path caused a dangling
ref crash — the exact bug this PR fixes)

## Breaking changes

- `fixed_size_memory_resource<Upstream>` → `fixed_size_memory_resource`
(template parameter removed)
- `binning_memory_resource<Upstream>` → `binning_memory_resource`
(template parameter removed)
- `Upstream*` constructor overloads removed; use
`device_async_resource_ref`
- Both classes become copyable with shared ownership semantics
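
The de-templating and shared-ownership changes listed above can be sketched without RMM or CCCL. The following is a minimal host-only illustration, not the RMM implementation: `upstream_ref`, `counting_adaptor`, and `malloc_resource` are hypothetical names, `std::shared_ptr` stands in for `cuda::mr::shared_resource`, and the simplified `allocate(bytes)` / `deallocate(ptr)` signatures are assumptions. Unlike the adaptors described in this PR, the sketch keeps only a non-owning handle, so the upstream must outlive the adaptor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Hypothetical non-owning, type-erased handle to any upstream that offers
// allocate(bytes) and deallocate(ptr). It plays the role that
// device_async_resource_ref plays in the text; it is not the CCCL type.
class upstream_ref {
 public:
  // The enable_if guard keeps this template from hijacking copy construction.
  template <typename Upstream,
            typename = std::enable_if_t<
              !std::is_same<std::remove_cv_t<Upstream>, upstream_ref>::value>>
  upstream_ref(Upstream& upstream)
    : object_{&upstream},
      allocate_{[](void* obj, std::size_t bytes) -> void* {
        return static_cast<Upstream*>(obj)->allocate(bytes);
      }},
      deallocate_{[](void* obj, void* ptr) { static_cast<Upstream*>(obj)->deallocate(ptr); }}
  {
  }

  void* allocate(std::size_t bytes) const { return allocate_(object_, bytes); }
  void deallocate(void* ptr) const { deallocate_(object_, ptr); }

 private:
  void* object_;
  void* (*allocate_)(void*, std::size_t);
  void (*deallocate_)(void*, void*);
};

// Hypothetical adaptor with no Upstream template parameter. All state lives
// in a shared impl, so copies of the adaptor observe the same counter.
class counting_adaptor {
 public:
  explicit counting_adaptor(upstream_ref upstream) : impl_{std::make_shared<impl>(upstream)} {}

  void* allocate(std::size_t bytes)
  {
    ++impl_->allocations;
    return impl_->upstream.allocate(bytes);
  }
  void deallocate(void* ptr) { impl_->upstream.deallocate(ptr); }
  std::size_t allocation_count() const { return impl_->allocations; }

 private:
  struct impl {
    explicit impl(upstream_ref up) : upstream{up} {}
    upstream_ref upstream;
    std::size_t allocations{0};
  };
  std::shared_ptr<impl> impl_;
};

// Hypothetical host upstream; std::malloc stands in for a device allocation.
struct malloc_resource {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(void* ptr) { std::free(ptr); }
};

// Self-check: an allocation made through a copy is visible in the original.
inline void check_shared_counter()
{
  malloc_resource upstream;
  counting_adaptor original{upstream};
  counting_adaptor copy = original;
  void* ptr             = copy.allocate(64);
  assert(original.allocation_count() == 1);
  original.deallocate(ptr);
}
```

Because the upstream is type-erased at the constructor boundary, one compiled `counting_adaptor` works with any upstream type, which is what allows an implementation to move out of headers and into a library.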

## Testing

`build-rmm-cpp -j0 && test-rmm-cpp`: 89/89 tests pass.
…d CCCL MR design (#2265)

## Summary

Converts `tracking_resource_adaptor`, `statistics_resource_adaptor`, and
`aligned_resource_adaptor` from header-only class templates to
non-template classes backed by a `detail::*_impl` held via
`cuda::mr::shared_resource`, following the pattern established by
`logging_resource_adaptor` (#2246).

Part of #2011.

## Changes

### New files
- `cpp/include/rmm/mr/detail/tracking_resource_adaptor_impl.hpp`
- `cpp/include/rmm/mr/detail/statistics_resource_adaptor_impl.hpp`
- `cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp`
- `cpp/src/mr/detail/tracking_resource_adaptor_impl.cpp`
- `cpp/src/mr/detail/statistics_resource_adaptor_impl.cpp`
- `cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp`
- `cpp/src/mr/tracking_resource_adaptor.cpp`
- `cpp/src/mr/statistics_resource_adaptor.cpp`
- `cpp/src/mr/aligned_resource_adaptor.cpp`
- `cpp/tests/mr/mr_ref_tracking_tests.cpp` — `CcclMrRefTest` /
`CcclMrRefAllocationTest` / `CcclMrRefTestMT` instantiations
- `cpp/tests/mr/mr_ref_statistics_tests.cpp` — same
- `cpp/tests/mr/mr_ref_aligned_tests.cpp` — `CcclMrRefTest` /
`CcclMrRefAllocationTest`

### Modified files
- Public headers de-templated, private `shared_resource` inheritance,
`get_property` friend, `static_assert` concept check
- `cpp/CMakeLists.txt` — new `.cpp` sources added
- `cpp/tests/CMakeLists.txt` — `TRACKING_MR_REF_TEST`,
`STATISTICS_MR_REF_TEST`, `ALIGNED_MR_REF_TEST` targets
- `cpp/tests/mr/adaptor_tests.cpp` — removed template/pointer-based
aligned tests, updated `owning_wrapper` to use
`limiting_resource_adaptor`
- `cpp/tests/mr/tracking_mr_tests.cpp`, `statistics_mr_tests.cpp`,
`aligned_mr_tests.cpp` — template aliases removed, null/pointer
constructions replaced, stacked-adaptor tests use
`device_async_resource_ref{mr}` to avoid copy-construction
- `cpp/tests/mr/cccl_adaptor_tests.cpp` — all three new adaptors added
to the typed shared-ownership suite
- `python/rmm/rmm/librmm/memory_resource.pxd` — template parameters
removed from `statistics_resource_adaptor` and
`tracking_resource_adaptor`
- `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` —
template instantiation syntax removed

## Checklist

- [x] Create `detail/*_impl.hpp` (class declaration)
- [x] Create `src/mr/detail/*_impl.cpp` (member definitions)
- [x] Create `src/mr/*_adaptor.cpp` (outer class definitions)
- [x] Modify public headers (de-template, private inheritance,
`get_property`, `static_assert`)
- [x] Update `CMakeLists.txt`
- [x] Update Cython `.pxd` and `.pyx`
- [x] Update tests (remove template instantiation, add non-template
fixtures)
- [x] `pre-commit run --all-files`
- [x] `build-rmm-cpp -j0 && test-rmm-cpp`
## Description
The set/reset functions returned `device_async_resource_ref`
(non-owning) to the previous resource, but the underlying `any_resource`
in the map was immediately overwritten, leaving the returned ref
dangling. This was UB that happened to be masked by small buffer
optimization for small resource types like `cuda_memory_resource`.

This PR returns `cuda::mr::any_resource<cuda::mr::device_accessible>`
(owning) instead, using `std::exchange` to install the new value and
return the old one in a single step.
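
The fix described above can be illustrated with standard-library types only. This is a minimal sketch under stated assumptions: `resource_registry` and `owning_resource` are hypothetical, `std::string` stands in for an owning `any_resource`, and the mutex is an assumption of the sketch (`std::exchange` itself provides no thread safety).

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an owning, type-erased resource
// (the role cuda::mr::any_resource plays in the description above).
using owning_resource = std::string;

// Hypothetical per-device registry; not RMM's actual data structure.
struct resource_registry {
  std::mutex mtx;
  std::map<int, owning_resource> by_device;

  // Install `next` for `device` and return the previous resource BY VALUE.
  // The caller owns the returned object, so it stays valid even though the
  // map slot now holds `next`. Returning a reference or a non-owning handle
  // to the old slot contents would dangle once the slot is overwritten.
  owning_resource set(int device, owning_resource next)
  {
    std::lock_guard<std::mutex> lock{mtx};
    owning_resource& slot = by_device[device];
    return std::exchange(slot, std::move(next));
  }
};

// Self-check: the previous value survives being replaced in the map.
inline void check_previous_survives()
{
  resource_registry registry;
  registry.set(0, "pool");
  owning_resource previous = registry.set(0, "arena");
  assert(previous == "pool");
  assert(registry.by_device.at(0) == "arena");
}
```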

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
## Summary

- De-template `arena_memory_resource` by removing the `Upstream`
template parameter
- Split implementation into `detail::arena_memory_resource_impl` held
via `cuda::mr::shared_resource` for reference-counted, copyable
ownership
- Retain the `device_memory_resource` legacy compatibility layer
(`do_allocate`, `do_deallocate`, `do_is_equal`)
- Update benchmarks, C++ tests, and Python (Cython) bindings to use the
non-template `arena_memory_resource`

This follows the same pattern established in #2246, #2258, #2264, and
#2265 for the other memory resources.

### New files

| File | Contents |
|------|----------|
| `cpp/include/rmm/mr/detail/arena_memory_resource_impl.hpp` | `detail::arena_memory_resource_impl` class declaration |
| `cpp/src/mr/detail/arena_memory_resource_impl.cpp` | Impl member function definitions |
| `cpp/src/mr/arena_memory_resource.cpp` | Outer class constructor + delegating method definitions |

### Modified files

| File | Change |
|------|--------|
| `cpp/include/rmm/mr/arena_memory_resource.hpp` | De-template, `shared_resource` wrapping |
| `cpp/CMakeLists.txt` | Add new `.cpp` source files |
| `cpp/tests/mr/arena_mr_tests.cpp` | `arena_memory_resource<device_memory_resource>` → `arena_memory_resource` |
| `cpp/tests/mr/mr_ref_arena_tests.cpp` | Add `ArenaMRFixture` + `CcclMrRefTest`/`CcclMrRefAllocationTest`/`CcclMrRefTestMT` instantiations |
| `cpp/tests/mr/mr_ref_test.hpp` | Update `make_arena()` and `arena_mr` type alias |
| `cpp/tests/mr/cccl_adaptor_tests.cpp` | Add arena `static_assert` + `ArenaMRAdaptorTest` |
| `cpp/benchmarks/multi_stream_allocations/multi_stream_allocations_bench.cu` | Update `make_arena()` |
| `cpp/benchmarks/random_allocations/random_allocations.cpp` | Update `make_arena()` |
| `cpp/benchmarks/replay/replay.cpp` | Update `make_arena()` |
| `python/rmm/rmm/librmm/memory_resource.pxd` | Remove `[Upstream]` template from `arena_memory_resource` |
| `python/rmm/rmm/pylibrmm/memory_resource/_memory_resource.pyx` | Remove template instantiation syntax for arena |
## Summary
- Split `callback_memory_resource` implementation into
`detail/callback_memory_resource_impl.hpp` +
`src/mr/detail/callback_memory_resource_impl.cpp`, using
`cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_callback_tests.cpp` and integrate into
`cccl_adaptor_tests.cpp` typed test suite
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++
signature
## Summary
- Split `prefetch_resource_adaptor` implementation into
`detail/prefetch_resource_adaptor_impl.hpp` +
`src/mr/detail/prefetch_resource_adaptor_impl.cpp`, using
`cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_prefetch_tests.cpp` and integrate into
`cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and
`Equality` tests for non-template type
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++
signature
## Summary
- Split `thread_safe_resource_adaptor` implementation into
`detail/thread_safe_resource_adaptor_impl.hpp` +
`src/mr/detail/thread_safe_resource_adaptor_impl.cpp`, using
`cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_thread_safe_tests.cpp` and integrate into
`cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and
`Equality` tests for non-template type
## Description
This merges the following changes into the `staging` branch:
- Update Cython lower bound pin to 3.2.2 (#2266)
- Remove pytest upper bound pin (#2268)
- Reduce default pool sizes in Python tests to speed up suite (#2273)

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

---------

Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>
## Summary
- Split `limiting_resource_adaptor` implementation into
`detail/limiting_resource_adaptor_impl.hpp` +
`src/mr/detail/limiting_resource_adaptor_impl.cpp`, using
`cuda::mr::shared_resource` for shared ownership
- Accept `device_async_resource_ref` upstream; class is now non-template
- Add `mr_ref_limiting_tests.cpp` and integrate into
`cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, `Equality`,
and `owning_wrapper` Equality tests for non-template type with
`shared_resource` semantics
- Update Cython `.pxd`/`.pyx` bindings to match non-template C++
signature
…2278)

## Summary
- Split `failure_callback_resource_adaptor` implementation into
`detail/failure_callback_resource_adaptor_impl.hpp` (header-only, since
impl is templated on `ExceptionType`)
- Accept `device_async_resource_ref` upstream; template parameter
changes from `<Upstream, ExceptionType>` to `<ExceptionType>` only
- Add `mr_ref_failure_callback_tests.cpp` and integrate into
`cccl_adaptor_tests.cpp` typed test suite
- Update `adaptor_tests.cpp` type aliases, `NullUpstream`, and
`Equality` tests
- Update Cython `.pxd`/`.pyx` bindings to use `[ExceptionType]` template
parameter with `out_of_memory` forward decl
## Description

Closes #2285

Removes `owning_wrapper` and `make_owning_wrapper`, which are no longer
necessary after the `cuda::shared_resource` adaptor conversion in #2011.
All adaptors now manage upstream lifetime directly via `any_resource`,
so the problem `owning_wrapper` solved no longer exists.

- Delete `cpp/include/rmm/mr/owning_wrapper.hpp`
- Remove stale `#include`s from 4 files (3 benchmarks +
`mr_ref_test.hpp`)
- Remove `owning_wrapper` from the typed test suite in
`adaptor_tests.cpp`

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

---------

Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
## Description

Closes #2287

Migrates all 8 base (non-adaptor) memory resources to natively satisfy
the CCCL `cuda::mr::resource` concept, so that concrete types (e.g.
`cuda_memory_resource&`) satisfy the concept without virtual dispatch
through `device_memory_resource`.

**Stateless resources** get
`allocate`/`deallocate`/`allocate_sync`/`deallocate_sync` accepting
`cuda::stream_ref` directly on the class:
- `cuda_memory_resource` — `device_accessible`
- `managed_memory_resource` — `device_accessible` + `host_accessible`
- `pinned_host_memory_resource` — `device_accessible` +
`host_accessible`
- `cuda_async_view_memory_resource` — `device_accessible`
- `system_memory_resource` — `device_accessible` + `host_accessible`

**Stateful, non-copyable resources** use
`cuda::mr::shared_resource<Impl>` with `_impl` classes extracted to
`detail/` headers and `.cpp` source files, matching the adaptor
convention from prior PRs (e.g. `limiting_resource_adaptor`):
- `cuda_async_memory_resource` — `device_accessible`
- `cuda_async_managed_memory_resource` — `device_accessible` +
`host_accessible`
- `sam_headroom_memory_resource` — `device_accessible` +
`host_accessible`

`device_memory_resource` inheritance is kept for backward compatibility.
`do_allocate`/`do_deallocate` delegate to the new CCCL methods. Default
alignment is `rmm::CUDA_ALLOCATION_ALIGNMENT` (256 bytes).
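
To make the shape of a natively conforming resource concrete, here is a host-only sketch of the synchronous half of the interface. It is not the CCCL concept definition: `host_aligned_resource` and `has_sync_interface` are hypothetical, the exact parameter lists are assumptions, `std::aligned_alloc` stands in for a CUDA allocation call, and the stream-ordered `allocate`/`deallocate` overloads are omitted because they need `cuda::stream_ref`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Assumed default alignment, mirroring the 256-byte value quoted above.
constexpr std::size_t default_alignment = 256;

// Hypothetical stateless resource with the member functions placed directly
// on the class, so no virtual base class is involved.
struct host_aligned_resource {
  void* allocate_sync(std::size_t bytes, std::size_t alignment = default_alignment)
  {
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t const padded = (bytes + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, padded);
  }

  void deallocate_sync(void* ptr, std::size_t /*bytes*/,
                       std::size_t /*alignment*/ = default_alignment) noexcept
  {
    std::free(ptr);
  }

  // Stateless: any two instances are interchangeable.
  friend bool operator==(host_aligned_resource const&, host_aligned_resource const&)
  {
    return true;
  }
  friend bool operator!=(host_aligned_resource const&, host_aligned_resource const&)
  {
    return false;
  }
};

// C++17 detection idiom standing in for a concept check: true when T offers
// allocate_sync, deallocate_sync, and equality comparison.
template <typename T, typename = void>
struct has_sync_interface : std::false_type {};

template <typename T>
struct has_sync_interface<
  T,
  std::void_t<decltype(std::declval<T&>().allocate_sync(std::size_t{}, std::size_t{})),
              decltype(std::declval<T&>().deallocate_sync(nullptr, std::size_t{}, std::size_t{})),
              decltype(std::declval<T const&>() == std::declval<T const&>())>> : std::true_type {};

static_assert(has_sync_interface<host_aligned_resource>::value,
              "resource should expose the synchronous interface");
static_assert(!has_sync_interface<int>::value, "int is not a resource");

// Self-check: the returned pointer honors the default alignment.
inline void check_default_alignment()
{
  host_aligned_resource mr;
  void* ptr = mr.allocate_sync(100);
  assert(ptr != nullptr);
  assert(reinterpret_cast<std::uintptr_t>(ptr) % default_alignment == 0);
  mr.deallocate_sync(ptr, 100);
}
```

The `static_assert` lines echo the "`static_assert` for concept" checks mentioned in the earlier commits: conformance is verified at compile time on the concrete type rather than through inheritance.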

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

---------

Co-authored-by: Lawrence Mitchell <wence@gmx.li>
…async_resource_ref (#2300)

## Summary

Replaces `shared_ptr[device_memory_resource]` with per-subclass
`unique_ptr[ConcreteType]` (owning) and
`optional[device_async_resource_ref]` (non-owning reference) across all
Python/Cython bindings. This is a part of #2011.

There are **significant** opportunities to make this Cython code better
over time but I have to get something that removes
`device_memory_resource` from the Python/Cython side before I can finish
migration on the C++ side (#2296). I welcome critique of this design,
and ideas for how it can be improved, particularly from @vyasr @wence-.
I would like to address any suggested improvements in follow-up PRs,
because this changeset is necessary to unblock #2301.

The changes in `cdef class DeviceMemoryResource` are perhaps the most
significant changes here from a design perspective.

The solution I'm going with for now is to keep the
`DeviceMemoryResource` class around, as a base class for the Cython MRs,
and let it handle allocate/deallocate. It owns a
`optional[device_async_resource_ref]` which is used for
allocation/deallocation. It's `optional` so that the class can be
default-constructed (Cython requires nullary constructors), but it
should never be `nullopt` except during initialization.

Then, each MR class owns a `c_obj` like
`unique_ptr[cuda_memory_resource]`. This is `unique_ptr` so it can be
default-constructed for Cython's requirements. I chose `unique_ptr` over
`optional` here to emphasize that this member is the thing that actually
owns the resource. As with the `c_ref`, this should never be `nullptr`
except during initialization. When an MR class is created, it
initializes its `c_obj` and then constructs a `c_ref` (a member
inherited from the `DeviceMemoryResource` base class).

"Special" methods for an MR like getting the statistics counts go
through `deref(self.c_obj)`, and "common" methods like
allocate/deallocate go through `self.c_ref.value()`.
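
The ownership split described above (a `unique_ptr` owner plus an `optional` non-owning ref in the base class) can be mirrored in plain C++. This is only a sketch of the pattern, not the Cython code: `counting_resource`, `resource_ref`, `memory_resource_base`, and `counting_memory_resource` are hypothetical, and the ref here is not type-erased.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>

// Hypothetical concrete resource with one "special" query (bytes).
struct counting_resource {
  std::size_t bytes{0};
  void allocate(std::size_t n) { bytes += n; }
  void deallocate(std::size_t n) { bytes -= n; }
};

// Hypothetical non-owning handle with NO default constructor, which is why
// the base class below has to wrap it in std::optional.
class resource_ref {
 public:
  explicit resource_ref(counting_resource& target) : target_{&target} {}
  void allocate(std::size_t n) const { target_->allocate(n); }
  void deallocate(std::size_t n) const { target_->deallocate(n); }

 private:
  counting_resource* target_;
};

// Base class: default-constructible, handles the "common" operations.
struct memory_resource_base {
  std::optional<resource_ref> c_ref;  // empty only during initialization

  void allocate(std::size_t n) { c_ref.value().allocate(n); }
  void deallocate(std::size_t n) { c_ref.value().deallocate(n); }
};

// Concrete wrapper: owns the resource and exposes the "special" methods.
struct counting_memory_resource : memory_resource_base {
  std::unique_ptr<counting_resource> c_obj;  // the actual owner

  counting_memory_resource()
  {
    c_obj = std::make_unique<counting_resource>();  // 1. create the owner
    c_ref.emplace(*c_obj);                          // 2. point the ref at it
  }

  std::size_t bytes_in_use() const { return c_obj->bytes; }
};

// Self-check: common calls go through c_ref, special calls through c_obj.
inline void check_owner_and_ref()
{
  counting_memory_resource mr;
  mr.allocate(128);
  assert(mr.bytes_in_use() == 128);
  mr.deallocate(128);
  assert(mr.bytes_in_use() == 0);
}
```

Since the owned object lives on the heap, the ref stays valid if the wrapper is moved, which is one reason a pointer-like owner pairs well with a non-owning ref.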

### Changes

- **`.pxd` declarations**: Remove `device_memory_resource` class.
Declare `device_async_resource_ref` and a
`make_device_async_resource_ref()` inline C++ template that returns
`optional` to work around Cython generating default-constructed
temporaries for non-default-constructible types. All adaptor
constructors take `device_async_resource_ref` instead of
`device_memory_resource*`.
- **`.pxd` class definitions**: `DeviceMemoryResource` base holds
`optional[device_async_resource_ref] c_ref`; each concrete subclass
holds `unique_ptr[ConcreteType] c_obj`.
- **`.pyx` implementations**: All `__cinit__` methods construct via
`unique_ptr` then set `c_ref` via `make_device_async_resource_ref`.
Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`.
Per-device functions use `set_per_device_resource_ref`.
- **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of
`self.mr.get_mr()`.

Closes #2294
# Conflicts:
#	cpp/include/rmm/mr/aligned_resource_adaptor.hpp
#	cpp/include/rmm/mr/tracking_resource_adaptor.hpp
#	cpp/tests/mr/aligned_mr_tests.cpp
#	cpp/tests/mr/statistics_mr_tests.cpp
…tors (#2301)

## Summary

- Remove `device_memory_resource` inheritance from all memory resources
(stateless, stateful, and adaptors)
- Remove `do_allocate` / `do_deallocate` / `do_is_equal` virtual
overrides from all resources
- Rewrite benchmark factory functions from
`shared_ptr<device_memory_resource>` to `any_device_resource`
- Convert `simulated_memory_resource` from DMR inheritance to CCCL
concepts
- Change copy/move from `= delete` to `= default` on
`cuda_async_memory_resource`, `cuda_async_managed_memory_resource`,
`sam_headroom_memory_resource`, and `simulated_memory_resource`
(required for CCCL `resource_ref` copyability via `shared_resource`
base)
- Remove NullUpstream tests and DEVICE_MEMORY_RESOURCE_VIEW_TEST (no
longer needed without DMR)

Closes #2295
Part of #2011
## Summary

- Delete `device_memory_resource.hpp` and
`device_memory_resource_view.hpp`
- Remove pointer-based `per_device_resource` APIs and bridge helpers
- Simplify `cccl_adaptors.hpp` (remove DMR bridge code, retain wrapper
for deletion in a follow-up)
- Rewrite test mock resources (`mock_resource.hpp`,
`device_check_resource_adaptor.hpp`) to use CCCL concepts directly
- Update `callback_memory_resource`, aligned, arena, and
failure_callback tests

Closes #2296
Part of #2011
…, and device_check_resource_adaptor (#2340)

## Description

Replace `device_async_resource_ref` members with
`cuda::mr::any_resource<device_accessible>` in `polymorphic_allocator`,
`thrust_allocator`, and `device_check_resource_adaptor`. This eliminates
the CCCL [#8037](NVIDIA/cccl#8037) recursive
constraint cycle for these classes.

Constructor signatures are unchanged; they still accept
`device_async_resource_ref`, which implicitly converts to
`any_resource`.

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
…emoval (#2342)

Leaf MRs no longer enforce alignment limits after the bridge
infrastructure was removed in #2324. Disable the four tests that
expect bad_alloc for alignment > CUDA_ALLOCATION_ALIGNMENT until
alignment enforcement is restored.
## Description

Delete `cccl_adaptors.hpp` and replace RMM's wrapper types
(`cccl_resource_ref`, `cccl_async_resource_ref`) with direct aliases to
CCCL's `resource_ref` and `synchronous_resource_ref`. This eliminates
the 480-line adaptor layer that was originally needed to work around the
CCCL [#8037](NVIDIA/cccl#8037) recursive
constraint satisfaction issue, which has since been fixed upstream in
CCCL [#8121](NVIDIA/cccl#8121).

Additional changes:
- `per_device_resource`: `static_cast<any_device_resource>(ref)`
replaced with `any_device_resource{ref}` (wrapper had `operator
any_resource`)
- Add missing `cuda_stream_view.hpp` include to three impl headers that
previously got it transitively through `cccl_adaptors.hpp`

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
bdice self-assigned this Apr 21, 2026
bdice moved this to In Progress in RMM Project Board Apr 21, 2026
bdice merged commit 386f76d into main Apr 21, 2026
85 checks passed
github-project-automation (bot) moved this from In Progress to Done in RMM Project Board Apr 21, 2026
rapids-bot Bot pushed a commit to rapidsai/ucxx that referenced this pull request Apr 21, 2026
Migrate to RMM's CCCL-based memory resources.

Part of rapidsai/rmm#2011.
Depends on rapidsai/rmm#2361.

## Notes

The final commit in this PR (`TEMP: Use CI artifacts from RMM PR #2361`) will be reverted before merging. It exists solely to pull CI artifacts from the RMM PR for testing.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #636
rapids-bot Bot pushed a commit to rapidsai/raft that referenced this pull request Apr 21, 2026
## Summary
- Remove `device_memory_resource` base class usage, de-template all resource and adaptor types, replace pointer-based per-device resource APIs with ref-based equivalents
- Part of rapidsai/rmm#2011. Migration guide: rapidsai/rmm#2344.
- Supersedes #2917 and #2920

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.

## Changes

### Core resource infrastructure
- **`device_memory_resource.hpp`**: Remove `any_resource_bridge` (which inherited from `rmm::mr::device_memory_resource`), remove all `shared_ptr<device_memory_resource>` constructor overloads, consolidate to `any_resource`-only path
- **`device_resources.hpp`**: Remove deprecated constructor taking `shared_ptr<device_memory_resource>`, update `get_workspace_resource()` return type (de-templated `limiting_resource_adaptor`)
- **`device_resources_snmg.hpp`**: Remove stale include, de-template `pool_memory_resource`
- **`handle.hpp`**: Remove deprecated constructors taking `shared_ptr<device_memory_resource>`
- **`device_resources_manager.hpp`**: Retype `workspace_mrs` vector from `shared_ptr<device_memory_resource>` to `raft::mr::device_resource`, update `set_workspace_memory_resource()` signature accordingly, de-template `pool_mr_` to `optional<pool_memory_resource>`, remove `dynamic_cast` for upstream type detection, replace `get/set_current_device_resource()` with `_ref` variants

### Memory tracking
- **`memory_tracking_resources.hpp`**: Remove `device_tracking_bridge` (inherited from `device_memory_resource`), use `set_current_device_resource_ref()` directly

### Call sites using `get_workspace_resource()` → `get_workspace_resource_ref()`
- `select_k-inl.cuh`, `select_radix.cuh`, `select_warpsort.cuh`, `sparse/select_k-inl.cuh`, `bitmap_to_csr.cuh`, `bitset_to_csr.cuh`

### Benchmarks
- **`benchmark.hpp`**: De-template `pool_memory_resource`, use `any_resource` for RAII restore
- **`gather.cu`**, **`subsample.cu`**: Same pattern

### Tests
- **`handle.cpp`**: Dereference `limiting_resource_adaptor*` for `device_buffer` constructor
- **`device_resources_manager.cpp`**: Remove workspace-related test code for removed APIs
- **`mdarray.cu`**: Remove `test_device_resource_bridge_unwrap` (bridge no longer exists)
- **`multi_variable_gaussian.cu`**: `get_current_device_resource()` → `get_current_device_resource_ref()`

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #2996
gforsyth added a commit to rapidsai/cudf that referenced this pull request Apr 21, 2026
## Summary

- Replace `device_memory_resource*` with `device_async_resource_ref`
across all C++ headers, sources, benchmarks, examples, and tests.
- In Cython `.pxd` declarations, change `device_memory_resource *mr`
parameters to `device_async_resource_ref mr` (value type). In `.pyx`
files, replace `mr.get_mr()` calls with `mr.c_ref.value()`.
- Remove `cudf::set_current_device_resource` (pointer-based) wrapper,
keeping only the ref-based `set_current_device_resource_ref`. Update
return types of `set/reset_current_device_resource_ref` to
`cuda::mr::any_resource<cuda::mr::device_accessible>`.
- In `host_memory.cpp`, remove `device_memory_resource` inheritance from
`pinned_pool_with_fallback_memory_resource`, remove the forward
declaration workaround for `rmm::mr::pool_memory_resource` (no longer
templated), and wrap non-copyable state in `shared_ptr` to satisfy the
`any_resource` copyability requirement.
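
The last bullet above, wrapping non-copyable state in `shared_ptr`, can be sketched as follows. This is an illustration of the general technique, not the cudf code: both class names are hypothetical, and a `std::mutex` is assumed as the representative non-copyable member.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <type_traits>

// Holding a std::mutex directly makes a class non-copyable, so it could not
// be stored in a container that requires copyable resources.
struct direct_state_resource {
  std::mutex mtx;
  std::size_t allocated{0};
};
static_assert(!std::is_copy_constructible<direct_state_resource>::value,
              "a mutex member deletes the implicit copy constructor");

// Moving the same state behind a shared_ptr restores copyability. Copies
// share one mutex and one counter.
class shared_state_resource {
 public:
  shared_state_resource() : state_{std::make_shared<state>()} {}

  void record(std::size_t bytes)
  {
    std::lock_guard<std::mutex> lock{state_->mtx};
    state_->allocated += bytes;
  }

  std::size_t allocated() const
  {
    std::lock_guard<std::mutex> lock{state_->mtx};
    return state_->allocated;
  }

  // Two handles are equal when they refer to the same shared state.
  friend bool operator==(shared_state_resource const& lhs, shared_state_resource const& rhs)
  {
    return lhs.state_ == rhs.state_;
  }

 private:
  struct state {
    std::mutex mtx;
    std::size_t allocated{0};
  };
  std::shared_ptr<state> state_;
};
static_assert(std::is_copy_constructible<shared_state_resource>::value,
              "shared state makes the handle copyable");

// Self-check: a write through one copy is visible through the other.
inline void check_copies_share_state()
{
  shared_state_resource first;
  shared_state_resource second = first;
  second.record(32);
  assert(first.allocated() == 32);
  assert(first == second);
}
```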

Part of rapidsai/rmm#2011.
Depends on rapidsai/rmm#2361.

---------

Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
rgsl888prabhu pushed a commit to NVIDIA/cuopt that referenced this pull request Apr 21, 2026
## Summary

- Remove dependency on `rmm::mr::device_memory_resource` base class;
resources now satisfy the `cuda::mr::resource` concept directly
- Replace `shared_ptr<device_memory_resource>` with value types and
`cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased
storage
- Replace `set_current_device_resource(ptr)` /
`set_per_device_resource(id, ptr)` with
`set_current_device_resource_ref` / `set_per_device_resource_ref`
- Remove `make_owning_wrapper` usage and `dynamic_cast` on memory
resources (no common base class)
- Add missing `thrust/iterator/transform_output_iterator.h` include (no
longer transitively included via CCCL)

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
gforsyth pushed a commit to rapidsai/cuvs that referenced this pull request Apr 21, 2026
## Summary
- Migrate all RMM usage to the new CCCL memory resource design
(de-templated resources, `device_async_resource_ref` instead of
`device_memory_resource*`, value semantics)
- Replace `get_workspace_resource()` / `get_large_workspace_resource()`
with `_ref()` variants across 65 call sites
- Rewrite `cuda_huge_page_resource` to satisfy CCCL `resource` concept
directly
- Remove `owning_wrapper` / `dynamic_cast` patterns in C API and
benchmarks

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.

## Changes
- **33 files changed** (~208 insertions, ~221 deletions)
- `device_memory_resource*` params → `device_async_resource_ref`
(ivf_common, ivf_pq, naive_knn)
- `get_current_device_resource()` → `get_current_device_resource_ref()`
- `set_current_device_resource()` → `set_current_device_resource_ref()`
- De-templated `pool_memory_resource`,
`failure_callback_resource_adaptor` in bench utils
- Removed `&resource` pointer patterns (resources are now copyable value
types)
- Removed spurious `mr` arg from `select_k` calls (previously compiled
due to implicit pointer→bool conversion)
- C API pool resource management rewritten without `owning_wrapper`
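
The "spurious `mr` arg" bullet above refers to a general C++ hazard that is easy to reproduce in isolation. The sketch below is hypothetical: `select_k_like` is not the real `select_k` signature (which is not shown here), and the two resource structs are placeholders. It shows why a stray pointer argument can bind silently to a `bool` parameter while a value-typed resource cannot.

```cpp
#include <cassert>
#include <type_traits>

struct legacy_resource {};  // placeholder for a resource passed by pointer
struct value_resource {};   // placeholder for a copyable value-typed resource

// Hypothetical function whose trailing parameter is a bool flag.
inline bool select_k_like(int k, bool select_min = true)
{
  (void)k;
  return select_min;
}

// Any object pointer converts implicitly to bool, so a pointer passed in the
// flag position compiles without a diagnostic.
static_assert(std::is_invocable<decltype(&select_k_like), int, legacy_resource*>::value,
              "pointer argument is accepted through pointer-to-bool conversion");

// A plain value type has no conversion to bool, so the same mistake is
// rejected at compile time.
static_assert(!std::is_invocable<decltype(&select_k_like), int, value_resource>::value,
              "value-typed argument is rejected");

// The stray pointer is non-null, so the flag silently becomes true.
inline bool call_with_spurious_pointer()
{
  legacy_resource resource;
  legacy_resource* mr = &resource;
  return select_k_like(16, mr);
}
```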

---------

Co-authored-by: gpuCI <38199262+GPUtester@users.noreply.github.com>
rapids-bot Bot pushed a commit to rapidsai/rapidsmpf that referenced this pull request Apr 21, 2026
## Summary

- Rewrite `RmmResourceAdaptor` as a thin shell inheriting `cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>`, with all mutable state in the impl class for copyable shared ownership.
- Replace `device_memory_resource*` with `rmm::device_async_resource_ref` for non-owning references and `cuda::mr::any_resource` for owning storage.
- Remove `rmm::mr::owning_wrapper` usage (removed in RMM 26.06).
- Update `pool_memory_resource` usage (no longer a template; upstream passed as `device_async_resource_ref`).
- Replace `set_current_device_resource(ptr)` with `set_current_device_resource_ref(ref)`.
- Update test resources to satisfy CCCL resource concept (`allocate_sync`, `deallocate_sync`, `operator==`, `get_property`).
- Update Cython bindings (`.pxd`/`.pyx`) to use `device_async_resource_ref` instead of `device_memory_resource`.
- Point `get_cucascade.cmake` to local cuCascade source directory (companion cuCascade PR: NVIDIA/cuCascade#98).

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/cudf#22008.

## Notes

- Depends on cuCascade migration: NVIDIA/cuCascade#98
- The `get_cucascade.cmake` change to use `SOURCE_DIR` is a development convenience and should be updated to point to the merged cuCascade commit before this PR is finalized.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Niranda Perera (https://github.com/nirandaperera)

Approvers:
  - Niranda Perera (https://github.com/nirandaperera)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #940
rapids-bot Bot pushed a commit to rapidsai/cuml that referenced this pull request Apr 21, 2026
## Summary
- Migrate all raw RMM `allocate`/`deallocate` calls to the new CCCL 3-argument API that requires explicit alignment
- Replace removed `rmm.librmm.per_device_resource` Cython import with `rmm.pylibrmm.memory_resource` and use `make_any_device_resource` to obtain the resource for `device_buffer` construction

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.

## Changes
- **`cpp/src/genetic/genetic.cu`**: Add explicit `alignof(node)` / `alignof(program)` to all `allocate` and `deallocate` calls in `parallel_evolve` and `symFit`; fix deallocation bug in `parallel_evolve` where `h_nextprogs[i].len` was incorrectly used instead of `tmp.len` to compute the buffer size being freed
- **`cpp/examples/symreg/symreg_example.cpp`**: Use `params.population_size * sizeof(cg::program)` and `alignof(cg::program)` for `allocate`/`deallocate` calls, fixing incorrect byte-size computation; remove unused `<rmm/aligned.hpp>` include
- **`cpp/tests/sg/genetic/evolution_test.cu`**: Add alignment arguments to allocate/deallocate in `SymReg` test
- **`cpp/tests/sg/genetic/program_test.cu`**: Add alignment arguments to `SetUp`/`TearDown` allocate/deallocate calls
- **`python/cuml/cuml/manifold/umap/umap.pyx`**: Replace `get_current_device_resource()` with `make_any_device_resource(get_current_device_resource().get_mr())` for `device_buffer` construction
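
The size-and-alignment pairing described above can be illustrated with the C++17 standard library, whose `std::pmr::memory_resource` also takes a byte count and an alignment on both `allocate` and `deallocate`. This is an analogy only, not the RMM or CCCL API: `program`, `checking_resource`, `allocate_programs`, and `deallocate_programs` are hypothetical names. The sketch counts deallocations whose size or alignment differs from the matching allocation, which is the class of bug the `parallel_evolve` fix addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the element type being allocated.
struct program {
  int len;
  float fitness;
};

// Records (bytes, alignment) per allocation and counts mismatched deallocations.
class checking_resource : public std::pmr::memory_resource {
 public:
  int mismatches() const noexcept { return mismatches_; }

 private:
  void* do_allocate(std::size_t bytes, std::size_t alignment) override
  {
    void* ptr = std::pmr::new_delete_resource()->allocate(bytes, alignment);
    live_[ptr] = {bytes, alignment};
    return ptr;
  }

  void do_deallocate(void* ptr, std::size_t bytes, std::size_t alignment) override
  {
    auto it = live_.find(ptr);
    if (it == live_.end()) {
      ++mismatches_;  // unknown pointer: nothing safe to free
      return;
    }
    if (it->second.first != bytes || it->second.second != alignment) { ++mismatches_; }
    // Free with the recorded values so the sketch itself stays well-defined.
    std::pmr::new_delete_resource()->deallocate(ptr, it->second.first, it->second.second);
    live_.erase(it);
  }

  bool do_is_equal(std::pmr::memory_resource const& other) const noexcept override
  {
    return this == &other;
  }

  std::unordered_map<void*, std::pair<std::size_t, std::size_t>> live_;
  int mismatches_{0};
};

// Byte size is count * sizeof(T) and alignment is alignof(T), on both sides.
inline program* allocate_programs(std::pmr::memory_resource& mr, std::size_t count)
{
  return static_cast<program*>(mr.allocate(count * sizeof(program), alignof(program)));
}

inline void deallocate_programs(std::pmr::memory_resource& mr, program* ptr, std::size_t count)
{
  mr.deallocate(ptr, count * sizeof(program), alignof(program));
}
```

Computing the size from the same count on both sides keeps the pair consistent; freeing with a different count is reported as a mismatch.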

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Divye Gala (https://github.com/divyegala)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #7951
rapids-bot Bot pushed a commit to rapidsai/cugraph that referenced this pull request Apr 23, 2026
## Summary

- Replace removed `rmm::mr::device_memory_resource` base class, `owning_wrapper`, `shared_ptr`-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types
- Use `cuda::mr::any_resource<cuda::mr::device_accessible>` for owning type-erased storage, `rmm::device_async_resource_ref` for non-owning references, and value-typed resources (`cuda_memory_resource`, `pinned_host_memory_resource`)
- Pass the memory resource to `raft::handle_t` as the `workspace_resource` (3rd) constructor argument, matching the new raft API (`stream_view`, `stream_pool`, `std::optional<raft::mr::device_resource>`)

Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.

## Files changed

**Headers:**
- `algorithms.hpp`, `dendrogram.hpp`, `legacy/graph.hpp`, `legacy/functions.hpp`: `get_current_device_resource()` → `get_current_device_resource_ref()` in default argument expressions
- `host_staging_buffer_manager.hpp`: Remove `owning_wrapper`, store `pool_memory_resource` by value in a `std::optional`, accept `pinned_host_memory_resource` by value in `init()`
- `large_buffer_manager.hpp`: Store `pinned_host_memory_resource` by value (not `shared_ptr`), return `device_async_resource_ref` from `get()`, `std::move` the resource into storage
- `mtmg/resource_manager.hpp`: Use `cuda::mr::any_resource<device_accessible>` instead of `shared_ptr<device_memory_resource>` for `per_device_rmm_resources_`, use non-deprecated `set_per_device_resource`, pass resource as `workspace_resource` to `raft::handle_t`

**Tests:**
- `base_fixture.hpp`: Return `any_resource<device_accessible>` from `create_memory_resource()`, use value-typed MR factory helpers (`make_cuda`, `make_managed`, `make_pool`, `make_binning`), switch to non-deprecated `set_current_device_resource` / `get_current_device_resource_ref`
- `multi_node_threaded_test.cpp`: Switch to non-deprecated `set_current_device_resource(resource)`
- `mg_graph500_bfs_test.cu`, `mg_graph500_sssp_test.cu`: Store `pinned_mr_` as `optional<pinned_host_memory_resource>` by value, prefer `.value()` over `operator*` for optional access

**Examples:**
- All 4 example files (`sg_graph_algorithms.cpp`, `mg_graph_algorithms.cpp`, `vertex_and_edge_partition.cu`, `graph_operations.cu`): Use value-typed `cuda_memory_resource`, non-deprecated `set_current_device_resource`, pass the resource to `raft::handle_t` as the `workspace_resource` (3rd positional arg, with `nullptr` for the unused `stream_pool`)
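
The ownership pattern described for the buffer managers and test fixtures above (store the resource by value in a `std::optional`, move it in on `init()`, and read it with `.value()`) can be sketched with standard C++ alone. The names `pinned_resource_stub` and `buffer_manager_sketch` are hypothetical, and the sketch returns a plain reference where the real code returns a `device_async_resource_ref`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical stand-in for a value-typed resource such as a pinned host resource.
struct pinned_resource_stub {
  std::size_t pool_bytes;
};

// Hypothetical manager that owns its resource by value instead of via shared_ptr.
class buffer_manager_sketch {
 public:
  // Take the resource by value and move it into storage.
  void init(pinned_resource_stub mr) { mr_.emplace(std::move(mr)); }

  bool is_initialized() const noexcept { return mr_.has_value(); }

  // .value() throws std::bad_optional_access when init() was never called,
  // whereas operator* on an empty optional is undefined behavior.
  pinned_resource_stub& get() { return mr_.value(); }

 private:
  std::optional<pinned_resource_stub> mr_;
};
```

Preferring `.value()` turns a missing `init()` call into an exception rather than silent undefined behavior, at the cost of one branch per access.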

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #5483
shrshi pushed a commit to shrshi/cudf that referenced this pull request May 12, 2026
## Summary

- Replace `device_memory_resource*` with `device_async_resource_ref`
across all C++ headers, sources, benchmarks, examples, and tests.
- In Cython `.pxd` declarations, change `device_memory_resource *mr`
parameters to `device_async_resource_ref mr` (value type). In `.pyx`
files, replace `mr.get_mr()` calls with `mr.c_ref.value()`.
- Remove `cudf::set_current_device_resource` (pointer-based) wrapper,
keeping only the ref-based `set_current_device_resource_ref`. Update
return types of `set/reset_current_device_resource_ref` to
`cuda::mr::any_resource<cuda::mr::device_accessible>`.
- In `host_memory.cpp`, remove `device_memory_resource` inheritance from
`pinned_pool_with_fallback_memory_resource`, remove the forward
declaration workaround for `rmm::mr::pool_memory_resource` (no longer
templated), and wrap non-copyable state in `shared_ptr` to satisfy the
`any_resource` copyability requirement.
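
As a rough illustration of the last bullet, a type holding a `std::mutex` is not copyable, but a handle that keeps that state behind a `std::shared_ptr` is. The sketch below uses only the standard library; `pool_state` and `shared_state_resource` are hypothetical names and do not reflect the actual layout of `pinned_pool_with_fallback_memory_resource`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <type_traits>

// Non-copyable state: std::mutex deletes its copy constructor.
struct pool_state {
  std::mutex mtx;
  std::size_t bytes_in_use{0};
};
static_assert(!std::is_copy_constructible_v<pool_state>, "state alone cannot be copied");

// Hypothetical resource handle: copies share one pool_state through shared_ptr.
class shared_state_resource {
 public:
  shared_state_resource() : state_{std::make_shared<pool_state>()} {}

  void record_allocation(std::size_t bytes)
  {
    std::lock_guard<std::mutex> lock{state_->mtx};
    state_->bytes_in_use += bytes;
  }

  std::size_t bytes_in_use() const
  {
    std::lock_guard<std::mutex> lock{state_->mtx};
    return state_->bytes_in_use;
  }

  // Two handles compare equal when they refer to the same underlying state.
  bool operator==(shared_state_resource const& other) const noexcept
  {
    return state_ == other.state_;
  }

 private:
  std::shared_ptr<pool_state> state_;
};
static_assert(std::is_copy_constructible_v<shared_state_resource>, "handle is copyable");
```

Copies of the handle observe the same counters, so a type-erased owner can copy the resource without duplicating or splitting its state.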

Part of rapidsai/rmm#2011.
Depends on rapidsai/rmm#2361.

---------

Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>

Labels

- breaking: Breaking change
- improvement: Improvement / enhancement to an existing function

Projects

Status: Done
