Migrate RMM usage to CCCL MR design#5483
Conversation
Replace removed rmm::mr::device_memory_resource base class, owning_wrapper, shared_ptr-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource types: value-typed resources, cuda::mr::any_resource for owning type-erased storage, and rmm::device_async_resource_ref for non-owning references.
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
70478b1 to
bfe9d12
Compare
bfe9d12 to
45d182c
Compare
e42e090 to
09ab98b
Compare
09ab98b to
dd809dd
Compare
bdice
left a comment
There was a problem hiding this comment.
Self-review, leaving comments for an agent to address.
|
|
||
| std::unique_ptr<raft::handle_t> handle = | ||
| std::make_unique<raft::handle_t>(rmm::cuda_stream_per_thread, resource); | ||
| std::make_unique<raft::handle_t>(rmm::cuda_stream_per_thread); |
There was a problem hiding this comment.
TODO: I'm not sure if removing the resource argument here is right or wrong. Same applies in the other examples.
| rmm::align_down(std::min(free, total / 6), rmm::CUDA_ALLOCATION_ALIGNMENT); | ||
|
|
||
| auto per_device_it = per_device_rmm_resources_.insert( | ||
| auto upstream = rmm::mr::cuda_memory_resource(); |
There was a problem hiding this comment.
Inline this in the pool_memory_resource constructor, we don't need the upstream variable.
| std::pair{global_rank, | ||
| rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>( | ||
| std::make_shared<rmm::mr::cuda_memory_resource>(), min_alloc)}); | ||
| cuda::mr::any_resource<cuda::mr::device_accessible>( |
There was a problem hiding this comment.
Do we really need this explicit cast to any_resource? Remove if possible.
| #endif | ||
|
|
||
| rmm::mr::set_per_device_resource_ref(local_device_id, per_device_it.first->second.get()); | ||
| rmm::mr::set_per_device_resource_ref(local_device_id, |
There was a problem hiding this comment.
Deprecated. Use rmm::mr::set_per_device_resource instead.
| handles.push_back(std::make_unique<raft::handle_t>( | ||
| rmm::cuda_stream_per_thread, std::make_shared<rmm::cuda_stream_pool>(n_streams))); |
There was a problem hiding this comment.
Again, why aren't we passing the resource anymore?
| large_memory_buffer_resource_t(std::shared_ptr<rmm::mr::pinned_host_memory_resource> mr) : mr_(mr) | ||
| { | ||
| } | ||
| large_memory_buffer_resource_t(rmm::mr::pinned_host_memory_resource mr) : mr_(mr) {} |
There was a problem hiding this comment.
We should be able to std::move mr into mr_.
| rmm::mr::pinned_host_memory_resource mr) | ||
| { | ||
| return detail::large_memory_buffer_resource_t(std::move(mr)); | ||
| return detail::large_memory_buffer_resource_t(mr); |
There was a problem hiding this comment.
Similar here, we should be able to std::move.
| cugraph::large_buffer_manager::init( | ||
| *handle_, | ||
| cugraph::large_buffer_manager::create_memory_buffer_resource(pinned_mr_), | ||
| cugraph::large_buffer_manager::create_memory_buffer_resource(*pinned_mr_), |
There was a problem hiding this comment.
I prefer to call .value() instead of dereferencing optionals. I think that should be fixed everywhere in the diff of this PR.
- resource_manager.hpp: inline pool upstream, drop explicit any_resource
cast, use non-deprecated set_per_device_resource, restore per_device_it
from insert, and pass workspace_resource to raft::handle_t.
- large_buffer_manager.hpp: std::move pinned_host_memory_resource in
constructor and create_memory_buffer_resource.
- mg_graph500_{bfs,sssp}_test.cu: prefer std::optional::value() over
operator*.
- examples: pass workspace_resource to raft::handle_t as the third
positional argument.
…into rmm-cccl-migration
There was a problem hiding this comment.
Self-review: I'm now happy with these changes. Thanks @ChuckHastings @KyleFromNVIDIA @vyasr and others for all the support with build issues.
|
/merge |
|
CUDA 12.2 tests are failing like this: This is happening because we reverted the changes that statically link libcudart. The |
libcugraph now requires a CUDA >= 12.8 cudart at runtime (via cuVS's use of cudaLibraryGetKernel), so the 12.2.2 conda/wheel test jobs fail with: undefined symbol: cudaLibraryGetKernel, version libcudart.so.12 Filter the test matrices in pr.yaml and test.yaml to CUDA >= 12.9 until a solution is implemented. Tracked in rapidsai#5498.
|
Thanks all! 🙏 |
This property, add to rmm in rapidsai/rmm#2317 and used in cugraph in #5483, is a holdover from when rmm enforced linking of dynamic cudart. rapidsai/rmm#2375 switched to static cudart and stopped querying the property, so remove it. Issue: rapidsai/build-planning#235 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) URL: #5508
Summary
rmm::mr::device_memory_resourcebase class,owning_wrapper,shared_ptr-based resource management, and deprecated per-device resource APIs with CCCL-native memory resource typescuda::mr::any_resource<cuda::mr::device_accessible>for owning type-erased storage,rmm::device_async_resource_reffor non-owning references, and value-typed resources (cuda_memory_resource,pinned_host_memory_resource)raft::handle_tas theworkspace_resource(3rd) constructor argument, matching the new raft API (stream_view,stream_pool,std::optional<raft::mr::device_resource>)Depends on rapidsai/rmm#2361.
Depends on rapidsai/ucxx#636.
Depends on rapidsai/raft#2996.
Depends on rapidsai/cuvs#1990.
Files changed
Headers:
algorithms.hpp,dendrogram.hpp,legacy/graph.hpp,legacy/functions.hpp:get_current_device_resource()→get_current_device_resource_ref()in default argument expressionshost_staging_buffer_manager.hpp: Removeowning_wrapper, storepool_memory_resourceby value in astd::optional, acceptpinned_host_memory_resourceby value ininit()large_buffer_manager.hpp: Storepinned_host_memory_resourceby value (notshared_ptr), returndevice_async_resource_reffromget(),std::movethe resource into storagemtmg/resource_manager.hpp: Usecuda::mr::any_resource<device_accessible>instead ofshared_ptr<device_memory_resource>forper_device_rmm_resources_, use non-deprecatedset_per_device_resource, pass resource asworkspace_resourcetoraft::handle_tTests:
base_fixture.hpp: Returnany_resource<device_accessible>fromcreate_memory_resource(), use value-typed MR factory helpers (make_cuda,make_managed,make_pool,make_binning), switch to non-deprecatedset_current_device_resource/get_current_device_resource_refmulti_node_threaded_test.cpp: Switch to non-deprecatedset_current_device_resource(resource)mg_graph500_bfs_test.cu,mg_graph500_sssp_test.cu: Storepinned_mr_asoptional<pinned_host_memory_resource>by value, prefer.value()overoperator*for optional accessExamples:
sg_graph_algorithms.cpp,mg_graph_algorithms.cpp,vertex_and_edge_partition.cu,graph_operations.cu): Use value-typedcuda_memory_resource, non-deprecatedset_current_device_resource, pass the resource toraft::handle_tas theworkspace_resource(3rd positional arg, withnullptrfor the unusedstream_pool)