Honor requested aligned adaptor alignment#2396
Conversation
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughThe ChangesCaller-requested alignment support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp (1)
92-105:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftTrack upstream allocation metadata instead of recomputing it in
deallocate().
deallocate()now trusts the caller-providedalignmentto decide whether the returned pointer was remapped. For example,allocate(..., 1024, 4096)records an adjusted pointer inpointers_, butdeallocate(..., adjusted_ptr, 1024, rmm::CUDA_ALLOCATION_ALIGNMENT)will hit the Line 94 fast path and forward the adjusted pointer and size1024upstream without consulting that map. Before this PR the free-side alignment was ignored, so this change turns that mismatch into a bad upstream free. Please recover the original pointer and upstream size from tracked allocation metadata rather than recomputing both from the deallocation arguments, and add a regression test for that case.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp` around lines 92 - 105, The deallocate path currently recomputes upstream pointer/size from the caller-provided alignment and can forward wrong values when the caller passes a different alignment than used at allocation; change deallocate in aligned_resource_adaptor_impl.cpp to lookup tracked allocation metadata instead of recomputing: when allocate stores remapped pointers_ (and the upstream_allocation_size) store a small struct (orig_upstream_ptr and upstream_size) keyed by the adjusted ptr, and in deallocate (function deallocate) first check pointers_/metadata map to retrieve the original upstream pointer and exact upstream_size to pass to upstream_mr_.deallocate rather than calling upstream_allocation_size or relying on effective_alignment; ensure you erase the metadata entry under mtx_ and add a regression test that calls allocate(..., alignment=X, requested_alignment=Y) then deallocate with the opposite alignment to exercise the lookup path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp`:
- Around line 92-105: The deallocate path currently recomputes upstream
pointer/size from the caller-provided alignment and can forward wrong values
when the caller passes a different alignment than used at allocation; change
deallocate in aligned_resource_adaptor_impl.cpp to lookup tracked allocation
metadata instead of recomputing: when allocate stores remapped pointers_ (and
the upstream_allocation_size) store a small struct (orig_upstream_ptr and
upstream_size) keyed by the adjusted ptr, and in deallocate (function
deallocate) first check pointers_/metadata map to retrieve the original upstream
pointer and exact upstream_size to pass to upstream_mr_.deallocate rather than
calling upstream_allocation_size or relying on effective_alignment; ensure you
erase the metadata entry under mtx_ and add a regression test that calls
allocate(..., alignment=X, requested_alignment=Y) then deallocate with the
opposite alignment to exercise the lookup path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 81037943-c66b-4f44-847b-6a67a5ed8a66
📒 Files selected for processing (3)
cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hppcpp/src/mr/detail/aligned_resource_adaptor_impl.cppcpp/tests/mr/aligned_mr_tests.cpp
💤 Files with no reviewable changes (1)
- cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp
| if (bytes == 0 || effective_align == rmm::CUDA_ALLOCATION_ALIGNMENT) { | ||
| return upstream_mr_.allocate(stream, bytes, 1); |
There was a problem hiding this comment.
note: this implicitly encodes that every RMM memory resource must provide allocations that are CUDA_ALLOCATION_ALIGNMENT aligned. Should we (at least in debug mode) assert that?
Imagine I am a writing my own resource, and I forget about this restriction.
I think we should also push (again) for the "related" request in this cccl issue NVIDIA/cccl#8157
We can at least implement said properties on all RMM resources today, I think.
| return aligned_size + (alignment - rmm::CUDA_ALLOCATION_ALIGNMENT); | ||
| } | ||
|
|
||
| [[nodiscard]] std::size_t effective_alignment(std::size_t bytes, |
There was a problem hiding this comment.
| [[nodiscard]] std::size_t effective_alignment(std::size_t bytes, | |
| [[nodiscard]] constexpr std::size_t effective_alignment(std::size_t bytes, |
| namespace detail { | ||
| namespace { | ||
|
|
||
| [[nodiscard]] std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment) |
There was a problem hiding this comment.
| [[nodiscard]] std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment) | |
| [[nodiscard]] constexpr std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment) noexcept |
(Although align_up is not constexpr because in NDEBUG mode it asserts)
| void* const expected_pointer = int_to_address(4096); | ||
| auto const size{1024}; | ||
| EXPECT_EQ(mr.allocate(stream, size, alignment), expected_pointer); | ||
| mr.deallocate(stream, expected_pointer, size, alignment); |
There was a problem hiding this comment.
I don't understand how gtest's mock methods work. But I would have thought that the thing to test here is:
void *ptr = mr.allocate(...);
EXPECT_EQ(static_cast<std::uintptr_t>(ptr) % alignment, 0);
...
| void* const expected_pointer = int_to_address(8192); | ||
| auto const size{1024}; | ||
| EXPECT_EQ(mr.allocate(stream, size, requested_alignment), expected_pointer); | ||
| mr.deallocate(stream, expected_pointer, size, requested_alignment); |
Description
aligned_resource_adaptorpreviously ignored the per-call alignment passed toallocate()anddeallocate(), using only the adaptor's configured alignment. This meant an adaptor configured with the default alignment could fail to satisfy an over-aligned request from a caller.This PR computes an effective alignment from the caller request, the configured adaptor alignment, and
CUDA_ALLOCATION_ALIGNMENT, then uses it consistently for upstream allocation sizing, returned pointer alignment, and deallocation sizing. The allocation path also validates that the caller-requested alignment is supported before using it in alignment math.Checklist