
Honor requested aligned adaptor alignment #2396

Open
bdice wants to merge 2 commits into rapidsai:main from bdice:fix-aligned-adaptor-request-alignment

Conversation

Collaborator

@bdice bdice commented May 17, 2026

Description

aligned_resource_adaptor previously ignored the per-call alignment passed to allocate() and deallocate(), using only the adaptor's configured alignment. This meant an adaptor configured with the default alignment could fail to satisfy an over-aligned request from a caller.

This PR computes an effective alignment from the caller request, the configured adaptor alignment, and CUDA_ALLOCATION_ALIGNMENT, then uses it consistently for upstream allocation sizing, returned pointer alignment, and deallocation sizing. The allocation path also validates that the caller-requested alignment is supported before using it in alignment math.
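
A minimal sketch of how such an effective alignment might be computed. It assumes the three inputs are combined by taking the maximum, and that the configured alignment only applies at or above the adaptor's size threshold. The names `effective_alignment_sketch` and `kCudaAllocationAlignment` are illustrative stand-ins rather than the PR's code, 256 stands in for `rmm::CUDA_ALLOCATION_ALIGNMENT`, and `std::invalid_argument` stands in for whatever exception the adaptor actually throws.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Stand-in for rmm::CUDA_ALLOCATION_ALIGNMENT (256 bytes).
constexpr std::size_t kCudaAllocationAlignment = 256;

// True when `value` is a nonzero power of two.
constexpr bool is_pow2(std::size_t value) noexcept
{
  return value != 0 && (value & (value - 1)) == 0;
}

// Hypothetical helper: fold the caller's request, the adaptor's configured
// alignment, and the CUDA baseline into one alignment used for the whole call.
inline std::size_t effective_alignment_sketch(std::size_t bytes,
                                              std::size_t requested,
                                              std::size_t configured,
                                              std::size_t threshold)
{
  // Validate the caller's request before using it in any alignment math.
  if (!is_pow2(requested)) { throw std::invalid_argument("alignment must be a power of two"); }

  std::size_t result = std::max(requested, kCudaAllocationAlignment);
  // Assumption: the configured alignment applies only to allocations whose
  // size is at least the adaptor's threshold.
  if (bytes >= threshold) { result = std::max(result, configured); }
  return result;
}
```

Under these assumptions, a 1024-byte request with alignment 8192 against an adaptor configured for 4096 yields 8192, so the caller's stricter request wins.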

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice requested a review from a team as a code owner May 17, 2026 17:55
@bdice bdice requested review from harrism and shrshi May 17, 2026 17:55
@bdice bdice added the bug (Something isn't working) and non-breaking (Non-breaking change) labels May 17, 2026
@bdice bdice self-assigned this May 17, 2026
@bdice bdice moved this to In Progress in RMM Project Board May 17, 2026

coderabbitai Bot commented May 17, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Memory allocation and deallocation now properly respect caller-requested alignment parameters, improving pointer alignment correctness.
  • Tests

    • Added validation tests for invalid alignment requests.
    • Added tests verifying proper alignment of returned pointers when custom alignment is requested.

Walkthrough

The aligned_resource_adaptor now respects caller-provided alignment parameters in both allocation and deallocation. Implementation helpers compute effective alignment based on requested alignment, configured alignment, and byte thresholds. Tests validate invalid alignment rejection and confirm caller-requested alignment takes precedence.

Changes

Caller-requested alignment support

| Layer / File(s) | Summary |
|---|---|
| Alignment computation helpers and header cleanup: cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp, cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp (lines 17–36) | Removed the per-instance upstream_allocation_size method declaration from the header. Added two namespace-scoped helper functions: compute_effective_alignment() combines requested alignment, configured alignment, CUDA alignment, and threshold condition; compute_upstream_allocation_size() derives the allocation size for a target alignment. |
| Allocate with caller-requested alignment: cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp (lines 62–76) | Updated allocate() to receive and use the alignment parameter. Computes effective alignment via helper, then branches: when adjustment needed, over-allocates at computed upstream size and returns a pointer aligned to effective alignment while tracking mappings. |
| Deallocate with caller-requested alignment: cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp (lines 90–94, 105) | Updated deallocate() to receive and use the alignment parameter. Computes effective alignment in both fast and slow paths to determine correct upstream deallocation size, replacing previous per-instance alignment logic. |
| Alignment validation and behavior tests: cpp/tests/mr/aligned_mr_tests.cpp (lines 91–99, 209–256) | Added three test cases: validation that invalid requested alignment throws rmm::logic_error; verification that allocate returns caller-aligned addresses; confirmation that explicit caller alignment overrides configured threshold alignment. |
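
The over-allocate-then-align step summarized above can be sketched as follows. The size formula mirrors the `aligned_size + (alignment - rmm::CUDA_ALLOCATION_ALIGNMENT)` expression quoted later in this review, and the sketch assumes every upstream pointer is already 256-byte aligned. The function names and the free-standing form are hypothetical; the real adaptor operates on `void*` and device streams.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for rmm::CUDA_ALLOCATION_ALIGNMENT (256 bytes).
constexpr std::size_t kCudaAllocationAlignment = 256;

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) noexcept
{
  return (value + (alignment - 1)) & ~(alignment - 1);
}

// Bytes to request upstream so that an `alignment`-aligned block of `bytes`
// always fits. Assumes alignment >= kCudaAllocationAlignment and that the
// upstream pointer is a multiple of kCudaAllocationAlignment, so at most
// (alignment - kCudaAllocationAlignment) bytes are skipped at the front.
constexpr std::size_t upstream_size_sketch(std::size_t bytes, std::size_t alignment) noexcept
{
  return align_up(bytes, alignment) + (alignment - kCudaAllocationAlignment);
}

// First address at or after `upstream` that is a multiple of `alignment`.
constexpr std::uintptr_t aligned_address_sketch(std::uintptr_t upstream,
                                                std::size_t alignment) noexcept
{
  auto const mask = static_cast<std::uintptr_t>(alignment) - 1;
  return (upstream + mask) & ~mask;
}
```

For example, with 1024 bytes at alignment 4096 the sketch requests 7936 bytes upstream; if the upstream block starts at address 256, the aligned address is 4096 and the 1024-byte block ends at 5120, inside the upstream range that ends at 8192.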

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rapidsai/rmm#2343: Updates CCCL adaptor and test call sites to pass explicit rmm::CUDA_ALLOCATION_ALIGNMENT into allocate*/deallocate*, which complements this PR's changes to actually honor the alignment parameter.
  • rapidsai/rmm#2330: Adds device_buffer(..., alignment, ...) constructors that pass alignment through to _mr.allocate(..., alignment), which directly depends on this PR's alignment parameter support.

Suggested labels

bug, non-breaking

Suggested reviewers

  • harrism
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Honor requested aligned adaptor alignment' directly describes the main change: making the aligned_resource_adaptor respect per-call alignment parameters instead of ignoring them. |
| Description check | ✅ Passed | The description is well-related to the changeset, explaining the bug fix, the solution approach, and confirming tests and documentation are updated. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp (1)

92-105: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Track upstream allocation metadata instead of recomputing it in deallocate().

deallocate() now trusts the caller-provided alignment to decide whether the returned pointer was remapped. For example, allocate(..., 1024, 4096) records an adjusted pointer in pointers_, but deallocate(..., adjusted_ptr, 1024, rmm::CUDA_ALLOCATION_ALIGNMENT) will hit the Line 94 fast path and forward the adjusted pointer and size 1024 upstream without consulting that map. Before this PR the free-side alignment was ignored, so this change turns that mismatch into a bad upstream free. Please recover the original pointer and upstream size from tracked allocation metadata rather than recomputing both from the deallocation arguments, and add a regression test for that case.
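
One way to read this suggestion, as a sketch only: keep the upstream pointer and upstream size in a map keyed by the adjusted pointer, and have deallocate consult that map before trusting its arguments. The `upstream_record` struct and `allocation_tracker` class are hypothetical names; only `pointers_` and `mtx_` echo identifiers mentioned in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <unordered_map>

// What the adaptor would need in order to free a remapped allocation.
struct upstream_record {
  void* upstream_ptr;          // pointer originally returned by the upstream resource
  std::size_t upstream_bytes;  // size originally requested from the upstream resource
};

// Hypothetical bookkeeping for over-allocated blocks, keyed by the pointer
// that was handed back to the caller.
class allocation_tracker {
 public:
  void record(void* adjusted_ptr, void* upstream_ptr, std::size_t upstream_bytes)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    pointers_[adjusted_ptr] = upstream_record{upstream_ptr, upstream_bytes};
  }

  // Returns and erases the record for `adjusted_ptr`. An empty optional means
  // the pointer was never over-allocated and can be forwarded upstream as-is.
  std::optional<upstream_record> take(void* adjusted_ptr)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    auto const it = pointers_.find(adjusted_ptr);
    if (it == pointers_.end()) { return std::nullopt; }
    upstream_record const rec = it->second;
    pointers_.erase(it);
    return rec;
  }

 private:
  std::mutex mtx_;
  std::unordered_map<void*, upstream_record> pointers_;
};
```

With this shape, deallocate would call `take(ptr)` first and free `upstream_ptr` with `upstream_bytes` when a record exists, regardless of the alignment passed at free time. A record would be needed for every over-allocated block, including those whose upstream pointer happened to be aligned already, because the upstream size still differs from the caller's `bytes`.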

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp` around lines 92 - 105,
The deallocate path currently recomputes upstream pointer/size from the
caller-provided alignment and can forward wrong values when the caller passes a
different alignment than used at allocation; change deallocate in
aligned_resource_adaptor_impl.cpp to lookup tracked allocation metadata instead
of recomputing: when allocate stores remapped pointers_ (and the
upstream_allocation_size) store a small struct (orig_upstream_ptr and
upstream_size) keyed by the adjusted ptr, and in deallocate (function
deallocate) first check pointers_/metadata map to retrieve the original upstream
pointer and exact upstream_size to pass to upstream_mr_.deallocate rather than
calling upstream_allocation_size or relying on effective_alignment; ensure you
erase the metadata entry under mtx_ and add a regression test that calls
allocate(..., alignment=X, requested_alignment=Y) then deallocate with the
opposite alignment to exercise the lookup path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 81037943-c66b-4f44-847b-6a67a5ed8a66

📥 Commits

Reviewing files that changed from the base of the PR and between b187020 and f996e36.

📒 Files selected for processing (3)
  • cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp
  • cpp/src/mr/detail/aligned_resource_adaptor_impl.cpp
  • cpp/tests/mr/aligned_mr_tests.cpp
💤 Files with no reviewable changes (1)
  • cpp/include/rmm/mr/detail/aligned_resource_adaptor_impl.hpp

Comment on lines +69 to 70
if (bytes == 0 || effective_align == rmm::CUDA_ALLOCATION_ALIGNMENT) {
return upstream_mr_.allocate(stream, bytes, 1);
Contributor

note: this implicitly encodes that every RMM memory resource must provide allocations that are CUDA_ALLOCATION_ALIGNMENT aligned. Should we (at least in debug mode) assert that?

Imagine I am writing my own resource, and I forget about this restriction.

I think we should also push (again) for the "related" request in this cccl issue NVIDIA/cccl#8157

We can at least implement said properties on all RMM resources today, I think.
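
A sketch of the debug-mode check floated in this comment, assuming a plain `assert` is acceptable and using 256 for `rmm::CUDA_ALLOCATION_ALIGNMENT`. `is_cuda_aligned` and `checked_upstream_pointer` are hypothetical helpers, not RMM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for rmm::CUDA_ALLOCATION_ALIGNMENT (256 bytes).
constexpr std::size_t kCudaAllocationAlignment = 256;

// True when `ptr` is a multiple of the CUDA baseline alignment.
inline bool is_cuda_aligned(void const* ptr) noexcept
{
  return reinterpret_cast<std::uintptr_t>(ptr) % kCudaAllocationAlignment == 0;
}

// Wraps a pointer returned by an upstream resource. In debug builds (NDEBUG
// not defined) an under-aligned pointer aborts; in release builds the check
// compiles away and the pointer is passed through unchanged.
inline void* checked_upstream_pointer(void* ptr) noexcept
{
  assert(is_cuda_aligned(ptr) && "upstream resource returned an under-aligned pointer");
  return ptr;
}
```

The fast path could then wrap its upstream call in `checked_upstream_pointer(...)`, so a custom resource that forgets the restriction fails loudly in debug builds.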

return aligned_size + (alignment - rmm::CUDA_ALLOCATION_ALIGNMENT);
}

[[nodiscard]] std::size_t effective_alignment(std::size_t bytes,
Contributor

Suggested change
- [[nodiscard]] std::size_t effective_alignment(std::size_t bytes,
+ [[nodiscard]] constexpr std::size_t effective_alignment(std::size_t bytes,

namespace detail {
namespace {

[[nodiscard]] std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment)
Contributor

Suggested change
- [[nodiscard]] std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment)
+ [[nodiscard]] constexpr std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment) noexcept

(Although align_up is not constexpr because it asserts when NDEBUG is not defined)

Comment on lines +225 to +228
void* const expected_pointer = int_to_address(4096);
auto const size{1024};
EXPECT_EQ(mr.allocate(stream, size, alignment), expected_pointer);
mr.deallocate(stream, expected_pointer, size, alignment);
Contributor

I don't understand how gtest's mock methods work. But I would have thought that the thing to test here is:

void* ptr = mr.allocate(...);
EXPECT_EQ(reinterpret_cast<std::uintptr_t>(ptr) % alignment, 0);
...

Comment on lines +250 to +253
void* const expected_pointer = int_to_address(8192);
auto const size{1024};
EXPECT_EQ(mr.allocate(stream, size, requested_alignment), expected_pointer);
mr.deallocate(stream, expected_pointer, size, requested_alignment);
Contributor

again here.

@github-project-automation github-project-automation Bot moved this from In Progress to Review in RMM Project Board May 26, 2026

Labels

bug (Something isn't working), non-breaking (Non-breaking change)

Projects

Status: Review


2 participants