Skip to content

feat: add multiple_blocks_allocation RAII handle for fixed_size_memory_resource#2368

Open
nirandaperera wants to merge 6 commits into
rapidsai:mainfrom
nirandaperera:mutiple_fixed_size_allocations
Open

feat: add multiple_blocks_allocation RAII handle for fixed_size_memory_resource#2368
nirandaperera wants to merge 6 commits into
rapidsai:mainfrom
nirandaperera:mutiple_fixed_size_allocations

Conversation

@nirandaperera
Copy link
Copy Markdown
Contributor

Summary

  • Adds multiple_blocks_allocation, an RAII class that allocates device memory spanning one or more fixed-size blocks from a fixed_size_memory_resource in a single stream-ordered call.
  • Exposes multiple_blocks_allocation::make_async(mr, size, stream) as the sole public factory, which acquires the required number of blocks under a single mutex lock and records a CUDA event for correct stream ordering.
  • On destruction the held blocks are returned to the resource's free list via a new internal deallocate_blocks_async_unsafe path (caller holds the mutex) — avoiding a second lock round-trip.
  • Promotes get_event / get_block to protected in stream_ordered_memory_resource so that fixed_size_memory_resource_impl can grant friend access to multiple_blocks_allocation.
  • Non-inline method bodies live in cpp/src/mr/fixed_size_memory_resource.cpp to keep the public header lean.
  • Adds fixed_size_mr_test.cpp which exercises allocation sizes of 0, sub-block, exact-block, and multi-block across cuda_memory_resource and cuda_async_memory_resource upstreams, using statistics_resource_adaptor to verify byte counts.

Test plan

  • Build and run FIXED_SIZE_MR_TEST (all parameter combinations pass).
  • Verify no deadlocks or double-free under ASAN/TSAN.
  • Confirm existing fixed_size_mr tests still pass.

Made with Cursor

Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera nirandaperera requested review from a team as code owners April 24, 2026 01:13
@nirandaperera nirandaperera requested review from bdice and ttnghia April 24, 2026 01:13
@nirandaperera nirandaperera added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Apr 24, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a public RAII handle multiple_blocks_allocation for multi-block async allocations, introduces a private async batch deallocator and friend/forward declarations in fixed-size MR internals, makes get_event protected in stream-ordered MR, and adds tests covering async multi-block allocation/deallocation and upstream accounting.

Changes

Cohort / File(s) Summary
Public API: fixed-size MR
cpp/include/rmm/mr/fixed_size_memory_resource.hpp
Adds exported multiple_blocks_allocation RAII type with make_async, accessors (size, capacity, block_size, stream), block views (get_blocks, operator[], at), clear(), deleted copy, and move support.
Impl header plumbing
cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp
Adds <memory> include, forward declarations for fixed_size_memory_resource and multiple_blocks_allocation, grants friendship to multiple_blocks_allocation, and adds a private deallocate_blocks_async_unsafe(std::vector<std::byte*>&&, cuda_stream_view) helper (caller must hold MR mutex).
Stream-ordered MR visibility
cpp/include/rmm/mr/detail/stream_ordered_memory_resource.hpp
Changes get_event(cuda_stream_view) visibility from private to protected to allow derived classes to access stream event retrieval.
Implementation: allocation & async dealloc
cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp, cpp/src/mr/fixed_size_memory_resource.cpp
Implements deallocate_blocks_async_unsafe (converts freed pointers to free list, records CUDA event, inserts blocks stream-ordered) and implements multiple_blocks_allocation (factory make_async, constructors/move, clear(), destructor with async return via MR mutex and exception-to-log handling; ensures exception safety when partially allocating).
Tests & build
cpp/tests/CMakeLists.txt, cpp/tests/mr/fixed_size_mr_test.cpp
Registers a new fixed-size MR test and adds a parameterized GTest that exercises async multi-block allocations/deallocations across streams/threads and validates upstream byte accounting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ttnghia
  • bdice
  • harrism
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main addition: a new RAII handle class for multi-block allocation in fixed-size memory resource.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the RAII class functionality, API design, implementation strategy, and test coverage.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
cpp/tests/mr/fixed_size_mr_test.cpp (1)

58-115: Add a multithreaded test for the new allocation/destruction path.

This only exercises single-threaded accounting. The feature change is in the mutex/event path of make_async and destruction, so a test that allocates and destroys handles from multiple CPU threads would give much better coverage of the new synchronization logic. Based on learnings "Test concurrent allocations and deallocations from multiple threads; verify thread safety of pool implementations."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/fixed_size_mr_test.cpp` around lines 58 - 115, The test only
exercises single-threaded allocation/deallocation of fixed_size_memory_resource
and does not cover the new mutex/event synchronization in
rmm::mr::multiple_blocks_allocation::make_async and its destruction path; add a
multithreaded variant of TEST_P(FixedSizeMRTest,
AllocateBlocksAsyncUpstreamCountedDeallocateDoesNotReturnToUpstream) that spawns
several CPU threads which concurrently call
multiple_blocks_allocation::make_async on the same fixed_size_mr instance (using
the same stream_pool or distinct streams) and then clear/destroy the handles
concurrently, asserting the same invariants (handle->size(), handle->capacity(),
counting.get_bytes_counter().value behavior) both before and after fixed_size_mr
destruction to exercise and verify the mutex/event synchronization path in
make_async and destructor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mr/fixed_size_memory_resource.cpp`:
- Around line 73-85: After acquiring the blocks with self.get_block(...) and
before returning the new multiple_blocks_allocation, ensure that construction
failures do not leak the acquired blocks: wrap the construction of the
unique_ptr<multiple_blocks_allocation> (the new multiple_blocks_allocation(size,
std::move(blocks), stream, std::move(mr)) call) in a try/catch that on
catch(...) calls self.deallocate_blocks_async_unsafe(std::move(blocks), stream)
and rethrows; alternatively create the object via a factory that returns a smart
pointer or use a scoped guard that deallocates blocks on exception so
deallocate_blocks_async_unsafe is always invoked if the
multiple_blocks_allocation constructor throws.
- Around line 39-55: Reject per-thread default streams when creating an
asynchronous multiple_blocks_allocation: in the factory function make_async (the
caller that constructs multiple_blocks_allocation and passes cuda_stream_view
stream), validate stream.is_per_thread_default() and fail fast (throw or return
an error) instead of accepting PTDS; multiple_blocks_allocation stores stream_
and uses it in its destructor (deallocate_blocks_async_unsafe), so accepting
PTDS would cause the stored handle to resolve on a different thread and race
with GPU work—update make_async to check stream.is_per_thread_default() and
document the rejection, or alternatively require the allocation be destroyed on
the creating thread if PTDS is allowed (preferred: reject PTDS in make_async).

---

Nitpick comments:
In `@cpp/tests/mr/fixed_size_mr_test.cpp`:
- Around line 58-115: The test only exercises single-threaded
allocation/deallocation of fixed_size_memory_resource and does not cover the new
mutex/event synchronization in rmm::mr::multiple_blocks_allocation::make_async
and its destruction path; add a multithreaded variant of TEST_P(FixedSizeMRTest,
AllocateBlocksAsyncUpstreamCountedDeallocateDoesNotReturnToUpstream) that spawns
several CPU threads which concurrently call
multiple_blocks_allocation::make_async on the same fixed_size_mr instance (using
the same stream_pool or distinct streams) and then clear/destroy the handles
concurrently, asserting the same invariants (handle->size(), handle->capacity(),
counting.get_bytes_counter().value behavior) both before and after fixed_size_mr
destruction to exercise and verify the mutex/event synchronization path in
make_async and destructor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 65115bfb-9479-479a-b6c0-2aa3b96ca229

📥 Commits

Reviewing files that changed from the base of the PR and between 386f76d and 2ebec2b.

📒 Files selected for processing (7)
  • cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp
  • cpp/include/rmm/mr/detail/stream_ordered_memory_resource.hpp
  • cpp/include/rmm/mr/fixed_size_memory_resource.hpp
  • cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp
  • cpp/src/mr/fixed_size_memory_resource.cpp
  • cpp/tests/CMakeLists.txt
  • cpp/tests/mr/fixed_size_mr_test.cpp

Comment thread cpp/src/mr/fixed_size_memory_resource.cpp
Comment thread cpp/src/mr/fixed_size_memory_resource.cpp
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/rmm/mr/fixed_size_memory_resource.hpp`:
- Around line 91-204: Update the class-level doc for multiple_blocks_allocation
to reference the actual factory make_async (replace the stale
allocate_blocks_async mention) and add a brief thread-safety/lifetime note
clarifying that multiple_blocks_allocation is non-copyable/non-movable, holds a
refcounted fixed_size_memory_resource (mr_) so the pool outlives the handle, and
that allocation/deallocation are stream-ordered on the cuda_stream_view passed
to make_async (including that destructor records the deallocation on that
stream). Ensure references to symbols multiple_blocks_allocation, make_async,
mr_, and the destructor are included so the API contract matches the
implementation.

In `@cpp/src/mr/fixed_size_memory_resource.cpp`:
- Around line 51-57: The destructor
multiple_blocks_allocation::~multiple_blocks_allocation currently calls
mr_->deallocate_blocks_async_unsafe(...) which can surface CUDA errors from a
destructor; make the cleanup no-throw by wrapping the deallocation call in the
project's no-throw CUDA error macro/utility (e.g., RMM_CUDA_TRY_NOEXCEPT) or an
equivalent try/catch that logs errors without throwing. Ensure you still check
blocks_.empty(), acquire the mutex via std::lock_guard as before, and move
blocks_ into the deallocation call, but perform that call inside the no-throw
wrapper so any CUDA error is swallowed/logged rather than allowed to propagate
out of the destructor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4e775602-9c2b-4234-8235-8629762e0fa4

📥 Commits

Reviewing files that changed from the base of the PR and between 2ebec2b and 0e68a53.

📒 Files selected for processing (3)
  • cpp/include/rmm/mr/fixed_size_memory_resource.hpp
  • cpp/src/mr/fixed_size_memory_resource.cpp
  • cpp/tests/mr/fixed_size_mr_test.cpp

Comment thread cpp/include/rmm/mr/fixed_size_memory_resource.hpp
Comment thread cpp/src/mr/fixed_size_memory_resource.cpp Outdated
Comment thread cpp/include/rmm/mr/fixed_size_memory_resource.hpp Outdated
* before the failure are returned to the pool on `stream` (same ordering as normal
* deallocation).
*/
[[nodiscard]] static std::unique_ptr<multiple_blocks_allocation> make_async(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a factory instead of a normal constructor?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously inside the fixed sized mr class. Then I thought a factory method was the best. When I pulled it out, I left it as is. I felt its more idiomatic. We can throw and verify args (I should remove the RMM_EXPECTS statements in the ctr) cleanly. But I am fine either way. WDYT @bdice ?

Comment thread cpp/include/rmm/mr/fixed_size_memory_resource.hpp Outdated
Comment thread cpp/include/rmm/mr/fixed_size_memory_resource.hpp
* deallocation).
*/
[[nodiscard]] static std::unique_ptr<multiple_blocks_allocation> make_async(
fixed_size_memory_resource mr, std::size_t size, cuda_stream_view stream);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All new APIs should accept cuda::stream_ref rather than rmm::cuda_stream_view. I am planning to deprecate rmm::cuda_stream_view at some point. Until then the two types are implicitly convertible and thus interchangeable.

Comment thread cpp/tests/mr/fixed_size_mr_test.cpp Outdated
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
cpp/src/mr/fixed_size_memory_resource.cpp (2)

108-121: ⚠️ Potential issue | 🟠 Major

Keep handle construction inside the rollback scope.

If new multiple_blocks_allocation(...) throws, control skips the current catch and the acquired blocks never get returned to the pool. The rollback needs to cover object construction as well, not just get_block().

🛠️ Proposed fix
   std::vector<std::byte*> blocks;
   blocks.reserve(num_blocks);
   try {
     for (std::size_t i = 0; i < num_blocks; ++i) {
       blocks.push_back(
         static_cast<std::byte*>(self.get_block(self.get_block_size(), stream_event).pointer()));
     }
-  } catch (...) {
-    RMM_CUDA_TRY(self.deallocate_blocks_async_unsafe(std::move(blocks), stream));
-    throw;
-  }
-
-  return std::unique_ptr<multiple_blocks_allocation>(
-    new multiple_blocks_allocation(size, std::move(blocks), stream, std::move(mr)));
+    return std::unique_ptr<multiple_blocks_allocation>(
+      new multiple_blocks_allocation(size, std::move(blocks), stream, std::move(mr)));
+  } catch (...) {
+    RMM_CUDA_TRY(self.deallocate_blocks_async_unsafe(std::move(blocks), stream));
+    throw;
+  }
#!/bin/bash
# Verify that the handle construction still sits outside the rollback scope.
sed -n '103,122p' cpp/src/mr/fixed_size_memory_resource.cpp

Based on learnings: "Detect and fix GPU memory leaks: ensure all device allocations (cudaMalloc, cudaMallocManaged, cudaHostAlloc) are properly deallocated, including error paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mr/fixed_size_memory_resource.cpp` around lines 108 - 121, The
construction of the multiple_blocks_allocation handle can throw and currently
lives outside the rollback scope, so if its constructor throws the acquired
blocks in `blocks` won't be returned; move the creation of the unique_ptr/new
multiple_blocks_allocation(...) inside the same try that obtains blocks (or add
a nested try/catch around the `new multiple_blocks_allocation(...)` that calls
`self.deallocate_blocks_async_unsafe(std::move(blocks), stream)` and rethrows)
so that any exception during `multiple_blocks_allocation` construction triggers
the same rollback path; reference `get_block`, `blocks`,
`deallocate_blocks_async_unsafe`, and `multiple_blocks_allocation` when making
the change.

49-56: ⚠️ Potential issue | 🔴 Critical

Use std::exchange() to ensure moved-from state is safely emptied.

After move-construction or move-assignment, std::vector is left in a valid but unspecified state—not guaranteed empty. This creates a double-release risk: if the moved-from object later calls clear() or its destructor, GPU blocks can be returned twice. Additionally, deallocate_blocks_async_unsafe() is called with a moved-from vector in unspecified state.

Also, in make_async() (lines 115-121), the new multiple_blocks_allocation(...) call is outside the try/catch block. If that allocation throws, the acquired blocks are not deallocated, leaking GPU memory. Wrap the object construction in the rollback scope.

🔧 Proposed fix
 void multiple_blocks_allocation::clear()
 {
-  if (!blocks_.empty()) {
+  auto blocks = std::exchange(blocks_, {});
+  if (!blocks.empty()) {
     std::lock_guard<std::mutex> lock(mr_->get_mutex());
-    RMM_CUDA_TRY(mr_->deallocate_blocks_async_unsafe(std::move(blocks_), stream_));
+    RMM_CUDA_TRY(mr_->deallocate_blocks_async_unsafe(std::move(blocks), stream_));
   }
   size_ = 0;
 }
 
 multiple_blocks_allocation::multiple_blocks_allocation(multiple_blocks_allocation&& other) noexcept
-  : blocks_(std::move(other.blocks_)),
+  : blocks_(std::exchange(other.blocks_, {})),
     size_(other.size_),
     stream_(other.stream_),
     mr_(std::move(other.mr_))
 {
   other.size_ = 0;
 }
 
 multiple_blocks_allocation& multiple_blocks_allocation::operator=(
   multiple_blocks_allocation&& other)
 {
   if (this != &other) {
     clear();
-    blocks_     = std::move(other.blocks_);
+    blocks_     = std::exchange(other.blocks_, {});
     size_       = other.size_;
     stream_     = other.stream_;
     mr_         = std::move(other.mr_);
     other.size_ = 0;
   }
   return *this;
 }
 
   try {
     for (std::size_t i = 0; i < num_blocks; ++i) {
       blocks.push_back(...);
     }
-  } catch (...) {
+    return std::unique_ptr<multiple_blocks_allocation>(
+      new multiple_blocks_allocation(size, std::move(blocks), stream, std::move(mr)));
+  } catch (...) {
     RMM_CUDA_TRY(self.deallocate_blocks_async_unsafe(std::move(blocks), stream));
     throw;
   }
-
-  return std::unique_ptr<multiple_blocks_allocation>(
-    new multiple_blocks_allocation(size, std::move(blocks), stream, std::move(mr)));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mr/fixed_size_memory_resource.cpp` around lines 49 - 56, The move
constructor multiple_blocks_allocation(multiple_blocks_allocation&&) leaves
moved-from members in unspecified state which risks double-release; use
std::exchange to move blocks_ and mr_ and set size_ and stream_ to safe defaults
(e.g., std::exchange(other.size_, 0), std::exchange(other.stream_, nullptr),
std::exchange(other.blocks_, {}), std::exchange(other.mr_, nullptr)) so the
moved-from object is explicitly empty. Also, in make_async() wrap the new
multiple_blocks_allocation(...) construction inside the existing
try/catch/rollback scope (or add a try around it) so that if operator new throws
you call deallocate_blocks_async_unsafe() to release acquired GPU blocks before
rethrowing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cpp/src/mr/fixed_size_memory_resource.cpp`:
- Around line 108-121: The construction of the multiple_blocks_allocation handle
can throw and currently lives outside the rollback scope, so if its constructor
throws the acquired blocks in `blocks` won't be returned; move the creation of
the unique_ptr/new multiple_blocks_allocation(...) inside the same try that
obtains blocks (or add a nested try/catch around the `new
multiple_blocks_allocation(...)` that calls
`self.deallocate_blocks_async_unsafe(std::move(blocks), stream)` and rethrows)
so that any exception during `multiple_blocks_allocation` construction triggers
the same rollback path; reference `get_block`, `blocks`,
`deallocate_blocks_async_unsafe`, and `multiple_blocks_allocation` when making
the change.
- Around line 49-56: The move constructor
multiple_blocks_allocation(multiple_blocks_allocation&&) leaves moved-from
members in unspecified state which risks double-release; use std::exchange to
move blocks_ and mr_ and set size_ and stream_ to safe defaults (e.g.,
std::exchange(other.size_, 0), std::exchange(other.stream_, nullptr),
std::exchange(other.blocks_, {}), std::exchange(other.mr_, nullptr)) so the
moved-from object is explicitly empty. Also, in make_async() wrap the new
multiple_blocks_allocation(...) construction inside the existing
try/catch/rollback scope (or add a try around it) so that if operator new throws
you call deallocate_blocks_async_unsafe() to release acquired GPU blocks before
rethrowing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a6e7e9e1-54d3-4ffd-b1b9-0ae84ba5448e

📥 Commits

Reviewing files that changed from the base of the PR and between 0e68a53 and c754376.

📒 Files selected for processing (4)
  • cpp/include/rmm/mr/detail/fixed_size_memory_resource_impl.hpp
  • cpp/include/rmm/mr/fixed_size_memory_resource.hpp
  • cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp
  • cpp/src/mr/fixed_size_memory_resource.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • cpp/include/rmm/mr/fixed_size_memory_resource.hpp
  • cpp/src/mr/detail/fixed_size_memory_resource_impl.cpp

Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera nirandaperera requested review from bdice and wence- April 28, 2026 23:47
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
cpp/src/mr/fixed_size_memory_resource.cpp (1)

111-122: ⚠️ Potential issue | 🟠 Major

Extend the rollback path to cover handle construction.

Line 121 is still outside the rollback try. If new multiple_blocks_allocation(...) throws after the blocks were acquired, those blocks never get returned to the pool and pool capacity leaks on the failure path.

🛠️ Proposed fix
   std::vector<std::byte*> blocks;
   blocks.reserve(num_blocks);
   try {
     for (std::size_t i = 0; i < num_blocks; ++i) {
       blocks.push_back(
         static_cast<std::byte*>(self.get_block(self.get_block_size(), stream_event).pointer()));
     }
-  } catch (...) {
-    RMM_CUDA_TRY(self.deallocate_blocks_async_unsafe(std::move(blocks), stream));
-    throw;
-  }
-
-  return std::unique_ptr<multiple_blocks_allocation>(
-    new multiple_blocks_allocation(size, std::move(blocks), stream, std::move(mr)));
+    return std::unique_ptr<multiple_blocks_allocation>(
+      new multiple_blocks_allocation(size, std::move(blocks), stream, std::move(mr)));
+  } catch (...) {
+    RMM_CUDA_TRY(self.deallocate_blocks_async_unsafe(std::move(blocks), stream));
+    throw;
+  }

As per coding guidelines, "Detect and fix GPU memory leaks: ensure all device allocations (cudaMalloc, cudaMallocManaged, cudaHostAlloc) are properly deallocated, including error paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mr/fixed_size_memory_resource.cpp` around lines 111 - 122, The
allocation of blocks is protected by a try/catch but the subsequent construction
of multiple_blocks_allocation(new multiple_blocks_allocation(...)) sits outside
that try, so if the constructor throws the acquired blocks are never returned;
move the creation of the multiple_blocks_allocation object inside the same try
(or wrap the constructor call in the same try) and, in the catch, call
self.deallocate_blocks_async_unsafe(std::move(blocks), stream) (using
RMM_CUDA_TRY as currently done) to ensure all blocks acquired via
get_block/get_block_size are returned on any failure; reference the blocks
vector, multiple_blocks_allocation constructor, get_block/get_block_size,
deallocate_blocks_async_unsafe, stream_event, stream and mr when making the
change.
cpp/include/rmm/mr/fixed_size_memory_resource.hpp (1)

91-99: ⚠️ Potential issue | 🟡 Minor

Document the thread-safety contract explicitly.

The public docs explain lifetime, but they still do not say whether distinct handles may be cleared/destroyed concurrently or whether a single handle requires external synchronization. Please make that contract explicit in the class/destructor docs.

As per coding guidelines, "Document thread-safety guarantees in memory resource class documentation (Doxygen); specify which operations are thread-safe."

Also applies to: 126-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/rmm/mr/fixed_size_memory_resource.hpp` around lines 91 - 99,
Update the Doxygen for the RAII handle and the fixed_size_memory_resource to
state the thread-safety contract: specify whether distinct handle instances (the
RAII allocation handle) may be destroyed/cleared concurrently without external
synchronization and whether operations on a single handle (including its
destructor and any clear/release method) are safe only with external
synchronization; also document that the underlying fixed_size_memory_resource
permits concurrent allocations/frees across different streams if applicable or
requires external synchronization. Reference the RAII handle class/destructor
and fixed_size_memory_resource types in the comments so readers can find the
guarantees quickly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tests/mr/fixed_size_mr_test.cpp`:
- Around line 141-151: The deallocation loop serializes teardown because
handles.pop_back() destroys the unique_ptr while holding handles_mutex; change
the loop inside the async task so it pops/moves the unique_ptr out while holding
handles_mutex (e.g., move handles.back() into a local variable and pop_back),
then release the lock and let the unique_ptr go out of scope (destroy) outside
the critical section; update the lambda used by dealloc_futs to use
handles_mutex, move the handle to a local variable before unlocking, and perform
destruction (letting multiple_blocks_allocation teardown run) outside the mutex
to allow true concurrent deallocation.

---

Duplicate comments:
In `@cpp/include/rmm/mr/fixed_size_memory_resource.hpp`:
- Around line 91-99: Update the Doxygen for the RAII handle and the
fixed_size_memory_resource to state the thread-safety contract: specify whether
distinct handle instances (the RAII allocation handle) may be destroyed/cleared
concurrently without external synchronization and whether operations on a single
handle (including its destructor and any clear/release method) are safe only
with external synchronization; also document that the underlying
fixed_size_memory_resource permits concurrent allocations/frees across different
streams if applicable or requires external synchronization. Reference the RAII
handle class/destructor and fixed_size_memory_resource types in the comments so
readers can find the guarantees quickly.

In `@cpp/src/mr/fixed_size_memory_resource.cpp`:
- Around line 111-122: The allocation of blocks is protected by a try/catch but
the subsequent construction of multiple_blocks_allocation(new
multiple_blocks_allocation(...)) sits outside that try, so if the constructor
throws the acquired blocks are never returned; move the creation of the
multiple_blocks_allocation object inside the same try (or wrap the constructor
call in the same try) and, in the catch, call
self.deallocate_blocks_async_unsafe(std::move(blocks), stream) (using
RMM_CUDA_TRY as currently done) to ensure all blocks acquired via
get_block/get_block_size are returned on any failure; reference the blocks
vector, multiple_blocks_allocation constructor, get_block/get_block_size,
deallocate_blocks_async_unsafe, stream_event, stream and mr when making the
change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 359d9832-10c9-4ee9-950d-787128b95b9c

📥 Commits

Reviewing files that changed from the base of the PR and between a5e3f55 and 4333c99.

📒 Files selected for processing (3)
  • cpp/include/rmm/mr/fixed_size_memory_resource.hpp
  • cpp/src/mr/fixed_size_memory_resource.cpp
  • cpp/tests/mr/fixed_size_mr_test.cpp

Comment thread cpp/tests/mr/fixed_size_mr_test.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants