Make tracking resource outstanding allocations access thread-safe #2395

@bdice

Background

CodeRabbit flagged that tracking_resource_adaptor::get_outstanding_allocations() returns a const& to internal mutable state:

#2361 (comment)

The implementation stores outstanding allocations in tracking_resource_adaptor_impl::allocations_, which is updated under mtx_ during allocation and deallocation. Returning a reference lets callers read the map after the lock has been released, so a concurrent allocation or deallocation can modify the map while a caller is still reading it, which is a data race.
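
To make the hazard concrete, here is a minimal sketch of the pattern. `mock_tracker` and its members are hypothetical stand-ins, not RMM's actual classes, and I'm assuming a `std::shared_mutex` and a getter that locks internally purely for illustration. The demo is deliberately single-threaded: it only shows that the returned reference is a live view of the internal map, which is what makes concurrent use unsafe.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the adaptor; not RMM's implementation.
struct mock_tracker {
  void track(void* ptr, std::size_t bytes)
  {
    std::unique_lock<std::shared_mutex> lock(mtx_);
    allocations_.emplace(ptr, bytes);
  }

  void untrack(void* ptr)
  {
    std::unique_lock<std::shared_mutex> lock(mtx_);
    allocations_.erase(ptr);
  }

  // Mirrors the flagged pattern: the lock is released when this function
  // returns, but the caller keeps a reference to the guarded map.
  std::map<void*, std::size_t> const& get_outstanding_allocations() const
  {
    std::shared_lock<std::shared_mutex> lock(mtx_);
    return allocations_;
  }

 private:
  mutable std::shared_mutex mtx_;
  std::map<void*, std::size_t> allocations_;
};

// Single-threaded on purpose: the reference observes a later mutation.
// If track() ran on another thread while `view` was being iterated,
// that would be an unsynchronized read/write on the same std::map.
inline std::size_t live_view_size_after_second_track()
{
  mock_tracker tracker;
  int a = 0;
  int b = 0;
  tracker.track(&a, 16);
  auto const& view = tracker.get_outstanding_allocations();
  tracker.track(&b, 32);  // mutates the map that `view` refers to
  return view.size();
}
```

Locking inside the getter does not help, because the lock only covers the `return` statement and not the caller's later reads.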

Options

Option 1: Document Caller Synchronization

Document that get_outstanding_allocations() returns a view of internal tracking state and is only safe when callers externally synchronize against concurrent allocation/deallocation through the same tracking resource.

This is the smallest compatibility-preserving option. It keeps the existing return type and avoids exposing or copying internal stack trace ownership semantics.
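
For reference, a sketch of what "externally synchronized" could look like on the caller side. Again `mock_tracker` is a hypothetical stand-in, and joining the worker threads is just one way to guarantee that no allocation or deallocation is in flight when the map is read.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the adaptor; not RMM's implementation.
struct mock_tracker {
  void track(void* ptr, std::size_t bytes)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    allocations_.emplace(ptr, bytes);
  }

  // Returns a view of internal state; only safe if the caller ensures
  // that no thread is calling track() concurrently.
  std::map<void*, std::size_t> const& get_outstanding_allocations() const
  {
    return allocations_;
  }

 private:
  std::mutex mtx_;
  std::map<void*, std::size_t> allocations_;
};

inline std::size_t count_after_join(std::size_t num_threads, std::size_t per_thread)
{
  mock_tracker tracker;
  // Each char has a distinct address, used here as a fake allocation pointer.
  std::vector<std::vector<char>> slots(num_threads, std::vector<char>(per_thread));

  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&tracker, &slots, t] {
      for (auto& slot : slots[t]) { tracker.track(&slot, 1); }
    });
  }

  // External synchronization: join() happens-before the read below,
  // so nothing can mutate the map while it is inspected.
  for (auto& worker : workers) { worker.join(); }

  return tracker.get_outstanding_allocations().size();
}
```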

Option 2: Change get_outstanding_allocations() to Return a Snapshot

The existing method would return std::map<void*, allocation_info> by value. This likely requires:

  • Updating the public and impl return types from std::map<void*, allocation_info> const& to std::map<void*, allocation_info>.
  • Removing noexcept from those methods because copying the map can allocate.
  • Making allocation_info copyable or changing what it contains.
    • allocation_info owns std::unique_ptr<rmm::detail::stack_trace>. Making allocation_info copyable would require deciding whether and how stack traces should be copied/exposed through this public API. That may be more API surface than we want to take on.
  • Updating tests to verify the returned snapshot remains stable after subsequent allocations/deallocations.

This is the strongest semantic fix, but it is likely the broadest source/ABI change and may expose implementation details that should remain internal.
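
A sketch of the copyability problem, assuming C++17. `mock_stack_trace`, `owning_info`, and `copyable_info` are hypothetical types that only mimic the ownership shape described above. The deep copy shown is one possible choice, not a proposal for how stack traces should be exposed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for rmm::detail::stack_trace.
struct mock_stack_trace {
  std::string text;
};

// Same ownership shape as described above: a unique_ptr member makes the
// struct move-only, so a map of these cannot be copied into a snapshot.
struct owning_info {
  std::unique_ptr<mock_stack_trace> strace;
  std::size_t allocation_size;
};
static_assert(!std::is_copy_constructible_v<owning_info>);

// One possible copyable variant: the copy constructor deep-copies the trace.
struct copyable_info {
  std::unique_ptr<mock_stack_trace> strace;
  std::size_t allocation_size;

  copyable_info(std::unique_ptr<mock_stack_trace> trace, std::size_t bytes)
    : strace(std::move(trace)), allocation_size(bytes)
  {
  }
  copyable_info(copyable_info const& other)
    : strace(clone(other.strace)), allocation_size(other.allocation_size)
  {
  }
  copyable_info(copyable_info&&) = default;

 private:
  static std::unique_ptr<mock_stack_trace> clone(std::unique_ptr<mock_stack_trace> const& src)
  {
    if (!src) { return nullptr; }
    return std::make_unique<mock_stack_trace>(*src);
  }
};
static_assert(std::is_copy_constructible_v<copyable_info>);

// With a copyable value type, returning the map by value gives a snapshot
// that is unaffected by later changes to the live map.
inline bool snapshot_is_independent()
{
  std::map<void*, copyable_info> live;
  int a = 0;
  live.emplace(&a, copyable_info{std::make_unique<mock_stack_trace>(mock_stack_trace{"frame0"}), 64});

  std::map<void*, copyable_info> snapshot = live;  // by-value copy
  live.clear();

  return snapshot.size() == 1 && snapshot.at(&a).strace->text == "frame0" &&
         snapshot.at(&a).allocation_size == 64;
}
```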

Option 3: Add a Snapshot API

Add a new API that takes the existing read lock, copies the tracked state while holding it, and returns that copy as a snapshot. The existing behavior of get_outstanding_allocations() would be unchanged.
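
A rough sketch of what this could look like. The name `snapshot_outstanding_allocations` and the pointer/size pair return type are assumptions for illustration only. This version sidesteps the stack trace question by leaving traces out of the snapshot; whether that is acceptable is part of the design decision.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the adaptor; not RMM's implementation.
struct mock_tracker {
  void track(void* ptr, std::size_t bytes)
  {
    std::unique_lock<std::shared_mutex> lock(mtx_);
    allocations_.emplace(ptr, bytes);
  }

  void untrack(void* ptr)
  {
    std::unique_lock<std::shared_mutex> lock(mtx_);
    allocations_.erase(ptr);
  }

  // Hypothetical new API: copy pointer/size pairs while holding the read
  // lock, then hand the caller an independent container.
  std::vector<std::pair<void*, std::size_t>> snapshot_outstanding_allocations() const
  {
    std::shared_lock<std::shared_mutex> lock(mtx_);
    std::vector<std::pair<void*, std::size_t>> snapshot;
    snapshot.reserve(allocations_.size());
    for (auto const& [ptr, bytes] : allocations_) {
      snapshot.emplace_back(ptr, bytes);
    }
    return snapshot;
  }

 private:
  mutable std::shared_mutex mtx_;
  std::map<void*, std::size_t> allocations_;
};

// The snapshot keeps its contents after later tracking changes.
inline bool snapshot_survives_later_changes()
{
  mock_tracker tracker;
  int a = 0;
  int b = 0;
  tracker.track(&a, 16);
  auto const snapshot = tracker.snapshot_outstanding_allocations();
  tracker.track(&b, 32);
  tracker.untrack(&a);
  return snapshot.size() == 1 && snapshot[0].first == &a && snapshot[0].second == 16;
}
```

Because the copy allocates, a method like this could not be noexcept, but as a new method it would not change the signature of the existing getter.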

Project status: Done
