Skip to content

Enabling concurrent spilling#1048

Open
nirandaperera wants to merge 1 commit into
rapidsai:mainfrom
nirandaperera:concurrent-spilling
Open

Enabling concurrent spilling#1048
nirandaperera wants to merge 1 commit into
rapidsai:mainfrom
nirandaperera:concurrent-spilling

Conversation

@nirandaperera
Copy link
Copy Markdown
Contributor

@nirandaperera nirandaperera commented May 16, 2026

[C++] Allow concurrent SpillManager::spill() invocations

SpillManager serialized all spill() calls behind a single std::mutex, even though spill work is long-running (D2H copies, I/O) and only the rarely-touched function registry actually needed mutual exclusion. This switches the lock to a std::shared_mutex so spill() takes a shared lock and runs in parallel, while add_spill_function / remove_spill_function still take an exclusive lock — which preserves the drain-on-remove guarantee that callers like Shuffler rely on for safe teardown.

  • spill() now takes a shared lock; concurrent callers no longer serialize on the registry mutex.
  • add_spill_function / remove_spill_function take an exclusive lock, so in-flight spillers are drained before either returns.
  • Added SpillManager.ConcurrentSpill test that uses a rendezvous to prove spill() calls actually overlap (it would fail on the old exclusive mutex).
  • Audited the three registered C++ spill functions (Shuffler::spill, AllGather::spill, spill_messages); all already lock their own state for concurrent self-invocation, so no caller-side changes are required.

Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera nirandaperera requested a review from a team as a code owner May 16, 2026 01:37
@nirandaperera nirandaperera added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant