fix: prevent deadlock in donNotifier.Subscribe() on concurrent NotifyDonSet #21963
…DonSet
Subscribe() used a blocking channel send to deliver the cached DON value. If NotifyDonSet() concurrently filled the buffer between subscriber registration and the send, Subscribe() would deadlock. Switch to a non-blocking select, matching NotifyDonSet()'s pattern. If the buffer is full, the subscriber already has the value from NotifyDonSet.
Fixes: CORE-2378
👋 Fletch153, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!
I see you updated files related to
✅ No conflicts with other open PRs targeting
/rerun
…DonSet
Subscribe() registered the subscriber channel in the map BEFORE sending the cached DON value. If NotifyDonSet fired between registration and the send, it would fill the 1-slot buffer, and the subsequent blocking send in Subscribe() would deadlock. Reorder: send the cached value first (safe, because the channel is new, its buffer is empty, and nobody else holds a reference to it), then register. This eliminates the race window entirely without changing the blocking send contract.
Fixes: CORE-2378
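Sending before registering does remove the deadlock, since the send targets a fresh, empty channel. It appears to trade it for a lost update, though: if NotifyDonSet() runs after Subscribe() has read the cached value but before it registers, NotifyDonSet() finds no subscriber to notify and the new channel holds only the stale DON. The sketch below replays that interleaving on a single goroutine. It assumes subscribers is a sync.Map and don an atomic.Pointer, which the Store and Load calls quoted in the PR suggest; the notifier type, Don, and the split subscribeSendFirst helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Don is a hypothetical stand-in for the real DON value.
type Don struct{ ID uint32 }

// notifier is a hypothetical reconstruction: don caches the latest DON and
// subscribers holds 1-slot channels (assumed sync.Map keyed by channel).
type notifier struct {
	don         atomic.Pointer[Don]
	subscribers sync.Map
}

// NotifyDonSet caches d, then offers it to every registered subscriber
// without blocking.
func (n *notifier) NotifyDonSet(d Don) {
	n.don.Store(&d)
	n.subscribers.Range(func(k, _ any) bool {
		ch := k.(chan Don)
		select {
		case ch <- d:
		default: // buffer full: a value is already pending
		}
		return true
	})
}

// subscribeSendFirst models the reordered Subscribe(), split into its two
// steps so an interleaving can be replayed. Step 1 (the cached send) runs
// here; step 2 (registration) runs when the returned func is called.
func (n *notifier) subscribeSendFirst() (chan Don, func()) {
	s := make(chan Don, 1)
	if d := n.don.Load(); d != nil {
		s <- *d // never blocks: the channel is new and its buffer is empty
	}
	return s, func() { n.subscribers.Store(s, struct{}{}) }
}

func main() {
	n := &notifier{}
	n.NotifyDonSet(Don{ID: 1})

	s, register := n.subscribeSendFirst() // cached ID 1 is queued
	n.NotifyDonSet(Don{ID: 2})            // s is not registered yet, so nobody is notified
	register()

	fmt.Println("first value:", (<-s).ID)
	select {
	case d := <-s:
		fmt.Println("later value:", d.ID)
	default:
		fmt.Println("no pending value; the switch to ID 2 was missed")
	}
}
```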
This change is not as described. Does it actually fix the issue?
Register the subscriber first (preserving the no-missed-notification guarantee), but use a non-blocking select when sending the cached value. If a concurrent NotifyDonSet already filled the buffer, the cached send is safely skipped, since the subscriber already has a notification. Also fixes a minor double-Load race in the original code by capturing the pointer once.
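A minimal sketch of this final ordering, assuming subscribers is a sync.Map and don an atomic.Pointer. Don and the returned unsubscribe function are hypothetical, since the real Subscribe() signature is not shown. The unexported subscribe takes an afterRegister hook purely so the example can force NotifyDonSet() into the window between registration and the cached send; the hook is not part of the PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Don is a hypothetical stand-in for the real DON value.
type Don struct{ ID uint32 }

// donNotifier is a reconstruction; the field types are assumptions.
type donNotifier struct {
	don         atomic.Pointer[Don] // latest DON, nil until the first NotifyDonSet
	subscribers sync.Map            // chan Don -> struct{}
}

// NotifyDonSet caches d and offers it to each subscriber without blocking.
func (n *donNotifier) NotifyDonSet(d Don) {
	n.don.Store(&d)
	n.subscribers.Range(func(k, _ any) bool {
		ch := k.(chan Don)
		select {
		case ch <- d:
		default: // buffer full: a value is already pending
		}
		return true
	})
}

// Subscribe returns a 1-slot channel plus an unsubscribe func (assumed signature).
func (n *donNotifier) Subscribe() (<-chan Don, func()) {
	return n.subscribe(nil)
}

// subscribe runs afterRegister, if non-nil, between registration and the
// cached send. The hook exists only to replay the race deterministically.
func (n *donNotifier) subscribe(afterRegister func()) (<-chan Don, func()) {
	s := make(chan Don, 1)
	n.subscribers.Store(s, struct{}{}) // register first: later NotifyDonSet calls see s
	if afterRegister != nil {
		afterRegister()
	}
	// Load the pointer once instead of checking one Load and dereferencing
	// a second Load, which could observe a different value.
	if d := n.don.Load(); d != nil {
		select {
		case s <- *d:
		default: // a concurrent NotifyDonSet already queued a value
		}
	}
	return s, func() { n.subscribers.Delete(s) }
}

func main() {
	n := &donNotifier{}
	n.NotifyDonSet(Don{ID: 1})

	// Force NotifyDonSet into the window that used to deadlock.
	s, unsubscribe := n.subscribe(func() { n.NotifyDonSet(Don{ID: 2}) })
	defer unsubscribe()

	fmt.Println("received:", (<-s).ID) // the value queued by NotifyDonSet
}
```

In the replay, NotifyDonSet() fills the buffer with ID 2, the cached send is skipped, and Subscribe() returns instead of hanging.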
Closing issue: required further investigation




Summary
Fixes a deadlock in Subscribe() that occurs when NotifyDonSet() fires concurrently.

The channel returned by Subscribe() has a buffer of 1. In the original code, Subscribe() registered the subscriber in the map before sending the cached DON value with a blocking send. If NotifyDonSet() ran between those two steps, it would fill the buffer via its non-blocking select send, and the blocking s <- *n.don.Load() in Subscribe() would then block forever: the caller has not received the channel yet, so nobody is reading from it.

Root Cause
Race window in the original Subscribe():

1. n.subscribers.Store(s, struct{}{}): the subscriber becomes visible to NotifyDonSet.
2. NotifyDonSet() runs concurrently: it stores the new DON, iterates the subscribers, and its non-blocking send fills the 1-capacity buffer.
3. s <- *n.don.Load(): a blocking send on a now-full buffer with no reader yet, so Subscribe() deadlocks.

Note: NotifyDonSet() itself uses a non-blocking select and cannot deadlock. The deadlock is in Subscribe()'s own blocking send, not in NotifyDonSet().

Fix
Change the cached-value send in Subscribe() to a non-blocking select, matching the pattern already used in NotifyDonSet(). Registration stays first to guarantee no missed notifications. If a concurrent NotifyDonSet already filled the buffer, the cached send is safely skipped, because the subscriber already has a value queued.

Test plan
TestDonNotifier passes with -count=100 -race

#bugfix
Fixes: CORE-2378