FdbOrch: use DedupQueue for FDB notifications #4603
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
A storm of MAC moves can bombard the orchagent at a rate higher than it can process, leaving stale duplicate events queued. Inject DedupQueue into the FDB NotificationConsumer so only the latest unique events are delivered to FdbOrch.

Signed-off-by: manish1 <manish1@arista.com>
Branch updated from 5fc499d to 294ed68.
```cpp
m_fdbNotificationConsumer = new swss::NotificationConsumer(
    m_notificationsDb.get(), "NOTIFICATIONS",
    100, DEFAULT_NC_POP_BATCH_SIZE,  // priority, pop batch size
    std::make_shared<swss::NotificationConsumer::DedupQueue>());  // collapse duplicate FDB notifications
```
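The last constructor argument hands the consumer a queue strategy, while other call sites keep the default queue. A minimal sketch of that injection pattern follows; `EventQueue`, `FifoQueue` and `Consumer` are hypothetical names (not the swss-common classes), and events are modelled as plain strings for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical queue-strategy interface (not the swss-common API).
class EventQueue {
public:
    virtual ~EventQueue() = default;
    virtual void push(std::string event) = 0;
    virtual std::string pop() = 0;
    virtual std::size_t size() const = 0;
};

// Default strategy: plain FIFO that keeps every event, duplicates included.
class FifoQueue : public EventQueue {
public:
    void push(std::string event) override { q_.push_back(std::move(event)); }
    std::string pop() override {
        assert(!q_.empty() && "pop() on empty FifoQueue");
        std::string event = std::move(q_.front());
        q_.pop_front();
        return event;
    }
    std::size_t size() const override { return q_.size(); }

private:
    std::deque<std::string> q_;
};

// Consumer that buffers notifications in whichever queue it was handed.
// The defaulted argument keeps existing call sites on FIFO behaviour;
// only a caller that opts in gets a different strategy.
class Consumer {
public:
    explicit Consumer(std::shared_ptr<EventQueue> queue = std::make_shared<FifoQueue>())
        : queue_(std::move(queue)) {}

    void onNotification(std::string event) { queue_->push(std::move(event)); }
    std::string next() { return queue_->pop(); }
    std::size_t pending() const { return queue_->size(); }

private:
    std::shared_ptr<EventQueue> queue_;
};
```

Because the strategy is passed as a `shared_ptr` to an interface, swapping FIFO for a dedup queue touches only the one construction site, which matches the scope of this change (plus any test fakes that mirror the constructor signature).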
Here is the list of PRs raised by @senthil-nexthop for Dedup/FIFO queues:
sonic-net/sonic-swss-common#1191
#4586
sonic-net/sonic-sairedis#1899
sonic-net/sonic-mgmt#24848
Our dedup approach for the FDB notification consumer queue targeted FdbOrch specifically, reducing redundant notifications at a single consumer. The other PRs (sonic-net/sonic-swss-common#1191, #4586 and sonic-net/sonic-sairedis#1899) also fix this memory growth for the other orch consumer queues, by addressing the problem at the infrastructure layer rather than in a single orch: they add an allowlist for each orchagent and dedup queues for multiple orchagents. Closing this PR since those PRs have been merged.
What I did
Updated `FdbOrch` to construct its FDB `NotificationConsumer` with a `DedupQueue` instead of the default queue. Updated the P4Orch test fake (`fake_notificationconsumer.cpp`) to match the new constructor signature.
Why I did it
A storm of MAC moves can bombard the orchagent's Rx queue at a rate higher than it can process. With a plain FIFO, every duplicate FDB event is queued and processed individually, amplifying the backlog and delaying convergence. Using DedupQueue on this channel keeps only the latest occurrence of each FDB event, dropping intermediate duplicates while preserving correctness.
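A minimal sketch of "keep only the latest occurrence" semantics, assuming the dedup key is the full event content serialised as a string. This is an illustration only; the real `swss::NotificationConsumer::DedupQueue` may key, store and order events differently.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dedup FIFO: at most one copy of each distinct event is queued.
// Events are opaque strings, e.g. "SET|Vlan10|00:11:22:33:44:55|Ethernet0".
class DedupQueue {
public:
    void push(const std::string& event) {
        auto it = index_.find(event);
        if (it != index_.end()) {
            // Duplicate: drop the earlier copy and re-queue at the tail, so the
            // event is delivered after any different events that arrived since.
            order_.erase(it->second);
            it->second = order_.insert(order_.end(), event);
        } else {
            index_.emplace(event, order_.insert(order_.end(), event));
        }
    }

    std::string pop() {
        assert(!order_.empty() && "pop() on empty DedupQueue");
        std::string event = std::move(order_.front());
        order_.pop_front();
        index_.erase(event);  // the event is no longer queued
        return event;
    }

    bool empty() const { return order_.empty(); }
    std::size_t size() const { return order_.size(); }

private:
    std::list<std::string> order_;  // delivery order
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;  // event -> its list node
};
```

For a MAC flapping as `mac1@Ethernet0`, `mac1@Ethernet4`, `mac1@Ethernet0`, this queue holds two entries and delivers `mac1@Ethernet4` then `mac1@Ethernet0`, so the MAC ends up on Ethernet0. Keeping the duplicate in its original slot instead would deliver Ethernet0 then Ethernet4 and leave the MAC on the wrong port, which is why re-appending at the tail matters for correctness. `std::list` is used because erasing a node does not invalidate iterators to the other nodes stored in the map.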
How I verified it
Ran a MAC move storm against orchagent on a test setup and tracked the FDB notification queue depth and orchagent RSS over time.
Before
Under a sustained MAC storm, the FDB notification queue grew without bound, with orchagent unable to drain it as fast as events arrived.
After
With DedupQueue, the queue size stayed constant or trended down even while the storm continued, since duplicate FDB events collapse into a single latest entry before reaching the orch processing path.
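A rough model of why the depth stops growing, assuming an arrival rate $\lambda$, a processing rate $\mu$, and a dedup key equal to the full event content. The symbols and the bound are introduced here for illustration and are not measurements from the test above; $M$ is the number of flapping MACs, $P$ the number of ports they move among (on one VLAN), and $O$ the number of distinct operation types.

```latex
% Plain FIFO: the backlog grows linearly whenever arrivals outpace processing
Q_{\text{FIFO}}(t) \approx Q(0) + (\lambda - \mu)\, t, \qquad \lambda > \mu

% Dedup queue: at most one queued entry per distinct event
Q_{\text{dedup}}(t) \le |E|, \qquad |E| \le M \cdot P \cdot O
```

For example, 2 MACs flapping between 2 ports with only SET events gives $|E| \le 2 \cdot 2 \cdot 1 = 4$ queued entries, however large $\lambda$ becomes.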
Details if related
This PR depends on sonic-net/sonic-swss-common#1198 being merged first.