[fdborch] Batch-drain FDB notifications in FdbOrch::doTask #4604

Closed
manish1-arista wants to merge 1 commit into sonic-net:master from manish1-arista:fdborch-batch-drain-notifications

@manish1-arista
Contributor

What I did
In FdbOrch::doTask(NotificationConsumer&), replace the single consumer.pop() with consumer.pops() to drain a batch of pending entries on the notification channel per main-loop iteration instead of just one. The per-entry processing logic is factored unchanged into a new handleNotification() helper that doTask() dispatches each popped entry to.

No behavioral change for any individual notification, only the rate at which the queue is drained.
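The shape of the change can be sketched outside the real code base. The block below is a minimal, self-contained model and not the actual patch: `Notification`, `MockConsumer`, and `MockFdbOrch` are hypothetical stand-ins for the swss types, and the mock `pops()` hands over everything pending, whereas the real consumer's batch may be bounded.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one FDB notification (operation plus payload).
struct Notification {
    std::string op;
    std::string data;
};

// Hypothetical mock of a notification channel; NOT the swss-common class.
class MockConsumer {
public:
    void push(Notification n) { m_queue.push_back(std::move(n)); }

    // Old pattern: hand back at most one pending entry per call.
    bool pop(Notification &out) {
        if (m_queue.empty()) return false;
        out = std::move(m_queue.front());
        m_queue.pop_front();
        return true;
    }

    // New pattern: move every pending entry into `out` in one call.
    void pops(std::deque<Notification> &out) {
        out.clear();
        out.swap(m_queue);
    }

    std::size_t pending() const { return m_queue.size(); }

private:
    std::deque<Notification> m_queue;
};

// Hypothetical orch with the per-entry logic factored into a helper.
class MockFdbOrch {
public:
    // Per-entry processing; in this sketch it only records the operation.
    void handleNotification(const Notification &n) { m_handled.push_back(n.op); }

    // One main-loop iteration: drain the batch, then dispatch in arrival order.
    void doTask(MockConsumer &consumer) {
        std::deque<Notification> batch;
        consumer.pops(batch);
        assert(consumer.pending() == 0);  // holds for this single-threaded mock
        for (const auto &n : batch) handleNotification(n);
    }

    const std::vector<std::string> &handled() const { return m_handled; }

private:
    std::vector<std::string> m_handled;
};
```

Pushing three notifications into a `MockConsumer` and calling `doTask()` once leaves nothing pending and invokes `handleNotification()` three times, in arrival order. With the single-`pop()` pattern the same backlog would take three main-loop iterations to clear.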

Why I did it
Under FDB event storms (large MAC moves, etc.), the producer side pushes notifications into the queue faster than the previous one-at-a-time drain could consume them. The backlog grew without bound, which on our deployments showed up as:

  • orchagent RSS growth tracking the queue length
  • in extreme cases, OOM / kernel panics on the device
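The backlog argument can be made concrete with a toy model. The function below is an illustration only: the fixed per-iteration arrival count, the `backlogAfter` name, and the `kDrainAll` constant are assumptions for this sketch, not measurements or identifiers from orchagent.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical sentinel meaning "drain everything that is pending".
constexpr std::size_t kDrainAll = std::numeric_limits<std::size_t>::max();

// Toy model of the notification queue. In each main-loop iteration,
// `arrivals` entries are enqueued and the consumer then removes at most
// `drainLimit` of them. Returns the backlog left after `iterations`.
std::size_t backlogAfter(std::size_t iterations,
                         std::size_t arrivals,
                         std::size_t drainLimit) {
    std::size_t backlog = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        backlog += arrivals;                       // producer side
        backlog -= std::min(backlog, drainLimit);  // consumer side
    }
    return backlog;
}
```

With 5 arrivals per iteration and a drain limit of 1, the backlog grows by 4 entries per iteration, so `backlogAfter(1000, 5, 1)` returns 4000. With `kDrainAll` the backlog is 0 at the end of every iteration, which is the behavior a batch drain aims for.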

NotificationConsumer::pops() already exists exactly for this pattern and is what some other orch agents (e.g. portsorch) use.

How I verified it
Built sonic-swss against the change cleanly.
The queue length remains in check under a MAC move storm.

Details if related

Replace the single consumer.pop() with consumer.pops() so each main-loop
iteration drains a batch of pending FDB notifications instead of
just one, preventing the queue from growing
unboundedly under FDB event storms.

Signed-off-by: manish1 <manish1@arista.com>
manish1-arista requested a review from prsunny as a code owner May 25, 2026 09:58
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@manish1-arista
Contributor Author

Our batch-drain approach for the FDB notification consumer queue targeted fdborch only: it drains a batch of pending entries on the notification channel per main-loop iteration instead of just one.
We were able to confirm that the FDB notification consumer queue size did not increase under a MAC move storm.

The other PRs (sonic-net/sonic-swss-common#1191, #4586, and sonic-net/sonic-sairedis#1899) fix this memory growth for other orch consumer queues as well. They address the problem at the infrastructure layer rather than in a single orch, by adding an allowlist for each orchagent and dedup queues for multiple orchagents.

Closing this PR as the other PRs have merged.
