Skip to content

LruDedup queue policy notification#1899

Merged
lguohan merged 1 commit into
sonic-net:masterfrom
nexthop-ai:lrudedup-queue-policy-notification
May 29, 2026
Merged

LruDedup queue policy notification#1899
lguohan merged 1 commit into
sonic-net:masterfrom
nexthop-ai:lrudedup-queue-policy-notification

Conversation

@senthil-nexthop
Copy link
Copy Markdown
Contributor

@senthil-nexthop senthil-nexthop commented May 19, 2026

Description of PR

Opts RedisChannel::m_notificationConsumer into NotificationQueuePolicy::LruDedup and sets an orch-qualified stats label.

RedisChannel runs a dedicated notification thread inside every libsairedis-linked process (orchagent, etc.) that SUBSCRIBE's to the same "NOTIFICATIONS" Redis channel as orchagent's per-orch NotificationConsumers. Under an FDB-event storm this thread's FIFO queue grew unbounded — we measured ~1.5 GB at peak even after the orchagent-side opt-in.

LRU-dedup collapses byte-identical payloads at admission. saimeta::Meta state transitions are idempotent under byte-identical payloads, so m_saiObjectCollection end state is unchanged.

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Documentation update
  • Test improvement

Approach

How did you do it?

Changed the Notification consumer class to use the LruDedup queue policy

How did you verify/test it?

Unit Tests

Related work

HLD: sonic-net/SONiC#2334
sonic-swss PR: sonic-net/sonic-swss#4586
sonic-swss-common PR: sonic-net/sonic-swss-common#1191

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@senthil-nexthop senthil-nexthop changed the title Lrudedup queue policy notification LruDedup queue policy notification May 19, 2026
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown

@securely1g securely1g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: LruDedup Queue Policy for RedisChannel NotificationConsumer

Summary

Opts RedisChannel::m_notificationConsumer into NotificationQueuePolicy::LruDedup to bound queue depth under FDB event storms. Single functional change in lib/RedisChannel.cpp (+40/-1), plus aspell dictionary updates.


What's Good

  • Targeted, minimal change — one file, one consumer, clear problem statement (1.5 GB queue growth under storm)
  • Well-guarded#ifdef SWSS_NOTIFICATIONCONSUMER_HAS_LRU_DEDUP ensures clean builds against older swss-common
  • Good inline documentation — explains why byte-identical dedup is safe (saimeta::Meta transitions are idempotent under identical payloads)
  • Preserves Select-loop priority — explicitly passes pri=100 to match the previous 2-arg ctor default
  • Stats label — orch-qualified label distinguishes this consumer from orchagent's per-orch consumers in syslog/monitoring

🟡 Medium Priority Issues

1. No setOpAllowList set on this consumer

The orchagent-side PR (sonic-swss #4586) sets setOpAllowList(...) on every consumer to filter irrelevant ops early. This RedisChannel consumer doesn't set one — it accepts all ops from the NOTIFICATIONS channel. This is likely intentional (libsairedis needs to process all notification types: fdb_event, port_state_change, bfd, etc.), but worth confirming. If it processes everything, the dropped_allowlist counter will always be 0. A brief comment saying "no allowlist — RedisChannel processes all notification types" would clarify intent.

2. No registration with gNotifConsumerStatsOrch

The companion swss PR registers each consumer with the stats orch for COUNTERS_DB publication. This consumer lives in libsairedis (a library), not in orchagent, so it can't directly reference gNotifConsumerStatsOrch. The setStatsLabel is set but nobody publishes its stats to COUNTERS_DB. Is the intent to only emit syslog-level stats, or is there a plan for libsairedis to self-publish? Worth noting in the PR description.

3. Two commits — should squash

The second commit ("Remove duplicate entries and clean up formatting") is purely aspell cleanup. SONiC convention is typically a single logical commit per PR. Consider squashing before merge.


🟢 Minor

4. The aspell additions are fine but PCID and ZeroMQChannel appear unrelated to this PR — possibly leftover from a rebase. Not blocking but worth cleaning up.

5. The #else legacy path can be annotated with a TODO and target release for removal, matching what's done in the swss PR.


Questions

  • Under storm conditions with LruDedup active, what's the observed peak queue depth now? (The PR description mentions 1.5 GB at peak with FIFO — what does it drop to with dedup?)
  • Is there concern about the notification thread's pop() latency increasing due to the LRU index maintenance, or is the dedup lookup O(1) amortized?
  • The REDIS_TABLE_NOTIFICATIONS_PER_DB(dbAsic) macro — does this resolve to different channel names per DB instance, or is it always "NOTIFICATIONS"? If always the same, this consumer and orchagent's consumers are all on the same channel (redundant fan-out).

Dependency

Requires sonic-swss-common PR #1191 merged first (provides SWSS_NOTIFICATIONCONSUMER_HAS_LRU_DEDUP, NotificationQueuePolicy::LruDedup, DEFAULT_NC_POP_BATCH_SIZE, setStatsLabel, getLruDedupQueue).


Verdict: Simple, safe, well-guarded change. The core logic is correct — LruDedup on byte-identical payloads is safe for saimeta's idempotent processing model. Squash commits and optionally add a comment about the missing allowlist. Good to merge once the swss-common dependency lands.

@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from 12cb318 to 40267a2 Compare May 25, 2026 20:12
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@senthil-nexthop
Copy link
Copy Markdown
Contributor Author

🟡 Medium Priority Issues

1. No setOpAllowList set on this consumer

The RedisChannel consumer should see ALL notification, so an op allow list is not needed.

2. No registration with gNotifConsumerStatsOrch

The companion swss PR registers each consumer with the stats orch for COUNTERS_DB publication. This consumer lives in libsairedis (a library), not in orchagent, so it can't directly reference gNotifConsumerStatsOrch. The setStatsLabel is set but nobody publishes its stats to COUNTERS_DB. Is the intent to only emit syslog-level stats, or is there a plan for libsairedis to self-publish? Worth noting in the PR description.

The syslog notification would still provide the dedup stats under storm.

3. Two commits — should squash

Done.

Questions

  • Under storm conditions with LruDedup active, what's the observed peak queue depth now? (The PR description mentions 1.5 GB at peak with FIFO — what does it drop to with dedup?)

Under FDB storm, the peek queue depth was 4000 (for 1000 MACs moving between 2 ports, at 50K/second).

  • Is there concern about the notification thread's pop() latency increasing due to the LRU index maintenance, or is the dedup lookup O(1) amortized?

The cost is amortized.

  • The REDIS_TABLE_NOTIFICATIONS_PER_DB(dbAsic) macro — does this resolve to different channel names per DB instance, or is it always "NOTIFICATIONS"? If always the same, this consumer and orchagent's consumers are all on the same channel (redundant fan-out).

It resolves to NOTIFICATIONS channel, same as all the other orchs.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from a5b6078 to 6d27660 Compare May 28, 2026 18:52
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@lolyu
Copy link
Copy Markdown
Contributor

lolyu commented May 29, 2026

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@senthil-nexthop
Copy link
Copy Markdown
Contributor Author

Azure Pipelines successfully started running 1 pipeline(s).

@lolyu Just FYI, this PR has a dependency on sonic-net/sonic-swss-common#1191, which adds a symbol needed by this PR. So the build will fail until swss-common merges.

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented May 29, 2026

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Senthil Krishnamurthy <senthil@nexthop.ai>
@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from 6d27660 to 55b2192 Compare May 29, 2026 16:23
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@lguohan lguohan merged commit 1d7d371 into sonic-net:master May 29, 2026
19 checks passed
@cpackham-atlnz
Copy link
Copy Markdown

I just pulled down the latest sonic-buildimage which picked up this change and now I'm seeing the following build error

RedisChannel.cpp:50:19: error: 'swss::NotificationQueuePolicy' has not been declared                                                                                                                                                        
   50 |             swss::NotificationQueuePolicy::LruDedup);                                                                                                                                                                               
      |                   ^~~~~~~~~~~~~~~~~~~~~~~                                                                                                                                                                                           
RedisChannel.cpp:54:29: error: 'using std::__shared_ptr_access<swss::NotificationConsumer, __gnu_cxx::_S_atomic, false, false>::element_type = class swss::NotificationConsumer' {aka 'class swss::NotificationConsumer'} has no member name
d 'setStatsLabel'                                                                                                                                                                                                                           
   54 |     m_notificationConsumer->setStatsLabel("libsairedis:RedisChannel:" + std::string(REDIS_TABLE_NOTIFICATIONS_PER_DB(dbAsic)));                                                                                                     
      |                             ^~~~~~~~~~~~~                                                                                                                                                                                           

does this look familiar?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants