
LruDedup queue policy for Notification Consumer#1191

Merged
lguohan merged 1 commit into sonic-net:master from nexthop-ai:lrudedup-queue-policy-notification
May 29, 2026

@senthil-nexthop
Contributor

@senthil-nexthop senthil-nexthop commented May 19, 2026

What I did

Adds opt-in primitives to swss::NotificationConsumer for bounding queue growth and reducing cross-fanout cost on shared Redis pubsub channels.

How I did it

  • LRU-dedup queue policy (NotificationQueuePolicy::LruDedup) — collapses byte-identical payloads on enqueue (std::list + std::unordered_map<string, iter>). Drain order: "last-seen time per unique payload." Memory bounded by count(distinct in-flight payloads).
  • 5-arg NotificationConsumer constructor takes the policy. The 4-arg constructor's mangled symbol is preserved verbatim (no ABI break for python3-swsscommon, etc.).
  • setOpAllowList(ops) — admission filter that drops messages whose JSON-array leading op is not in the set, before they consume queue memory. Defends against cross-fanout on shared channels like "NOTIFICATIONS". Empty set (default) preserves legacy behavior.
  • setStatsLabel(label) — orch-qualified label so syslog can distinguish multiple consumers SUBSCRIBE'd to the same channel.
  • Atomic stats counters (received, dropped_allowlist, pushed, dedup_hits, high_watermark, current_depth) + getStats() + a self-throttled 5 s SWSS_LOG_NOTICE summary inline from processReply / push.
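The list-plus-index structure described in the first bullet can be sketched as follows. This is a minimal illustration, not the class from the PR: the name `LruDedupQueueSketch` and its methods are hypothetical, and it uses `std::list::splice` to move a duplicate to the tail, which has the same effect as erasing and re-inserting it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of an LRU-dedup queue; not the PR's implementation.
class LruDedupQueueSketch
{
public:
    // Enqueue a payload. Returns true if it collapsed into an existing entry.
    bool push(const std::string &payload)
    {
        auto it = m_idx.find(payload);
        if (it != m_idx.end())
        {
            // Duplicate: move the existing node to the tail (most recently seen).
            // splice() keeps the stored iterator valid.
            m_order.splice(m_order.end(), m_order, it->second);
            return true;
        }
        m_order.push_back(payload);
        m_idx.emplace(payload, std::prev(m_order.end()));
        return false;
    }

    // Dequeue the least recently seen payload. Precondition: !empty().
    std::string pop()
    {
        assert(!m_order.empty());
        m_idx.erase(m_order.front());
        std::string payload = std::move(m_order.front());
        m_order.pop_front();
        return payload;
    }

    bool empty() const { return m_order.empty(); }
    std::size_t size() const { return m_order.size(); }

private:
    std::list<std::string> m_order; // head = least recently seen
    std::unordered_map<std::string, std::list<std::string>::iterator> m_idx;
};
```

Pushing A, B, A into this sketch leaves two entries and drains them as B, A, which is the "last-seen time per unique payload" order the description names.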

How to verify

Unit tests

Related work

HLD: sonic-net/SONiC#2334
sonic-sairedis PR: sonic-net/sonic-sairedis#1899
sonic-swss PR: sonic-net/sonic-swss#4586

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


@securely1g securely1g left a comment


Review: LruDedup Queue Policy for NotificationConsumer (swss-common)

Summary

This is the foundational PR that adds opt-in queue strategy primitives to swss::NotificationConsumer. It introduces:

  • NotificationQueueBase abstract interface + FifoNotificationQueue (preserves legacy behavior) + LruDedupNotificationQueue (collapses byte-identical payloads)
  • NotificationQueuePolicy enum + new 5-arg constructor
  • setOpAllowList() — admission filter that drops messages by op before queueing
  • setStatsLabel() — orch-qualified label for syslog disambiguation
  • Atomic stats counters + getStats() + self-throttled 5s SWSS_LOG_NOTICE inline from processReply/push
  • Feature-test macro SWSS_NOTIFICATIONCONSUMER_HAS_LRU_DEDUP for downstream conditional compilation
  • Unit tests for both queue strategies

+620/-17, 5 files, clean single commit.


What's Good

  • ABI preservation — 4-arg constructor symbol is unchanged; old binaries (python3-swsscommon, etc.) continue to resolve against the same mangled name. The 5-arg ctor is a distinct symbol. Well thought out.
  • Strategy pattern — clean NotificationQueueBase abstraction allows adding new policies without touching consumer logic
  • peekOp() is efficient — bounded scan, no JSON parse, returns early on malformed input
  • Self-throttled logging — 5s interval + idle suppression means storm conditions produce steady but bounded log output
  • Atomic stats for cross-thread reads: push()/pop() remain single-threaded, and the atomics exist only for the telemetry reader. Correct use of memory_order_relaxed since these are independent monotonic counters with no ordering dependencies between them.
  • Feature-test macro — clean downstream integration pattern; companion PRs can build against either old or new libswsscommon
  • Good unit tests — covers FIFO ordering, dedup collapse, drain order, HWM monotonicity, memory bounding, and empty-pop safety
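The relaxed-atomics point can be illustrated with a small sketch. It assumes one writer thread (the queue owner) and any number of reader threads; the type and member names are hypothetical and the counter set is a subset of the one listed in the PR description.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names; only the counter idea comes from the PR text.
struct StatsSnapshotSketch
{
    std::uint64_t pushed;
    std::uint64_t dedupHits;
    std::uint64_t highWatermark;
};

class QueueStatsSketch
{
public:
    // Writer side: called only from the single thread that owns the queue.
    void onPush(bool dedupHit, std::uint64_t depthAfterPush)
    {
        assert(depthAfterPush > 0); // a push always leaves at least one entry
        m_pushed.fetch_add(1, std::memory_order_relaxed);
        if (dedupHit)
        {
            m_dedupHits.fetch_add(1, std::memory_order_relaxed);
        }
        // With a single writer, load/compare/store keeps the watermark monotonic.
        if (depthAfterPush > m_highWatermark.load(std::memory_order_relaxed))
        {
            m_highWatermark.store(depthAfterPush, std::memory_order_relaxed);
        }
    }

    // Reader side: safe to call from any thread.
    StatsSnapshotSketch snapshot() const
    {
        return StatsSnapshotSketch{
            m_pushed.load(std::memory_order_relaxed),
            m_dedupHits.load(std::memory_order_relaxed),
            m_highWatermark.load(std::memory_order_relaxed)};
    }

private:
    std::atomic<std::uint64_t> m_pushed{0};
    std::atomic<std::uint64_t> m_dedupHits{0};
    std::atomic<std::uint64_t> m_highWatermark{0};
};
```

Each field in a snapshot is individually valid, but the fields are not guaranteed to be mutually consistent at one instant. That is usually acceptable for telemetry, which is the only reader the review mentions.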

🔴 High Priority Issues

1. LRU reordering changes drain semantics

The LRU-dedup queue reorders entries: when a duplicate arrives, the old position is erased and re-inserted at the tail. This means pop() returns entries in "least-recently-seen" order, NOT arrival order.

For FDB events this is fine (end-state idempotent), but it's a subtle semantic difference that could surprise future consumers who opt in without understanding this. The class comment says "Drain order: last-seen time per unique payload" which is correct, but I'd suggest:

  • Adding a stronger warning in the NotificationQueuePolicy::LruDedup enum doc that ordering is not preserved
  • Consider whether a simpler "deduplicate but preserve original arrival order" policy (keep first seen position, drop subsequent duplicates) would be safer as the default dedup strategy

2. m_idx uses full message string as key — O(n) hash on large payloads

FDB event JSON payloads can be substantial (especially batch notifications with many FVTs). The unordered_map<string, iterator> hashes the entire message string on every push() and find(). Under storm conditions with many distinct payloads (different MACs), this is O(payload_length) per operation.

For the target use case (FDB storms with many identical payloads), the dedup hit rate is high and this is amortized well. But for consumers with mostly-distinct payloads, the hash overhead on large strings could be significant. Consider:

  • Documenting that this policy is most effective when dedup hit rate is high
  • Optionally pre-computing a hash (e.g., xxHash) at admission time for large payloads

3. push() is defined inline in the header

The LruDedupNotificationQueue::push() method is ~30 lines with multiple branches, atomic operations, and a function call (maybeLogStats()). Defining it inline in the header means:

  • Every translation unit that includes notificationconsumer.h gets a copy
  • Changes to push logic require recompiling all includers
  • The compiler may or may not inline it depending on optimization level

Move to the .cpp file alongside maybeLogStats() for consistency and to reduce header bloat.


🟡 Medium Priority Issues

4. subscribeWithRetry() infinite loop with no backoff

while (true)
{
    try { subscribe(); break; }
    catch(...) { delete m_subscribe; SWSS_LOG_ERROR(...); }
}

This busy-loops on Redis connection failure with no sleep/backoff. Pre-existing behavior, but now factored into a named function — good opportunity to add exponential backoff or at least a sleep(1).
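One possible shape for the suggested backoff is sketched below. `retryWithBackoff` is a hypothetical free function, not part of swss-common; the `attempt` callable stands in for the subscribe call, and the delay values are placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `attempt` until it stops throwing, sleeping
// between failures with a delay that doubles up to `maxDelay`.
// Returns the number of failed attempts before the first success.
inline int retryWithBackoff(const std::function<void()> &attempt,
                            std::chrono::milliseconds initialDelay,
                            std::chrono::milliseconds maxDelay)
{
    assert(initialDelay.count() > 0 && maxDelay >= initialDelay);
    int failures = 0;
    std::chrono::milliseconds delay = initialDelay;
    while (true)
    {
        try
        {
            attempt();
            return failures;
        }
        catch (...)
        {
            ++failures;
            std::this_thread::sleep_for(delay);
            // Double the delay, capped so a long outage does not grow it unboundedly.
            delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
        }
    }
}
```

Like the loop quoted above, this still retries forever; it only stops the retry from spinning at full CPU speed while Redis is unreachable.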

5. maybeLogStats() called on every processReply and every push

Under storm conditions (the exact scenario this targets), processReply is called at event rate. Even though the 5s throttle gate short-circuits quickly, the steady_clock::now() call on every message is a syscall (clock_gettime) that adds up at 100k+ msg/s. Consider:

  • Checking only every N-th message (e.g., if (m_received % 1024 == 0))
  • Or moving stats logging to a timer-driven path instead of inline
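The first option, an N-th message gate in front of the clock read, might look like the sketch below. The class name, the `sampleEvery` parameter, and the `clockReads()` accessor are all hypothetical and exist only to make the effect visible.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch: read the clock only on every N-th message.
class ThrottledStatsGateSketch
{
public:
    using Clock = std::chrono::steady_clock;

    ThrottledStatsGateSketch(std::uint64_t sampleEvery, Clock::duration interval)
        : m_sampleEvery(sampleEvery), m_interval(interval), m_lastLog(Clock::now())
    {
        assert(sampleEvery > 0);
    }

    // Call once per message. Returns true when a summary should be logged.
    bool onMessage()
    {
        ++m_received;
        if (m_received % m_sampleEvery != 0)
        {
            return false; // cheap path: no clock read
        }
        ++m_clockReads;
        const Clock::time_point now = Clock::now();
        if (now - m_lastLog < m_interval)
        {
            return false; // interval has not elapsed yet
        }
        m_lastLog = now;
        return true;
    }

    std::uint64_t clockReads() const { return m_clockReads; }

private:
    std::uint64_t m_sampleEvery;
    Clock::duration m_interval;
    Clock::time_point m_lastLog;
    std::uint64_t m_received = 0;
    std::uint64_t m_clockReads = 0;
};
```

The trade-off is that at low message rates the summary can lag by up to N messages, so a gate like this suits the storm case better than the idle case.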

6. No maximum queue depth enforcement

LruDedup bounds depth to count(distinct payloads) which is better than unbounded FIFO, but under a MAC storm with 100k distinct MACs (different source MACs flapping), the queue still grows to 100k entries. Consider an optional max_depth parameter that drops oldest entries when exceeded (true LRU eviction), or at minimum document this limitation.
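A depth cap with oldest-entry eviction could be layered on the same list-plus-index structure, roughly as sketched here. The class and its `maxDepth` parameter are hypothetical. Evicting an entry discards a notification, so whether this is acceptable depends on the consumer and is a design question the sketch does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch: LRU-dedup queue that evicts the oldest entry at a cap.
class BoundedLruDedupSketch
{
public:
    explicit BoundedLruDedupSketch(std::size_t maxDepth) : m_maxDepth(maxDepth)
    {
        assert(maxDepth > 0);
    }

    void push(const std::string &payload)
    {
        auto it = m_idx.find(payload);
        if (it != m_idx.end())
        {
            // Duplicate: refresh its position, depth is unchanged.
            m_order.splice(m_order.end(), m_order, it->second);
            return;
        }
        if (m_order.size() == m_maxDepth)
        {
            // At the cap: drop the least recently seen payload to make room.
            m_idx.erase(m_order.front());
            m_order.pop_front();
            ++m_evicted;
        }
        m_order.push_back(payload);
        m_idx.emplace(payload, std::prev(m_order.end()));
    }

    // Precondition: size() > 0.
    const std::string &front() const
    {
        assert(!m_order.empty());
        return m_order.front();
    }

    std::size_t size() const { return m_order.size(); }
    std::size_t evicted() const { return m_evicted; }

private:
    std::size_t m_maxDepth;
    std::size_t m_evicted = 0;
    std::list<std::string> m_order; // head = least recently seen
    std::unordered_map<std::string, std::list<std::string>::iterator> m_idx;
};
```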

7. Thread safety documentation

The header says "Single-threaded; no internal locking required" for the queue, but getStats() is explicitly designed for cross-thread reads. This is correct (relaxed atomics are sufficient for independent counters), but the documentation should be more precise: "push()/pop()/front()/empty()/size() must be called from a single thread. getStats() is safe to call from any thread."


🟢 Minor / Style

8. peekOp() returns std::string with a comment explaining why (C++14, no heterogeneous lookup). Consider adding a TODO for C++17 migration where string_view would avoid the allocation.

9. The Stats struct in LruDedupNotificationQueue has a comment about gcc/C++14 brace-init issues. This is a known quirk but might confuse future maintainers — a one-line reference to the gcc bug number would help.

10. m_label in LruDedupNotificationQueue is not atomic but can be written via setLabel() and read from maybeLogStats(). If setLabel is only called before the push/pop path starts, it's fine — but document this assumption.

11. The kStatsLogInterval is defined twice (once in the anonymous namespace in the .cpp, once implicitly referenced from the queue class). Consider making it a shared constant or at least documenting that both use the same 5s interval.


Questions

  • Has the LRU reordering (issue #1) been validated as safe for all currently-opted-in consumers? For FDB it's fine, but port_state_change notifications where UP→DOWN→UP could collapse DOWN+UP into just UP (if the payloads are byte-identical, which they wouldn't be since state differs) — actually this is safe since different states = different byte strings. Worth a comment though.
  • What's the measured overhead of steady_clock::now() per message on the target platform? On x86 with vDSO it's ~20ns, but on some ARM platforms it can be 200ns+.
  • Is there a plan to expose max_depth as a configurable parameter, or is "bounded by distinct payloads" considered sufficient?

Test Coverage

The unit tests are solid for the queue classes themselves. Missing:

  • No test for peekOp() edge cases (empty string, malformed JSON, missing closing quote)
  • No test for setOpAllowList filtering behavior (would need a mock Redis or the full NotificationConsumer wired up)
  • No test for the maybeLogStats throttling behavior

These are nice-to-haves; the core correctness tests are present.
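The peekOp() edge cases listed above could be exercised along the following lines. The PR's actual peekOp() is not shown in this conversation, so `peekOpSketch` is a stand-in written from the description only (bounded scan, no JSON parse, early return on malformed input). The 64-byte scan bound, the refusal to handle escapes, and the choice to drop malformed messages when an allow list is set are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in: returns the leading string of a JSON array such as
// ["op","data",...] or an empty string if the input does not look like that.
inline std::string peekOpSketch(const std::string &msg, std::size_t maxScan = 64)
{
    const std::size_t limit = std::min(msg.size(), maxScan);
    std::size_t i = 0;
    while (i < limit && std::isspace(static_cast<unsigned char>(msg[i]))) { ++i; }
    if (i >= limit || msg[i] != '[') { return std::string(); }
    ++i;
    while (i < limit && std::isspace(static_cast<unsigned char>(msg[i]))) { ++i; }
    if (i >= limit || msg[i] != '"') { return std::string(); }
    const std::size_t start = ++i;
    for (; i < limit; ++i)
    {
        if (msg[i] == '\\') { return std::string(); } // escapes not handled here
        if (msg[i] == '"') { return msg.substr(start, i - start); }
    }
    return std::string(); // missing closing quote, or op longer than the bound
}

// Hypothetical admission check: an empty allow list admits everything.
inline bool admitSketch(const std::string &msg,
                        const std::unordered_set<std::string> &allowList)
{
    if (allowList.empty()) { return true; }
    return allowList.count(peekOpSketch(msg)) != 0;
}

// Edge cases named in the review: empty, malformed, missing closing quote.
inline void peekOpEdgeCases()
{
    assert(peekOpSketch("[\"fdb_event\",\"[]\"]") == "fdb_event");
    assert(peekOpSketch("").empty());
    assert(peekOpSketch("{\"op\":\"fdb_event\"}").empty());
    assert(peekOpSketch("[\"fdb_event").empty());
}
```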


Verdict: Well-designed foundational change with good ABI compatibility story and clean abstraction. The LRU reordering semantics (issue #1) should be more prominently documented, and the inline push() definition (issue #3) should move to the .cpp. The steady_clock::now() per-message overhead (issue #5) is worth measuring under load. Overall good to merge with minor fixes — this unblocks the companion sairedis and swss PRs.

@lguohan
Contributor

lguohan commented May 25, 2026

@senthil-nexthop , can you check the comments, and also why pr is failing?

@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from 2de32d0 to 6d2e378 Compare May 25, 2026 17:24
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from 6d2e378 to 9def60e Compare May 25, 2026 19:15
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@senthil-nexthop
Contributor Author

🔴 High Priority Issues

1. LRU reordering changes drain semantics

  • Adding a stronger warning in the NotificationQueuePolicy::LruDedup enum doc that ordering is not preserved

Added a comment in the LruDedup policy explaining the drain semantics.

  • Consider whether a simpler "deduplicate but preserve original arrival order" policy (keep first seen position, drop subsequent duplicates) would be safer as the default dedup strategy

Deduplication while preserving original order will lead to incorrect state for FDB events. For example, let's say the events received are: (A) LEARN(mac1, port1), (B) LEARN(mac1, port2) and (C) LEARN(mac1, port1).
LruDedup drain order: B, C. (A and C are collapsed and moved to end).
Final State: mac1 on port1, matches ASIC
FirstSeenDedup drain order: A, B (A and C are collapsed, order preserved).
Final State: mac1 on port2, does NOT match ASIC.
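The A, B, C example above can be replayed in a few lines. This is a sketch of the argument only: `FdbLearn`, the payload string format, and both drain functions are hypothetical and model just the ordering behavior being compared.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct FdbLearn
{
    std::string mac;
    std::string port;
};

// Hypothetical payload encoding; equal events yield byte-identical strings.
inline std::string payloadOf(const FdbLearn &e)
{
    return "LEARN(" + e.mac + "," + e.port + ")";
}

// Drain order when a duplicate moves the existing entry to the tail.
inline std::vector<FdbLearn> drainLruDedup(const std::vector<FdbLearn> &events)
{
    std::list<FdbLearn> order;
    std::unordered_map<std::string, std::list<FdbLearn>::iterator> idx;
    for (const auto &e : events)
    {
        auto it = idx.find(payloadOf(e));
        if (it != idx.end())
        {
            order.splice(order.end(), order, it->second);
            continue;
        }
        order.push_back(e);
        idx.emplace(payloadOf(e), std::prev(order.end()));
    }
    return std::vector<FdbLearn>(order.begin(), order.end());
}

// Drain order when a duplicate is dropped and the first-seen position is kept.
inline std::vector<FdbLearn> drainFirstSeenDedup(const std::vector<FdbLearn> &events)
{
    std::vector<FdbLearn> out;
    std::unordered_set<std::string> seen;
    for (const auto &e : events)
    {
        if (seen.insert(payloadOf(e)).second) { out.push_back(e); }
    }
    return out;
}

// Replays drained events; a later LEARN for the same MAC overwrites the port.
inline std::string finalPort(const std::vector<FdbLearn> &drained, const std::string &mac)
{
    std::map<std::string, std::string> fdb;
    for (const auto &e : drained) { fdb[e.mac] = e.port; }
    const auto it = fdb.find(mac);
    return it == fdb.end() ? std::string() : it->second;
}

inline void checkFdbReplayExample()
{
    const std::vector<FdbLearn> events = {
        {"mac1", "port1"}, {"mac1", "port2"}, {"mac1", "port1"}};
    assert(finalPort(drainLruDedup(events), "mac1") == "port1");       // matches ASIC
    assert(finalPort(drainFirstSeenDedup(events), "mac1") == "port2"); // stale
}
```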

2. m_idx uses full message string as key — O(n) hash on large payloads

  • Documenting that this policy is most effective when dedup hit rate is high

The amortized cost is lower when dedup hit rate is high, which is observed to be 90%+ in the tests. Added a comment to reflect that.

3. push() is defined inline in the header

Move to the .cpp file alongside maybeLogStats() for consistency and to reduce header bloat.

Good catch, done.

🟡 Medium Priority Issues

4. subscribeWithRetry() infinite loop with no backoff

Outside the scope of this PR, we should take this up as a separate issue and add optimizations.

  • Checking only every N-th message (e.g., if (m_received % 1024 == 0))

Done, added an N-th message check to minimize the cost of reading the clock.

6. No maximum queue depth enforcement

LruDedup bounds depth to count(distinct payloads) which is better than unbounded FIFO, but under a MAC storm with 100k distinct MACs (different source MACs flapping), the queue still grows to 100k entries. Consider an optional max_depth parameter that drops oldest entries when exceeded (true LRU eviction), or at minimum document this limitation.

Added a comment to document this.

7. Thread safety documentation

Done, updated the comment.

🟢 Minor / Style

Updated the comments to address the style issue. Promoted peekOp() to make a testable function.

Questions

  • Has the LRU reordering (issue #1) been validated as safe for all currently-opted-in consumers? For FDB it's fine, but port_state_change notifications where UP→DOWN→UP could collapse DOWN+UP into just UP (if the payloads are byte-identical, which they wouldn't be since state differs) — actually this is safe since different states = different byte strings. Worth a comment though.

Both FDB and link state events are idempotent and safe.

  • What's the measured overhead of steady_clock::now() per message on the target platform? On x86 with vDSO it's ~20ns, but on some ARM platforms it can be 200ns+.

Added an N-th message gate to minimize cost.

  • Is there a plan to expose max_depth as a configurable parameter, or is "bounded by distinct payloads" considered sufficient?

This is not strictly necessary for FDB and port events, but it's a good optimization to add for future event types.

Test Coverage

  • No test for peekOp() edge cases (empty string, malformed JSON, missing closing quote)
  • No test for setOpAllowList filtering behavior (would need a mock Redis or the full NotificationConsumer wired up)

Added these tests.

Verdict: Well-designed foundational change with good ABI compatibility story and clean abstraction. The LRU reordering semantics (issue #1) should be more prominently documented, and the inline push() definition (issue #3) should move to the .cpp. The steady_clock::now() per-message overhead (issue #5) is worth measuring under load. Overall good to merge with minor fixes — this unblocks the companion sairedis and swss PRs.

@senthil-nexthop
Contributor Author

@senthil-nexthop , can you check the comments, and also why pr is failing?

@lguohan I've addressed the comments, added new tests, and updated the PR. The DCO check failure is fixed, but the PR is still failing because of an upstream issue that is affecting all PRs.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@lguohan
Contributor

lguohan commented May 28, 2026

@senthil-nexthop , can you check a little bit about the coverage?

@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from 87a91f9 to 753bc82 Compare May 28, 2026 16:37
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Senthil Krishnamurthy <senthil@nexthop.ai>
@senthil-nexthop senthil-nexthop force-pushed the lrudedup-queue-policy-notification branch from 753bc82 to c87533a Compare May 28, 2026 17:08
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@senthil-nexthop
Contributor Author

@senthil-nexthop , can you check a little bit about the coverage?

@lguohan , added more test cases to increase coverage and all the checks have passed. The other PRs in sonic-swss and sonic-sairedis are failing because they have a dependency on symbols introduced by this PR. Once you approve and merge this, I can monitor the other 2 PRs and make sure the checks pass.

@lguohan lguohan merged commit 56a5f28 into sonic-net:master May 29, 2026
19 checks passed

4 participants