Skip to content

Add enhanced SwssStats for comprehensive profiling#4434

Open
yutongzhang-microsoft wants to merge 14 commits into
sonic-net:masterfrom
yutongzhang-microsoft:swss-stats-implementation
Open

Add enhanced SwssStats for comprehensive profiling#4434
yutongzhang-microsoft wants to merge 14 commits into
sonic-net:masterfrom
yutongzhang-microsoft:swss-stats-implementation

Conversation

@yutongzhang-microsoft
Copy link
Copy Markdown

@yutongzhang-microsoft yutongzhang-microsoft commented Apr 3, 2026

What I did

Added SwssStats, a lightweight per-table statistics collector for orchagent. It tracks operation counts written to COUNTERS_DB every second via a background thread, with no impact on the main processing path.

Metrics tracked per table:

Field Description
SET Number of SET tasks received
DEL Number of DEL tasks received
COMPLETE Number of tasks successfully processed
ERROR Number of tasks that resulted in an error

Redis schema:

   SWSS_STATS:<table_name>
     SET       <count>
     DEL       <count>
     COMPLETE  <count>
     ERROR     <count>

Why I did it

The existing OrchStats (PR #2812) only tracks SET/DEL counts at the Orch level. There is no lightweight way to observe:

  • Which tables have pending/unprocessed tasks
  • Whether tasks are completing successfully or hitting errors
  • Per-table throughput in production deployments

SwssStats fills this gap with minimal overhead, making it easier to diagnose bottlenecks and processing anomalies in large-scale deployments without relying on the verbose swss.rec recording.

Design

  • Lock-free hot path: recordTask() and recordComplete() use std::atomic operations. The mutex is only held briefly by the background writer thread when snapshotting counters.
  • Version-based dirty tracking: Each TableStats has an atomic version counter incremented on every update. The writer thread skips Redis writes for tables with no changes since the last flush.
  • Deferred DB connection: DBConnector(COUNTERS_DB) is created inside writerThread() rather than the constructor, so singleton initialization during orchagent's early startup (bake phase) does not trigger Redis access before the DB is ready.
  • Stable references: Uses std::map instead of unordered_map so references returned by getOrCreateStats() remain valid after subsequent inserts (unordered_map can rehash and invalidate references).
  • Clean shutdown: Destructor uses condition_variable::notify_all() to wake the writer thread immediately instead of waiting up to 1 second.

How I verified it

Unit tests (tests/mock_tests/swssstats_ut.cpp):

  • Basic counter increments for SET, DEL, COMPLETE, ERROR
  • Unknown ops are silently ignored
  • Zero snapshot returned for unknown tables
  • Multiple tables are tracked independently
  • Thread safety: 8 concurrent threads x 1000 ops each, verified with TSan

Manual verification in docker-sonic-vs:

docker run --privileged -d --name vs docker-sonic-vs:<tag>
docker exec vs mkdir -p /zmq_swss
docker exec vs supervisorctl start orchagent

# Check stats after orchagent starts
yutongzhang@yutongzhang-dev-vm-2:~$ docker exec vs redis-cli -n 2 keys "SWSS_STATS*"
SWSS_STATS:SCHEDULER
SWSS_STATS:DSCP_TO_TC_MAP
SWSS_STATS:MAP_PFC_PRIORITY_TO_QUEUE
SWSS_STATS:PORT_QOS_MAP
SWSS_STATS:WRED_PROFILE
SWSS_STATS:TC_TO_PRIORITY_GROUP_MAP
SWSS_STATS:DEVICE_METADATA
SWSS_STATS:PORT_TABLE
SWSS_STATS:TC_TO_QUEUE_MAP

yutongzhang@yutongzhang-dev-vm-2:~$ docker exec vs redis-cli -n 2 hgetall "SWSS_STATS:DSCP_TO_TC_MAP"
SET
1
DEL
0
COMPLETE
0
ERROR
0

Physical testbed verification:

  • Built the latest SONiC image using sonic-buildimage PR #26924
  • Downloaded build artifacts and deployed on a physical testbed
  • Verified SwssStats counters are correctly populated on real hardware
admin@bjw2-can-7260-12:~$ redis-cli -n 2 keys "SWSS_STATS*"
 1) "SWSS_STATS:PORTCHANNEL_INTERFACE"
 2) "SWSS_STATS:PORTCHANNEL"
 3) "SWSS_STATS:NEIGH_TABLE"
 4) "SWSS_STATS:CABLE_LENGTH"
 5) "SWSS_STATS:BUFFER_POOL"
 6) "SWSS_STATS:MAP_PFC_PRIORITY_TO_QUEUE"
 7) "SWSS_STATS:ACL_TABLE"
 8) "SWSS_STATS:TC_TO_QUEUE_MAP"
 9) "SWSS_STATS:DSCP_TO_TC_MAP"
10) "SWSS_STATS:BUFFER_PROFILE"
11) "SWSS_STATS:LAG_TABLE"
12) "SWSS_STATS:ROUTE_TABLE"
13) "SWSS_STATS:BUFFER_QUEUE_TABLE"
14) "SWSS_STATS:PORT_QOS_MAP"
15) "SWSS_STATS:BUFFER_QUEUE"
16) "SWSS_STATS:PORTCHANNEL_MEMBER"
17) "SWSS_STATS:LAG_MEMBER_TABLE"
18) "SWSS_STATS:BUFFER_PG"
19) "SWSS_STATS:WRED_PROFILE"
20) "SWSS_STATS:TC_TO_PRIORITY_GROUP_MAP"
21) "SWSS_STATS:FLEX_COUNTER_TABLE"
22) "SWSS_STATS:BUFFER_PROFILE_TABLE"
23) "SWSS_STATS:TUNNEL_DECAP_TABLE"
24) "SWSS_STATS:INTF_TABLE"
25) "SWSS_STATS:TUNNEL_DECAP_TERM_TABLE"
26) "SWSS_STATS:SCHEDULER"
27) "SWSS_STATS:LOOPBACK_INTERFACE"
28) "SWSS_STATS:PORT_TABLE"
29) "SWSS_STATS:BUFFER_PG_TABLE"
30) "SWSS_STATS:BGP_DEVICE_GLOBAL"
31) "SWSS_STATS:CRM"
32) "SWSS_STATS:FEATURE"
33) "SWSS_STATS:BUFFER_POOL_TABLE"
34) "SWSS_STATS:PORT"
35) "SWSS_STATS:COPP_TABLE"
36) "SWSS_STATS:SWITCH_TABLE"
37) "SWSS_STATS:QUEUE"
38) "SWSS_STATS:PFC_WD"
39) "SWSS_STATS:DEVICE_METADATA"

admin@bjw2-can-7260-12:~$ redis-cli -n 2 hgetall "SWSS_STATS:ROUTE_TABLE"
1) "SET"
2) "15390"
3) "DEL"
4) "14"
5) "COMPLETE"
6) "15404"
7) "ERROR"
8) "0"

Performance

  • CPU: counter updates use memory_order_relaxed atomics - negligible overhead on the hot path
  • Memory: ~80 bytes per tracked table (5 x atomic<uint64_t>)
  • Redis writes: at most once per second per changed table, skipped entirely if no updates

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft and others added 3 commits April 15, 2026 15:34
What I did:
- Added SwssStats class with enhanced statistics collection
- Supports operation counters (SET/DEL/COMPLETED/ERROR)
- Tracks latency metrics (min/max/avg/total in microseconds)
- Monitors queue depth (current/max)
- Uses lock-free atomic operations for zero performance impact
- Background thread writes to Redis COUNTERS_DB every 1 second

Why I did it:
- Original OrchStats (PR sonic-net#2812) only tracks SET/DEL counts
- Need comprehensive performance monitoring for production debugging
- Lightweight alternative to swss.rec with minimal CPU/disk overhead
- Essential for analyzing bottlenecks in large-scale deployments

How I verified it:
- Follows OrchStats design pattern from PR sonic-net#2812
- All statistics accessible via Redis COUNTERS_DB
- Query tools provided (query_stats.sh, monitor_stats.py)

Details:
- Table name: SWSS_STATS_TABLE (vs ORCH_STATS_TABLE)
- 10 metrics per table vs 2 in OrchStats
- Performance: <0.1% CPU, ~1KB memory per table

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Simplified the statistics implementation to be self-contained:

Changes:
- Removed complex latency tracking (can be added later if needed)
- Removed queue depth monitoring
- Simplified API: recordTask(table, op), recordComplete(), recordError()
- Reduced code size by ~90 lines
- No dependency on any existing stats implementation

Core features retained:
- Track SET/DEL operations per table
- Monitor task completion count
- Track errors
- Atomic operations for thread safety
- Background thread updates Redis every 1 second
- Writes to COUNTERS_DB SWSS_STATS table

This is a clean, minimal implementation that can work independently.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
- Fix data race: change m_running from bool to std::atomic<bool>
- Fix reference invalidation: replace unordered_map with std::map so
  references returned by getOrCreateStats() remain stable after
  subsequent inserts (unordered_map can rehash, invalidating refs)
- Fix memory ordering: use memory_order_release on version.fetch_add()
  and memory_order_acquire on version.load() in writer thread, with
  memory_order_relaxed on counter updates to match the documented
  happens-before relationship
- Fix shutdown latency: replace sleep_for() with condition_variable
  wait_for() so destructor wakes the writer thread immediately
- Fix gSwssStatsRecord: change to std::atomic<bool> to prevent data
  race if toggled at runtime; add extern declaration in swssstats.h
- Remove SWSS_LOG_ENTER() from hot-path record* methods
- Wire up recordComplete(): Consumer::drain() now counts tasks removed
  from m_toSync and calls recordComplete() so the COMPLETE counter is
  actually populated
- Add count parameter to recordComplete/recordError for batch updates
- Fix Makefile.am: add missing space before backslash continuation
  on notifications.cpp line

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft and others added 2 commits April 15, 2026 15:51
- Add getCounters() method to SwssStats for counter inspection in tests
  and diagnostics (returns CounterSnapshot struct with SET/DEL/COMPLETE/ERROR)
- Add swssstats_ut.cpp with gtest coverage:
  * Basic counter increment tests (SET, DEL, COMPLETE, ERROR)
  * Unknown op is silently ignored
  * Default count=1 for recordComplete/recordError
  * Zero snapshot for unknown tables
  * Multiple tables are independent
  * Thread-safety: 8 concurrent threads with 1000 ops each, no data race
  * Mixed concurrent ops (recordTask + recordComplete + recordError)
  * Destructor/shutdown fast-path sanity check
- Wire swssstats_ut.cpp and swssstats.cpp into tests/mock_tests/Makefile.am

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
'swss::FieldValueTuple' is declared as a typedef in swsscommon's table.h,
so it cannot be forward-declared with 'class'. The forward declaration was
introduced when dumpStats() was a class member.

Fix by:
- Remove 'class FieldValueTuple' forward declaration from swssstats.h
- Move dumpStats() out of the class and make it a file-local static
  function in swssstats.cpp (it was already private with no external callers)
- Move TableStats struct to public section so the file-local static
  function can access it without a friend declaration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

orch.cpp now calls SwssStats::getInstance() and SwssStats::recordTask/
recordComplete, so every build target that links orch.cpp must also
link swssstats.cpp.

Affected Makefiles:
- orchagent/p4orch/tests/Makefile.am (p4orch_tests)
- cfgmgr/Makefile.am (COMMON_ORCH_SOURCE shared by vlanmgrd, teammgrd, etc.)

tests/mock_tests/Makefile.am already has swssstats.cpp from the
previous commit.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Add swssstats.cpp to tests_intfmgrd_SOURCES and tests_teammgrd_SOURCES
in tests/mock_tests/Makefile.am. Both targets link orch.cpp which now
references SwssStats symbols.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

tests_nbrmgrd links orchagent/orch.cpp which references SwssStats, but
swssstats.cpp was not listed in tests_nbrmgrd_SOURCES, causing undefined
reference errors at link time. Add swssstats.cpp to fix the build.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Sync with master introduced a merge conflict resolution error in
Consumer::drain(): doTask() was called twice and the stats recording
if-block was missing its closing brace, causing the try/catch to be
nested inside it. This broke the build with 'qualified-id in
declaration before token' errors.

Fix: move try/catch to wrap the single doTask() call, and place the
SwssStats recordComplete() block after the try/catch.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

The SwssStats singleton is first created when addToSync() is called
during early orchagent initialization (bake phase). At that point,
calling DBConnector("COUNTERS_DB", 0) in the constructor triggers a
synchronous Redis GET via swsscommon that returns 0 values, causing a
std::runtime_error (waitForGetResponse) and orchagent crash (SIGABRT).

Fix: move m_db and m_table creation from the constructor into the
start of writerThread(), where orchagent is fully initialized and
COUNTERS_DB is accessible. Added try/catch so a connection failure
disables Redis writes without crashing the process.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yutongzhang-microsoft yutongzhang-microsoft marked this pull request as ready for review April 21, 2026 06:56
Copilot AI review requested due to automatic review settings April 21, 2026 06:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new SwssStats collector intended to provide lightweight, always-on orchestration statistics exported into COUNTERS_DB, and wires it into orchagent’s task ingestion (addToSync) and processing (drain) flow. It also updates build/test wiring so the new stats implementation is compiled into relevant binaries and adds a mock-test unit test file.

Changes:

  • Added SwssStats singleton with a background writer thread that periodically writes per-table counters to COUNTERS_DB.
  • Integrated stats recording into ConsumerBase::addToSync() (SET/DEL) and Consumer::drain() (COMPLETE).
  • Added/updated mock tests and Makefile sources to compile/link the new implementation across existing unit-test binaries and components.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
tests/mock_tests/swssstats_ut.cpp Adds unit tests for counter correctness and concurrency behavior.
tests/mock_tests/Makefile.am Builds swssstats_ut.cpp and links swssstats.cpp into test binaries.
orchagent/swssstats.h Declares the SwssStats API and internal counter storage.
orchagent/swssstats.cpp Implements the singleton, counter updates, and Redis writer thread.
orchagent/orch.cpp Hooks stats into task enqueue and drain paths; defines gSwssStatsRecord.
orchagent/Makefile.am Adds swssstats.cpp to orchagent build.
cfgmgr/Makefile.am Links swssstats.cpp alongside orch.cpp in common sources.
orchagent/p4orch/tests/Makefile.am Links swssstats.cpp into p4orch unit tests.

Comment thread orchagent/orch.cpp
using namespace swss;

int gBatchSize = 0;
std::atomic<bool> gSwssStatsRecord(true); // Enable SwssStats by default
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gSwssStatsRecord is enabled by default, but there is no mechanism in this PR to toggle it at runtime (no config/env/CLI hook), and it introduces extra per-task work and a background writer thread. Consider defaulting this to disabled or wiring it to a configuration option so operators can turn it on only when needed.

Copilot uses AI. Check for mistakes.
Comment thread orchagent/swssstats.h
Comment on lines +80 to +101
// m_running uses atomic to avoid data race between main and writer threads
std::atomic<bool> m_running;
uint32_t m_interval_sec;
std::unique_ptr<std::thread> m_thread;
std::mutex m_mutex;
// m_cv allows the destructor to wake the writer thread immediately
std::condition_variable m_cv;

std::shared_ptr<swss::DBConnector> m_db;
std::unique_ptr<swss::Table> m_table;

// std::map is used instead of unordered_map: map iterators and references
// to existing elements remain valid after new insertions, which is required
// because recordTask() holds a reference after releasing m_mutex.
std::map<std::string, TableStats> m_stats;

SwssStats(uint32_t interval = 1);

// Returns a stable reference to the TableStats for the given table,
// creating it if it does not exist. Safe to use after m_mutex is released
// because std::map never invalidates existing element references.
TableStats& getOrCreateStats(const std::string &table_name);
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description claims “lock-free atomic operations for zero performance impact”, but the implementation takes m_mutex in every record*() call via getOrCreateStats() to access m_stats. Either adjust the PR description/expectations, or consider an approach that avoids a global mutex on the hot path (e.g., per-table cached pointer/reference, sharded maps, or thread-local aggregation).

Copilot uses AI. Check for mistakes.
Comment thread tests/mock_tests/swssstats_ut.cpp Outdated
Comment on lines +167 to +175
thread t1([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordTask(tbl, "SET");
});
thread t2([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
thread t3([s, &tbl]()
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda for t1 uses the local variable ops but does not capture it (capture list is [s, &tbl]). This will not compile; capture ops by value (or use [=]).

Suggested change
thread t1([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordTask(tbl, "SET");
});
thread t2([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
thread t3([s, &tbl]()
thread t1([s, &tbl, ops]()
{
for (int i = 0; i < ops; i++) s->recordTask(tbl, "SET");
});
thread t2([s, &tbl, ops]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
thread t3([s, &tbl, ops]()

Copilot uses AI. Check for mistakes.
Comment thread tests/mock_tests/swssstats_ut.cpp Outdated
Comment on lines +171 to +174
thread t2([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda for t2 uses the local variable ops but does not capture it (capture list is [s, &tbl]). This will not compile; capture ops by value (or use [=]).

Copilot uses AI. Check for mistakes.
Comment thread tests/mock_tests/swssstats_ut.cpp Outdated
Comment on lines +167 to +175
thread t1([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordTask(tbl, "SET");
});
thread t2([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
thread t3([s, &tbl]()
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda for t3 uses the local variable ops but does not capture it (capture list is [s, &tbl]). This will not compile; capture ops by value (or use [=]).

Suggested change
thread t1([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordTask(tbl, "SET");
});
thread t2([s, &tbl]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
thread t3([s, &tbl]()
thread t1([s, &tbl, ops]()
{
for (int i = 0; i < ops; i++) s->recordTask(tbl, "SET");
});
thread t2([s, &tbl, ops]()
{
for (int i = 0; i < ops; i++) s->recordComplete(tbl);
});
thread t3([s, &tbl, ops]()

Copilot uses AI. Check for mistakes.
Comment thread orchagent/orch.cpp
Comment on lines 571 to +601
@@ -581,6 +592,13 @@ void Consumer::drain()
SWSS_LOG_ERROR("Exception caught: type=unknown, table=%s",
getName().c_str());
}
if (gSwssStatsRecord && size_before > 0)
{
size_t size_after = m_toSync.size();
uint64_t completed = (size_before > size_after) ? (size_before - size_after) : 0;
if (completed > 0)
SwssStats::getInstance()->recordComplete(getTableName(), completed);
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recordComplete() is called even when doTask() throws and the drain path enters one of the catch blocks. That can inflate “COMPLETE” counts during failures. Consider recording completion only on the success path (e.g., after doTask() returns normally), and use recordError() in the exception paths if you want ERROR to reflect task processing failures.

Copilot uses AI. Check for mistakes.
Comment thread tests/mock_tests/swssstats_ut.cpp Outdated
Comment on lines +144 to +148
threads.emplace_back([s, &tbl]()
{
for (int j = 0; j < ops_per_thread; j++)
{
s->recordTask(tbl, (j % 2 == 0) ? "SET" : "DEL");
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this lambda, ops_per_thread is used inside the body but is not captured (capture list is [s, &tbl]). This won’t compile. Capture ops_per_thread (and any other referenced locals) by value, or switch to [=] / [&, ops_per_thread] as appropriate.

Copilot uses AI. Check for mistakes.
Comment thread tests/mock_tests/swssstats_ut.cpp Outdated
Comment on lines +19 to +25
// Return a fresh SwssStats instance with a very long flush interval so the
// background writer never fires during tests, keeping tests fast and deterministic.
static SwssStats* stats()
{
// The singleton is reused across tests in the same process; that is fine
// because each test reads back only what it wrote, using unique table names.
return SwssStats::getInstance();
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper/comment says it returns an instance with a long flush interval so the background writer never fires, but it always returns the singleton with the default interval (1s). Either update the comment to match reality, or add a test hook/config to disable the writer thread / change the interval for unit tests to keep them deterministic.

Copilot uses AI. Check for mistakes.
Comment thread orchagent/swssstats.cpp Outdated
// not yet be accessible (causes waitForGetResponse crash).
m_thread = make_unique<thread>(&SwssStats::writerThread, this);

SWSS_LOG_NOTICE("SwssStats initialized (interval: %d sec)", m_interval_sec);
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SWSS_LOG_NOTICE("... %d ...", m_interval_sec) uses %d for m_interval_sec (a uint32_t). This can trigger format warnings or incorrect output on some platforms; use an unsigned format (%u) or a fixed-width macro like PRIu32.

Suggested change
SWSS_LOG_NOTICE("SwssStats initialized (interval: %d sec)", m_interval_sec);
SWSS_LOG_NOTICE("SwssStats initialized (interval: %u sec)", m_interval_sec);

Copilot uses AI. Check for mistakes.
Comment thread orchagent/swssstats.cpp
#include "dbconnector.h"
#include "table.h"
#include "logger.h"
#include <chrono>
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file uses unordered_map, piecewise_construct, and forward_as_tuple, but the required standard headers (<unordered_map>, <utility>, <tuple>) aren’t included. This may fail to compile depending on transitive includes; add the missing includes explicitly.

Suggested change
#include <chrono>
#include <chrono>
#include <tuple>
#include <unordered_map>
#include <utility>

Copilot uses AI. Check for mistakes.
Fixes identified in code review:

- swssstats.cpp: add missing includes (<tuple>, <unordered_map>, <utility>)
- swssstats.cpp: fix printf format specifier %d -> %u for uint32_t (line 29)
- swssstats.cpp: only bump version counter when SET/DEL actually modifies a
  counter; unknown ops no longer cause spurious Redis writes
- swssstats.h: add missing #include <cstdint> for uint32_t / uint64_t
- swssstats_ut.cpp: fix lambda captures missing ops_per_thread / ops; lambdas
  now correctly capture all variables they reference
- swssstats_ut.cpp: update stats() helper comment to reflect that it returns
  the default-interval singleton, not a special long-interval instance
- orch.cpp: Consumer::drain() now calls recordError() in every catch block and
  only calls recordComplete() on the success path, preventing inflated COMPLETE
  counts when doTask() throws

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread orchagent/swssstats.cpp Outdated

SwssStats::CounterSnapshot SwssStats::getCounters(const string &table_name)
{
lock_guard<mutex> lock(m_mutex);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the mutex. Make meaningful names

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to m_statsMutex

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants