Skip to content

Add reusable ComponentStats library for per-container statistics.#1180

Open
yutongzhang-microsoft wants to merge 8 commits into
sonic-net:masterfrom
yutongzhang-microsoft:feature/component-stats
Open

Add reusable ComponentStats library for per-container statistics.#1180
yutongzhang-microsoft wants to merge 8 commits into
sonic-net:masterfrom
yutongzhang-microsoft:feature/component-stats

Conversation

@yutongzhang-microsoft
Copy link
Copy Markdown

@yutongzhang-microsoft yutongzhang-microsoft commented Apr 24, 2026

What I did

Added a new reusable library swss::ComponentStats under libswsscommon, together with its unit tests. The library lets any SONiC container publish service-level counters into COUNTERS_DB under a uniform <COMPONENT>_STATS:<entity> key layout.

This is a purely additive change — no existing API is modified or removed, and no current consumer of libswsscommon calls into the new code, so behaviour of every existing SONiC binary is unchanged.

Why I did it

SONiC already has dataplane counters (Flex-Counter → syncd → COUNTERS_DB). What we are missing is service-level counters — software-side events such as orchagent task throughput, gNMI request rate, BMP message error counts. Without them we cannot answer questions like "is orchagent draining tasks?", "is gNMI seeing subscribe failures?", "is one container dropping more events than its peers?".

If each container rolled its own plumbing for this, every container would carry its own concurrency review, the bug fixes would drift, and the on-the-wire schemas would diverge. A single shared library is the only way to keep these properties consistent across SONiC.

Full motivation and design are in the HLD: sonic-net/SONiC#2312.

How I did it

  • Implemented the library as a producer with a lock-free hot path, a 1 s background writer thread, version-based dirty tracking (idle entities produce no Redis traffic), lazy DB connection with retry, and a per-component singleton.
  • Kept the public API narrow and stable so future containers only need a ~30 LoC facade to adopt it.
  • Added a unit-test suite (tests/componentstats_ut.cpp, 9 cases) covering basic round-trip, gauge semantics, runtime enable/disable, bulk read, singleton behaviour, and 8-thread × 10 000-increment concurrency.
  • Verified the library can be merged independently: the first consumer (sonic-swss#4516) depends on this PR but this PR does not depend on it.

Related

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft added a commit to yutongzhang-microsoft/sonic-buildimage that referenced this pull request Apr 24, 2026
Temporarily repoint src/sonic-swss and src/sonic-swss-common to the forks that carry the ComponentStats refactor so CI can build an end-to-end image and exercise the new SwssStats thin wrapper.

- sonic-swss-common -> yutongzhang-microsoft/sonic-swss-common@bc9b47c (feature/component-stats, PR sonic-net/sonic-swss-common#1180)

- sonic-swss -> yutongzhang-microsoft/sonic-swss@f77ef29 (swss-stats-use-componentstats, PR sonic-net/sonic-swss#4516)

This commit will be revised once both PRs merge into their respective upstream repos.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new reusable ComponentStats facility in sonic-swss-common to provide container-agnostic in-memory counters with periodic Redis flushing, intended to be shared by multiple SONiC components (orchagent/gnmi/bmp/telemetry, etc.).

Changes:

  • Added swss::ComponentStats implementation with singleton-by-component registry, atomic counters, dirty tracking, and a background writer thread that flushes to Redis.
  • Added Google Test unit/integration coverage for core behaviors (increments, singleton semantics, concurrency, Redis flushing, dirty-tracking, shutdown, and connect-retry).
  • Wired new sources into the autotools build/test targets.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
common/componentstats.h Declares the new ComponentStats API and documents threading/behavior contracts.
common/componentstats.cpp Implements singleton registry, atomic counter storage, and Redis writer thread with connect-retry and periodic flush.
common/Makefile.am Adds componentstats.cpp to the common library build.
tests/componentstats_ut.cpp Adds unit + Redis integration tests for ComponentStats.
tests/Makefile.am Adds the new test file to the test binary build.

Comment thread common/componentstats.h Outdated
*
* Each instance writes counters to a Redis hash of the form:
* <COMPONENT>_STATS:<entity>
* where each field is a user-defined metric name (e.g. SET, DEL, GET, ERROR…).
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment uses a Unicode ellipsis character ("…") inside the public API documentation. This repository otherwise appears to use ASCII-only source, and non-ASCII punctuation can cause tooling/encoding issues. Consider replacing it with "...".

Suggested change
* where each field is a user-defined metric name (e.g. SET, DEL, GET, ERROR).
* where each field is a user-defined metric name (e.g. SET, DEL, GET, ERROR...).

Copilot uses AI. Check for mistakes.
Comment thread common/componentstats.h Outdated
Comment on lines +94 to +96
/// setValue() become no-ops; existing in-memory counters are preserved
/// and the writer thread keeps running (it just has nothing new to
/// flush, so previously-flushed values stay in Redis as-is).
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setEnabled() documentation says the writer thread will have "nothing new to flush" and Redis values will stay "as-is" while disabled, but writerThread() does not check m_enabled and can still flush updates that were made just before disabling (or flush after re-enabling without a clear policy). Either update the doc to match the behavior, or gate flushing on m_enabled (and decide whether dirty state should be preserved for a later flush).

Suggested change
/// setValue() become no-ops; existing in-memory counters are preserved
/// and the writer thread keeps running (it just has nothing new to
/// flush, so previously-flushed values stay in Redis as-is).
/// setValue() become no-ops and existing in-memory counters are preserved.
/// The writer thread continues to run; disabling does not guarantee that
/// counters already marked dirty before setEnabled(false) will not still
/// be flushed to Redis. Redis therefore remains unchanged only if there
/// was no pending dirty state at the time recording was disabled.

Copilot uses AI. Check for mistakes.
Comment thread common/componentstats.cpp Outdated
Comment on lines +220 to +226
{
SWSS_LOG_WARN("ComponentStats[%s]: DB connect failed: %s. Retrying…",
m_component.c_str(), e.what());
std::unique_lock<std::mutex> lock(m_mutex);
m_cv.wait_for(lock, std::chrono::seconds(m_intervalSec),
[this]{ return !m_running.load(std::memory_order_relaxed); });
}
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message contains a Unicode ellipsis character ("…"). For consistency and to avoid build/log encoding issues in environments that assume ASCII source files, prefer "...".

Copilot uses AI. Check for mistakes.
Comment thread common/componentstats.cpp Outdated
Comment on lines +288 to +297
try
{
m_table->set(keys[i], values[i]);
}
catch (const std::exception& e)
{
SWSS_LOG_WARN("ComponentStats[%s]: write failed for %s: %s",
m_component.c_str(), keys[i].c_str(), e.what());
}
}
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Redis write failure, the code only logs and continues using the existing DBConnector/Table. If the underlying Redis connection is broken (e.g., Redis restart), subsequent flushes will likely fail forever and stats will never recover. Consider resetting m_table/m_db and re-entering the connect-retry logic (or otherwise rebuilding the connection) after repeated write errors.

Copilot uses AI. Check for mistakes.
Comment thread tests/componentstats_ut.cpp Outdated
Comment on lines +195 to +201
// Wait for at least two flush intervals to give the writer thread a
// chance to re-flush. With dirty-tracking it must stay at "999".
std::this_thread::sleep_for(std::chrono::milliseconds(2500));

std::string value;
ASSERT_TRUE(tbl.hget("e", "X", value));
EXPECT_EQ("999", value);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test can pass even if the writer thread never performs another flush after the sentinel write (it only sleeps and then reads Redis). To make it a stronger regression test for the "skip unchanged" path, consider adding a separate change on a different entity/metric and waiting for it to flush, while asserting the sentinel value remains unchanged.

Copilot uses AI. Check for mistakes.
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Hot path is not lock-free — despite the design-goal claim, every increment()/setValue() acquires m_mutex twice (once in getOrCreateEntity, once in getOrCreateCounter), even when the entity+metric already exist. This is the critical perf gap.
  • No final flush on shutdownstop() wakes the writer thread, which breaks out of the loop without flushing dirty counters. Data written just before teardown is silently lost.
  • Singleton registry leaks entries — expired weak_ptr slots are never pruned from the static registry map; long-lived processes cycling component names will accumulate dead entries.
  • Overall: solid Redis reconnect logic, clean API surface, good test coverage. The lock contention on the hot path needs fixing before this can deliver on the stated lock-free promise.

Comment thread common/componentstats.h Outdated
* increment() / setValue() update the counter with std::atomic ops.
* The first reference to a new entity or metric still takes m_mutex
* to insert into the maps.
* - Deferred DB connection: the DBConnector is created inside the writer
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment claims "Lock-free counter update: once an entity/metric pair has been seen, increment() / setValue() update the counter with std::atomic ops." — but the implementation in componentstats.cpp acquires m_mutex twice on every call (once in getOrCreateEntity, once in getOrCreateCounter), even when both already exist.

To actually deliver on this promise, increment()/setValue() need a lock-free fast path — e.g. cache the EntityStats& and Counter& at the call site and only fall back to the locked getOrCreate* on first use, or use a read-optimized pattern (shared_mutex with read-lock for lookups, write-lock only for inserts).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e. The hot path now goes through a single lookupOrCreate(entity, metric) that takes std::shared_mutex in shared_lock mode for the read-only lookup; concurrent increment() callers no longer serialize against each other. Only the first-ever insertion for a new (entity, metric) pair upgrades to an exclusive lock. The header doc and the design-goals section are updated to describe this accurately.

Comment thread common/componentstats.cpp Outdated
{
if (!isEnabled() || n == 0)
{
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOrCreateEntity and getOrCreateCounter each take m_mutex. For the hot path where the entity and metric already exist, this means two full mutex lock/unlock cycles per increment() call. With 8+ threads hammering counters, this serializes the "lock-free" path.

Consider at minimum combining them into a single lock acquisition, or better yet using std::shared_mutex (read-lock for lookup, write-lock for insert) so concurrent reads don't contend.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e. getOrCreateEntity + getOrCreateCounter have been merged into a single lookupOrCreate() that takes the lock at most once. The structural mutex is now std::shared_mutex, so the lookup uses a shared_lock (concurrent readers proceed in parallel) and only the first-time insert path upgrades to unique_lock.

Comment thread common/componentstats.cpp Outdated
[this]{ return !m_running.load(std::memory_order_relaxed); });
if (!m_running.load(std::memory_order_relaxed))
{
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When m_running becomes false, the writer breaks out of the loop immediately without flushing dirty entities. Any counters updated since the last periodic flush are silently lost.

Add a final flush pass after the loop exits (while m_table is still valid) so that the last batch of dirty data lands in Redis before shutdown.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e. The flush logic is now factored into a private flushDirty() method, and writerThread() calls it one more time after the main loop exits — before joining — so any counter updates that landed after the last periodic flush are persisted to Redis on stop(). Best-effort: if Redis is already gone the final flush logs and moves on rather than blocking shutdown. New test FinalFlushOnShutdown covers this: uses intervalSec=60, dirties an entity, calls stop() immediately, and verifies the value lands in Redis.

Comment thread common/componentstats.cpp Outdated
{
const std::string key = toUpper(componentName);
const uint32_t effectiveInterval = intervalSec == 0 ? 1 : intervalSec;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expired weak_ptr entries are never removed from the registry. If a process creates and destroys many differently-named ComponentStats instances over its lifetime, this map grows without bound.

Easy fix: prune expired entries inside create() (e.g. iterate and erase expired slots before inserting), or erase the entry in the destructor under registryMutex().

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e. create() now calls pruneExpiredRegistryLocked() (under registryMutex()) before inserting a new slot, so expired weak_ptr entries are reaped on each create() call. New test RegistryPrunesExpiredInstances releases one instance, creates 10 unrelated ones to force a prune cycle, then re-creates the original name and asserts it gets a fresh pointer.

Comment thread common/componentstats.cpp Outdated
m_thread = std::make_unique<std::thread>(&ComponentStats::writerThread, this);
SWSS_LOG_NOTICE("ComponentStats[%s] initialized (db=%s, interval=%us)",
m_component.c_str(), m_dbName.c_str(), m_intervalSec);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intervalSec is clamped to 1 both here (line 87: m_intervalSec(intervalSec == 0 ? 1 : intervalSec)) and in create() (line 51: effectiveInterval = intervalSec == 0 ? 1 : intervalSec). The double-clamping is harmless but unnecessary — pick one location and document it there.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e. The ctor no longer clamps; create() is now the single chokepoint that ensures effectiveInterval >= 1, and the ctor's member-initializer just stores what it was passed.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The separation of getOrCreateEntity and getOrCreateCounter leads to inefficient double-locking on the hot path for new metrics.
  • Memory ordering for counter increments can be relaxed for better performance.
  • Connection and reconnection logic for Redis could be more robust by using an exponential backoff strategy instead of a fixed interval.
  • Individual Redis write failures are not retried, which could lead to data loss during transient network issues.

Comment thread common/componentstats.h Outdated

// Returns a stable reference. Safe to keep after the lock is released
// because std::map does not invalidate element references on insert.
// INVARIANT: no code path may erase from m_entities or from
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two separate getOrCreate functions are called sequentially in increment() and setValue(), each taking the main mutex. This results in two separate lock/unlock cycles on the hot path for every new entity/metric pair, which is inefficient. Consider combining these into a single getOrCreateCounter(entity, metric) method that takes the lock once, creates the entity if needed, and then creates the counter if needed, returning a reference to the counter. This would halve the locking overhead for new metrics.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b21b74e — both methods are now a single lookupOrCreate(entity, metric) taking at most one lock acquisition (shared_lock on the hit path, unique_lock on the slow path that has to insert).

Comment thread common/componentstats.cpp Outdated
{
std::lock_guard<std::mutex> lock(m_mutex);
auto it = e.metrics.find(metric);
if (it == e.metrics.end())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling getOrCreateEntity and then getOrCreateCounter separately releases the mutex between the two operations. While std::map reference stability prevents a crash, there's a potential but unlikely ABA-style problem if an entity were to be created, deleted (if a delete API is ever added), and re-created. Combining them into one locked operation as suggested for componentstats.h would make this pattern more robust and future-proof.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by the same change: lookupOrCreate() performs the entity lookup, the metric lookup, and (if needed) both inserts under a single lock acquisition. The lock is never released between the two halves, so there is no longer a window where another thread could race with an erase. As before, no erase API is exposed; if one is ever added, the layered structure stays consistent under a single unique_lock.

Comment thread common/componentstats.cpp Outdated
if (it == e.metrics.end())
{
// std::map::emplace constructs Counter in place; no move required.
it = e.metrics.emplace(std::piecewise_construct,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory order for this fetch_add can be relaxed to std::memory_order_relaxed. The subsequent fetch_add on the version with std::memory_order_release is sufficient to ensure that this counter update is visible to the writer thread before the version is updated. The release-acquire pair on the version number acts as the necessary memory barrier.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already the case — the existing code uses std::memory_order_relaxed on the counter fetch_add and std::memory_order_release on the version bump, exactly as you describe. The b21b74e refactor preserves that ordering; no change needed here.

Comment thread common/componentstats.cpp Outdated
}

void ComponentStats::writerThread()
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry logic here for the initial connection uses wait_for on the condition variable. This is fine, but it could be slightly more robust. If the Redis server is down during startup, this loop will retry every m_intervalSec. If m_intervalSec is large (e.g., 60s), the component might take a very long time to connect. A simple exponential backoff strategy (e.g., 1s, 2s, 4s, ... up to a max) for the retry delay would be more conventional and potentially recover faster without spamming on short intervals.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acked. On SONiC the writer connects to the local in-container Redis, which is typically ready within seconds of container start, so a fixed 1 s retry has been adequate in practice. Switching to exponential backoff is a reasonable hardening but I'd prefer to defer it to a follow-up so this PR stays focused on the correctness fixes (lock-free path, final flush, gauge race, registry leak). Filed mentally as a follow-up; will open a tracking issue if you prefer.

Comment thread common/componentstats.cpp Outdated
values.push_back(std::move(row));
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a write failure occurs, there's no attempt to retry the failed write. The code logs the error and moves on. The consecutiveWriteFailures logic will eventually trigger a reconnect, but the specific updates that failed during this period are lost. A simple retry mechanism (e.g., retry the m_table->set call once or twice) for the individual write could improve data fidelity during transient network blips without adding much complexity.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acked. The current behavior is: a write failure logs and is skipped, and after kMaxConsecutiveWriteFailures=3 failures the writer drops the connection and reconnects (lastVersions.clear() ensures every entity gets re-flushed against the new connection with its latest cumulative value). So no monotonic data point is permanently lost — the next successful flush carries the up-to-date value. Adding per-call retry would help only for very-narrow transient blips and would complicate the failure-handling path; I'd prefer to leave the current "log + reconnect" policy as is. Happy to add per-call retry if you feel strongly.

Comment thread common/componentstats.cpp Outdated
"dropping Redis connection and reconnecting",
m_component.c_str(), consecutiveWriteFailures);
m_table.reset();
m_db.reset();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to re-establish the connection after failures is good. However, similar to the initial connection loop, it retries on the fixed m_intervalSec. Implementing an exponential backoff here could provide a more robust and responsive recovery behavior when Redis is down for an extended period.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as the initial-connect case: deferring exponential backoff to a follow-up so this PR stays scoped to the correctness fixes. The reconnect path already triggers only after 3 consecutive failures, so the polling rate during outage is already bounded.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Overall: solid design — deferred DB connect, version-based dirty tracking, and stable std::map references are well thought out.
  • setValue race with increment: concurrent store + version bump is not atomic relative to fetch_add in increment() — an increment can land between store and the version bump, getting silently overwritten on the next setValue call from another thread. Needs compare_exchange loop or single-lock path for setValue.
  • No Redis key cleanup: entities written to Redis are never deleted. Long-running containers with transient entity names will leak stale keys in COUNTERS_DB indefinitely.
  • Test singleton leakage: ComponentStats::create returns process-lifetime singletons, so test order matters. Earlier tests' instances leak into later tests, risking flaky failures if component names collide.
  • Agree with R1/R2: the lock-free claim vs two-mutex-acquisition reality, missing final flush on shutdown, and registry leak are the highest-priority fixes.

Comment thread common/componentstats.cpp Outdated
// Release ordering pairs with the acquire in writerThread() so that counter
// writes become visible before the version bump.
e.version.fetch_add(1, std::memory_order_release);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setValue has a data race with concurrent increment: thread A calls c.value.store(value) at line 163, then thread B's fetch_add lands, then thread A bumps e.version. The writer sees the new version but reads the post-increment value, which is correct — however, a second setValue call can silently discard the increment that happened between the store and the version bump, because the caller of setValue intended to set an absolute value but didn't account for concurrent increments.

This is inherent to mixing gauge-style setValue and counter-style increment on the same metric from different threads. At minimum, document this caveat prominently. If you want stronger guarantees, setValue should be done under the same lock or use a CAS loop.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e, partially. setValue() now takes m_structMutex in exclusive mode for the store + version bump, so its own observation is atomic and concurrent setValue() calls are serialized. However — as you point out — an increment() whose shared_lock lookup races ahead of setValue's exclusive lock can still land its fetch_add outside the lock, so the writer may observe value + n rather than value. Fully serializing that would require dragging every counter mutation under the lock, which defeats the lock-free hot path. The header doc now explicitly states this caveat: mixing gauge setValue and counter increment on the same metric from concurrent threads is best-effort, and callers should pick one style per metric for deterministic gauge semantics. New test ConcurrentIncrementAndSetValueDoNotCrash exercises the mixed path as a UB / safety check.

Comment thread common/componentstats.cpp Outdated
// the version bump we're about to read.
uint64_t curVer = stats.version.load(std::memory_order_acquire);

auto vit = lastVersions.find(name);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entities written to Redis (m_table->set(...)) are never cleaned up. If a container creates transient entity names (e.g., session IDs, task IDs), stale keys will accumulate in COUNTERS_DB indefinitely. Consider:

  • Adding a removeEntity() API that erases from m_entities and calls m_table->del().
  • Or documenting that entity names must be stable/bounded and callers are responsible for cleanup.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acked, declined for v1. The HLD (sonic-net/SONiC#2312, Reporting HLD §7.1 "No deletion in v1" and §12) explicitly states that entity GC is out of scope for the initial revision: stale rows live until the producer container restarts, and callers are expected to use stable / bounded entity names. The expected first consumer (SwssStats) has bounded entity cardinality = number of orchagent tables (~tens), so this is safe in practice. Adding removeEntity() is reserved for a future revision per the HLD's Open/Action items.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on this — wanted to lay out the actual design reason rather than just pointing at the HLD, since the HLD's "out of scope for v1" is a result of the reasoning below.

Why removeEntity() is non-trivial here

The lock-free hot path depends on std::map reference stability: lookupOrCreate() returns a Counter& / EntityStats&, releases the lock, and the caller then mutates atomics on those references concurrently:

cpp auto ref = lookupOrCreate(entity, metric); // shared_lock taken + released ref.counter.value.fetch_add(...); // lock-free atomic ref.entity.version.fetch_add(...);

If we expose removeEntity(), thread A could hold a Counter& while thread B erases the entity from the map — classic use-after-free.

Making erase safe requires one of:

  1. shared_ptr<Counter> per metric: adds an atomic refcount op on every hot-path access (cache-line ping-pong is the dominant cost in busy counter workloads).
  2. RCU / hazard pointers: significant code complexity and a quiescent-state mechanism.
  3. Take unique_lock for every mutation: kills the lock-free path entirely.

Why no current consumer needs it

All v1 / planned consumers have topology-bounded entity cardinality:

Consumer Entity Cardinality
SwssStats (sonic-swss#4516) orchagent table name ~tens, stable for container lifetime
gnmi (future) gNMI subscription path bounded by subscriber config
bmp (future) BMP peer address bounded by BGP topology

None of them mint short-lived entity names (no per-RPC IDs, per-session UUIDs, etc.), so growth is bounded and removeEntity() would not actually be exercised.

v1's "GC" mechanism

The Redis schema has no TTL; keys live for the container's process lifetime and are recreated on restart (HLD §10). Stale-key lifetime is bounded by container restart frequency, which is sufficient for the planned consumers.

When v2 would do this

If a future consumer needs truly short-lived entities (RPC IDs, session-scoped objects), the right move is to redesign the storage layer at the same time (e.g. switch to shared_ptr<Counter> or RCU) and document the new concurrency model. That's a non-trivial change deserving its own HLD revision, not a follow-up patch on top of this one.

So this is a deliberate design trade-off, not just deferred work. Happy to add the shared_ptr variant if you think the SwssStats / future consumers will need it sooner — let me know if you want me to spike that.

Comment thread tests/componentstats_ut.cpp Outdated
TEST(ComponentStats, IncrementBasic)
{
auto s = ComponentStats::create("UT1");
s->increment("entity_a", "SET");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each test creates a ComponentStats singleton keyed by component name, and these persist for the entire process lifetime (the registry holds weak_ptr, but the shared_ptr returned by create() is held until the test scope ends — and the writer thread may still be running). With unique names like UT1, UT2, etc., this works today, but:

  1. Each test leaks a writer thread until stop() or shared_ptr destruction — many tests = many threads alive simultaneously.
  2. If anyone reuses a component name across tests, they'll silently share state.

Consider calling s->stop() at the end of each in-memory test to clean up the writer thread promptly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b21b74e. Every test that constructs a ComponentStats instance now calls s->stop() at the end, so the writer thread is joined and the singleton is released before the next test runs. Distinct component names (UT1UT_*) are still used so that even if a release races with the registry prune, two tests cannot accidentally share state.

yutongzhang-microsoft added a commit to yutongzhang-microsoft/sonic-swss-common that referenced this pull request May 27, 2026
… leak

This commit addresses banidoru's review comments on PR sonic-net#1180.

* Hot path is now effectively lock-free. lookupOrCreate() takes a
  shared_lock on m_structMutex for the read-only lookup and only
  upgrades to an exclusive lock when a new (entity, metric) pair must
  be inserted. Concurrent callers no longer serialize against each
  other, matching the stated design goal.

* getOrCreateEntity + getOrCreateCounter merged into a single
  lookupOrCreate(entity, metric) so the hot path acquires at most one
  lock instead of two.

* writerThread now performs a final flush pass after the main loop
  exits, so counter updates that landed after the last periodic flush
  are persisted to Redis on stop() rather than being silently dropped.

* setValue takes m_structMutex in exclusive mode for the store +
  version bump and the public doc explicitly warns that mixing
  setValue (gauge) and increment (counter) on the same metric from
  concurrent threads is best-effort -- callers must pick one style
  per metric for deterministic gauge semantics.

* Singleton registry no longer leaks: create() prunes expired
  weak_ptr slots before insertion so long-running processes that
  cycle component names do not accumulate dead entries.

* m_cv now pairs with a dedicated m_cvMutex (separate from the
  structural mutex) since std::shared_mutex is not compatible with
  std::condition_variable::wait_for.

* intervalSec is now clamped once in create() and the constructor
  trusts the caller, removing the duplicate clamp.

* Unit tests: every test that uses a writer thread now calls stop()
  to keep the per-component singletons from leaking writer threads
  across test cases. New tests cover the final-flush-on-shutdown
  guarantee, registry pruning of released instances, and the
  concurrent-increment-vs-setValue no-crash invariant.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft added a commit to yutongzhang-microsoft/sonic-buildimage that referenced this pull request May 27, 2026
…review fixes)

Picks up review fixes from sonic-net/sonic-swss-common#1180 that
address banidoru's feedback:
* Lock-free hot path via shared_mutex + merged lookupOrCreate
* Final flush on shutdown so no in-flight updates are dropped
* Registry pruning of expired weak_ptr slots
* setValue exclusive lock + documented gauge/counter mixing caveat

Public API (increment/setValue/get/getAll/setEnabled/stop) is
unchanged, so sonic-swss SwssStats facade does not need to move.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yutongzhang-microsoft added a commit to yutongzhang-microsoft/sonic-buildimage that referenced this pull request May 27, 2026
…+ sonic-swss-common#1180)

Recreated on top of fresh upstream master to drop the conflicts from
sonic-net#27456.

Repoints two submodules to forks carrying the refactored library +
thin-wrapper:

| submodule | now | source PR |
|---|---|---|
| src/sonic-swss-common | b21b74e (feature/component-stats) | sonic-net/sonic-swss-common#1180 |
| src/sonic-swss        | 3710025 (swss-stats-use-componentstats) | sonic-net/sonic-swss#4516 |

.gitmodules URLs are temporarily switched to https://github.com/yutongzhang-microsoft/...
so CI can resolve the non-upstream SHAs.

The swss-common SHA already includes banidoru's review fixes
(lock-free hot path via shared_mutex + merged lookupOrCreate, final
flush on shutdown, registry pruning, setValue exclusive lock + docs).

Supersedes sonic-net#27456.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yutongzhang-microsoft
Copy link
Copy Markdown
Author

/azpw run

@mssonicbld
Copy link
Copy Markdown
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed(or canceled) jobs...

@mssonicbld
Copy link
Copy Markdown
Collaborator

Retrying failed(or canceled) stages in build 1123777:

✅Stage Build:

  • Job amd64: retried.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft added a commit to sonic-net/SONiC that referenced this pull request Jun 1, 2026
[component-stats] Add HLD for SONiC component statistics

## What I'm doing

Adding a new High-Level Design document at `doc/component-stats/component-stats-hld.md` that specifies a reusable mechanism for exposing **service-level (control-plane software) counters** from SONiC containers.

The HLD introduces:

1. A new shared library `swss::ComponentStats` in `sonic-swss-common`.
2. A SWSS-specific facade `SwssStats` in `sonic-swss` built on top of that library, as the first consumer.

Counters are published to two sinks driven from a single in-process atomic snapshot:

- **`COUNTERS_DB`** — for parity with the existing Flex-Counter pipeline and for on-box diagnostic tooling (`redis-cli`, `show ... stats`).
- **Local OpenTelemetry (OTLP) Collector sidecar** — so the same counters can be forwarded to off-box telemetry systems (e.g. Geneva mdm) that consume OTLP.

## Why

SONiC already has dataplane counters (Flex-Counter / SAI), but no uniform mechanism for **service-level** counters such as orchagent task throughput, gNMI request rate, or BMP error counts. A naive per-container implementation would duplicate atomic counter management, dirty tracking, the writer thread, the Redis schema, and an OTLP exporter in every container — concurrency review, bug fixes, and on-the-wire schemas would all drift. This HLD specifies one reusable producer that any container can adopt with a ~100-line facade.

## Companion PRs

- `sonic-net/sonic-swss-common` [#1180](sonic-net/sonic-swss-common#1180) — `swss::ComponentStats` library + unit tests.
- `sonic-net/sonic-swss` [#4516](sonic-net/sonic-swss#4516) `SwssStats` thin facade over `ComponentStats` in `orchagent/`.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yutongzhang-microsoft and others added 8 commits June 2, 2026 09:48
Extracts the generic mechanism behind sonic-swss#4434 into a container-agnostic library living in sonic-swss-common, so that orchagent, gnmi, bmp, telemetry etc. can all share the same statistics plumbing.

Key features:

- Lock-free hot path (atomic counters, memory_order_relaxed increments)

- Version-based dirty tracking (writer thread skips unchanged entities)

- Deferred DB connection with retry (safe to construct before Redis is up)

- Stable references via std::map (no rehash invalidation surprises)

- condition_variable-based clean shutdown (no 1s teardown delay)

- Per-component singleton keyed by name; Redis keys follow <COMPONENT>_STATS:<entity>

- Runtime enable/disable knob (setEnabled) so CONFIG_DB can toggle at runtime

Users define their own metric names; the library only provides storage and

periodic Redis flushing. A companion PR in sonic-swss will refactor SwssStats

to become a thin wrapper around ComponentStats.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Add Redis-backed tests covering the writer thread paths that the in-memory tests cannot reach:

- WriterFlushesToRedis: verifies counters land in Redis through the writer thread (covers successful DB connect, flush loop, and Table::set call)
- WriterSkipsUnchangedEntities: verifies version-based dirty tracking (covers the no-change skip branch)
- DestructorStopsThread: verifies the shared_ptr destructor cleanly joins the writer thread
- ConnectRetryOnBadDb: uses an unknown DB name to exercise the connect-retry catch block and confirm stop() promptly wakes the cv_wait

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: yutongzhang-microsoft <yutongzhang@microsoft.com>
…nter storage

- create(): warn (don't silently ignore) when later callers pass a different dbName/intervalSec; document 'first call wins' in the header.
- Drop unique_ptr<Counter> indirection: std::map::emplace constructs Counter in place and std::map never invalidates references on insert, so the extra heap allocation and dereference are unnecessary.
- Header doc: clarify that 'lock-free hot path' applies to the counter update only (entity/metric creation still takes m_mutex), document intervalSec=0 clamp and ASCII-only component names, and capture the reference-stability invariant so future contributors don't break it.
- setEnabled() doc: spell out that disabling preserves in-memory counters and previously-flushed Redis values.
- writerThread(): comment why the post-retry !m_table guard is defensive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: yutongzhang-microsoft <yutongzhang@microsoft.com>
…hanged test, doc/log fixes

- componentstats.cpp: after N consecutive Redis write failures, drop the
  table/connector and re-enter the connect-retry loop so a Redis restart
  is recovered automatically instead of logging forever.
- componentstats_ut.cpp: strengthen WriterSkipsUnchangedEntities by
  dirtying a second entity and asserting it flushes -- this fails fast
  if the writer thread is dead, instead of silently passing.
- componentstats.h: clarify setEnabled() doc -- disabling does not
  abort an in-flight flush of already-dirty entities.
- Replace Unicode U+2026 (...) with ASCII ... in header doc and log msg.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
… leak

This commit addresses banidoru's review comments on PR sonic-net#1180.

* Hot path is now effectively lock-free. lookupOrCreate() takes a
  shared_lock on m_structMutex for the read-only lookup and only
  upgrades to an exclusive lock when a new (entity, metric) pair must
  be inserted. Concurrent callers no longer serialize against each
  other, matching the stated design goal.

* getOrCreateEntity + getOrCreateCounter merged into a single
  lookupOrCreate(entity, metric) so the hot path acquires at most one
  lock instead of two.

* writerThread now performs a final flush pass after the main loop
  exits, so counter updates that landed after the last periodic flush
  are persisted to Redis on stop() rather than being silently dropped.

* setValue takes m_structMutex in exclusive mode for the store +
  version bump and the public doc explicitly warns that mixing
  setValue (gauge) and increment (counter) on the same metric from
  concurrent threads is best-effort -- callers must pick one style
  per metric for deterministic gauge semantics.

* Singleton registry no longer leaks: create() prunes expired
  weak_ptr slots before insertion so long-running processes that
  cycle component names do not accumulate dead entries.

* m_cv now pairs with a dedicated m_cvMutex (separate from the
  structural mutex) since std::shared_mutex is not compatible with
  std::condition_variable::wait_for.

* intervalSec is now clamped once in create() and the constructor
  trusts the caller, removing the duplicate clamp.

* Unit tests: every test that uses a writer thread now calls stop()
  to keep the per-component singletons from leaking writer threads
  across test cases. New tests cover the final-flush-on-shutdown
  guarantee, registry pruning of released instances, and the
  concurrent-increment-vs-setValue no-crash invariant.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI on the previous commit failed because swss-common's autotools build
uses -std=c++14 (see configure.ac CFLAGS_COMMON), and std::shared_mutex
is a C++17 addition. std::shared_timed_mutex is the C++14 ancestor that
provides the same shared/exclusive locking API; the timed-wait portion
is not used. Drop-in replacement, no behavior change.

Fixes the "'shared_mutex' is not a member of 'std'" errors at:
  common/componentstats.h:200
  common/componentstats.cpp:157 (and other shared_lock/unique_lock sites)

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…g batch

The previous version threw on the first failed m_table->set() within a
batch. Because lastVersions[name]=curVer is set in the snapshot phase
(under shared_lock, before any write happens), entities sequenced AFTER
the failing one would have their lastVersions slot set but never get
written -- the next periodic pass would then see them as "unchanged"
and silently skip them.

Switched to per-entity try/catch in the write loop: log + rollback
lastVersions[failed_key] and keep going. After the loop, if any failure
happened, throw a summary error so writerThread can still bump the
consecutiveWriteFailures counter and trigger reconnect if Redis is
fully gone. Restores the per-write isolation that the original loop
in writerThread had.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous version of this test asserted that re-creating a component
with the same name (after the first instance was released and a churn
of unrelated names triggered a prune cycle) would return a *different*
raw pointer. That assumption is wrong: glibc's allocator is free to
recycle the same address for the second instance, and in CI it does.
Log from the failing run shows firstPtr=0x...49a0 == s2.get()=0x...49a0.

Replaced the raw-pointer compare with a behavioural check: assert that
the re-created instance starts with a zero counter for an entity the
old instance had incremented to 7. If the registry had failed to prune
(and somehow kept the old instance alive), the new get() would return
7 instead of 0. This tests the actual singleton-recycling invariant
without relying on the allocator returning a fresh address.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

a114j0y pushed a commit to a114j0y/SONiC that referenced this pull request Jun 5, 2026
[component-stats] Add HLD for SONiC component statistics

## What I'm doing

Adding a new High-Level Design document at `doc/component-stats/component-stats-hld.md` that specifies a reusable mechanism for exposing **service-level (control-plane software) counters** from SONiC containers.

The HLD introduces:

1. A new shared library `swss::ComponentStats` in `sonic-swss-common`.
2. A SWSS-specific facade `SwssStats` in `sonic-swss` built on top of that library, as the first consumer.

Counters are published to two sinks driven from a single in-process atomic snapshot:

- **`COUNTERS_DB`** — for parity with the existing Flex-Counter pipeline and for on-box diagnostic tooling (`redis-cli`, `show ... stats`).
- **Local OpenTelemetry (OTLP) Collector sidecar** — so the same counters can be forwarded to off-box telemetry systems (e.g. Geneva mdm) that consume OTLP.

## Why

SONiC already has dataplane counters (Flex-Counter / SAI), but no uniform mechanism for **service-level** counters such as orchagent task throughput, gNMI request rate, or BMP error counts. A naive per-container implementation would duplicate atomic counter management, dirty tracking, the writer thread, the Redis schema, and an OTLP exporter in every container — concurrency review, bug fixes, and on-the-wire schemas would all drift. This HLD specifies one reusable producer that any container can adopt with a ~100-line facade.

## Companion PRs

- `sonic-net/sonic-swss-common` [sonic-net#1180](sonic-net/sonic-swss-common#1180) — `swss::ComponentStats` library + unit tests.
- `sonic-net/sonic-swss` [#4516](sonic-net/sonic-swss#4516) `SwssStats` thin facade over `ComponentStats` in `orchagent/`.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants