[bmp] Pipeline and batch redis writes from openbmpd by yutongzhang-microsoft · Pull Request #36 · sonic-net/sonic-bmp

yutongzhang-microsoft · 2026-05-27T03:02:02Z

What this PR does

Profiling test_sessions_flapping[500] on a SONiC DUT showed that openbmpd consumed ~15% of OnCPU during BGP convergence, with the entire stack dominated by:

openbmpd -> libswsscommon -> libhiredis -> libc -> vmlinux syscall

Root cause is in Server/src/RedisManager.cpp::WriteBMPTable: every BMP route update goes through a freshly constructed swss::Table and a synchronous HSET round-trip to redis. With ~500 BGP sessions flapping simultaneously and thousands of routes per session, this becomes millions of small synchronous redis round-trips on the BMP message processing thread during a single convergence window.

This PR replaces the per-call swss::Table + synchronous set() with a shared swss::RedisPipeline and buffered swss::Table writers cached per table name. Flush points are chosen at natural batch boundaries (end of each BMP UPDATE message; immediately for peer state transitions).
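
The difference in round-trip count can be illustrated with a toy model. The sketch below does not use swss-common or hiredis: `FakeRedis`, `ToyPipeline`, `writeSynchronously`, and `writePipelined` are hypothetical names invented for this illustration, and the only behaviours modelled are the ones described above (buffering, flush at a message boundary, auto-flush when the buffer is full).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a redis connection: it only counts round-trips.
// Hypothetical type, not part of swss-common or hiredis.
struct FakeRedis {
    std::size_t roundTrips = 0;
    std::size_t commandsApplied = 0;

    // One network round-trip carrying `n` commands.
    void send(std::size_t n) {
        ++roundTrips;
        commandsApplied += n;
    }
};

// Toy pipeline: buffers commands and sends them as one batch on flush(),
// or as soon as the buffer reaches `capacity`.
class ToyPipeline {
public:
    ToyPipeline(FakeRedis& redis, std::size_t capacity)
        : redis_(redis), capacity_(capacity) {
        assert(capacity_ > 0);
    }

    void push(std::string command) {
        buffer_.push_back(std::move(command));
        if (buffer_.size() >= capacity_) flush();  // auto-flush when full
    }

    void flush() {
        if (buffer_.empty()) return;  // nothing buffered, no round-trip
        redis_.send(buffer_.size());
        buffer_.clear();
    }

private:
    FakeRedis& redis_;
    std::size_t capacity_;
    std::vector<std::string> buffer_;
};

// Old shape: one synchronous round-trip per route.
std::size_t writeSynchronously(FakeRedis& redis, std::size_t routes) {
    for (std::size_t i = 0; i < routes; ++i) redis.send(1);
    return redis.roundTrips;
}

// New shape: enqueue every route, then flush once at the message boundary.
std::size_t writePipelined(FakeRedis& redis, std::size_t routes,
                           std::size_t capacity) {
    ToyPipeline pipeline(redis, capacity);
    for (std::size_t i = 0; i < routes; ++i) {
        pipeline.push("HSET route " + std::to_string(i));
    }
    pipeline.flush();
    return redis.roundTrips;
}
```

In this model, 300 routes cost 300 round-trips on the synchronous path and one on the pipelined path with a 1024-entry buffer.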

Changes

`Server/src/RedisManager.{h,cpp}`

* New shared swss::RedisPipeline pipeline_ (size 1024) built in Setup(), bound to stateDb_.
* New std::unordered_map<std::string, std::unique_ptr<swss::Table>> bufferedTables_ caches one buffered Table per enabled table name (BGP_NEIGHBOR_TABLE / BGP_RIB_IN_TABLE / BGP_RIB_OUT_TABLE).
* WriteBMPTable() calls GetOrCreateBufferedTable() instead of constructing a fresh Table; each set() enqueues into the pipeline buffer rather than triggering a synchronous HSET.
* New FlushBMPTables() drains the pipeline; callers invoke it at message boundaries.
* RemoveEntityFromBMPTable() flushes the pipeline before issuing the batched DEL, preserving SET-then-DEL ordering on the same key.
* ExitRedisManager() drains the pipeline so updates already observed by openbmpd are not lost at shutdown.
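
The per-table-name cache and the enabled-table gate can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `ToyTable` and `ToyManager` are hypothetical stand-ins, and the real `swss::Table` constructor arguments are deliberately not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for a buffered table writer; it only counts
// how many writes were enqueued on it.
struct ToyTable {
    explicit ToyTable(std::string tableName) : name(std::move(tableName)) {}
    void set(const std::string& /*key*/) { ++pendingSets; }
    std::string name;
    int pendingSets = 0;
};

class ToyManager {
public:
    explicit ToyManager(std::unordered_set<std::string> enabledTables)
        : enabledTables_(std::move(enabledTables)) {}

    // Returns false without touching any table object when the table
    // name is disabled; otherwise enqueues on the cached writer.
    bool write(const std::string& table, const std::string& key) {
        if (enabledTables_.count(table) == 0) return false;
        getOrCreate(table).set(key);
        return true;
    }

    std::size_t cachedTables() const { return tables_.size(); }
    int constructions() const { return constructions_; }

private:
    // Look up the writer for `table`, constructing it only on first use.
    ToyTable& getOrCreate(const std::string& table) {
        auto it = tables_.find(table);
        if (it == tables_.end()) {
            it = tables_.emplace(table, std::make_unique<ToyTable>(table)).first;
            ++constructions_;
        }
        assert(it->second != nullptr);
        return *it->second;
    }

    std::unordered_set<std::string> enabledTables_;
    std::unordered_map<std::string, std::unique_ptr<ToyTable>> tables_;
    int constructions_ = 0;
};
```

Repeated writes to the same enabled table construct one writer in total, and a write to a disabled table leaves the cache untouched.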

`Server/src/redis/MsgBusImpl_redis.cpp`

* update_unicastPrefix(): after the per-rib-entry loop, call FlushBMPTables() once on the ADD path; the DEL path is unchanged because RemoveEntityFromBMPTable() flushes internally.
* update_Peer(): flush immediately after the single neighbor write, because peer up/down events are infrequent and consumers expect them to be visible right away.

Why this is safe

| Concern | Mitigation |
| --- | --- |
| Thread safety | `MsgBusImpl_redis` (and therefore `RedisManager` + its pipeline) is constructed per BMP client connection in openbmpd's per-client-thread model. The shared pipeline never crosses threads, so no new locking is needed. |
| Visibility delay | Writes are flushed at every BMP UPDATE boundary, every peer event, every DEL, and on shutdown. The pipeline also auto-flushes when its 1024-entry buffer fills. Net effect: BMP state visibility is bounded by the duration of a single BMP UPDATE message, which is already a natural granularity for BMP consumers. |
| CONFIG_DB gating | `enabledTables_` is still consulted before any `Table` object is touched; disabled tables short-circuit early just as before. |
| DEL ordering | `RemoveEntityFromBMPTable()` now flushes the pipeline before issuing the batched `DEL` on the same connection, so a SET followed by a DEL of the same key cannot be reordered. |
| ResetAllTables / ResetBMPTable | These existing reset paths operate directly on `stateDb_` (not via the pipeline) and are only triggered on FRR reconnect, which is rare; they remain functionally unchanged. |
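
The DEL-ordering row is easiest to see with a small model of a buffered writer. The sketch below is an assumption-laden toy, not the PR's code: `ToyStore`, `ToyWriter`, `keySurvives`, and the key string are all made up for illustration. It shows how a DEL that bypasses the buffer lets an earlier buffered SET land afterwards and resurrect the key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the redis keyspace.
using ToyStore = std::map<std::string, std::string>;

class ToyWriter {
public:
    explicit ToyWriter(ToyStore& store) : store_(store) {}

    // Buffered SET: nothing reaches the store until flush().
    void set(const std::string& key, const std::string& value) {
        pending_.emplace_back(key, value);
    }

    // Apply all buffered SETs in the order they were enqueued.
    void flush() {
        for (const auto& kv : pending_) store_[kv.first] = kv.second;
        pending_.clear();
    }

    // Unsafe variant: deletes directly while SETs may still be buffered.
    void removeWithoutFlush(const std::string& key) { store_.erase(key); }

    // Safe variant: drain the buffer first so SET-then-DEL keeps program order.
    void removeAfterFlush(const std::string& key) {
        flush();
        assert(pending_.empty());
        store_.erase(key);
    }

private:
    ToyStore& store_;
    std::vector<std::pair<std::string, std::string>> pending_;
};

// Returns true if the key is still present after SET, DEL, and a later flush.
bool keySurvives(bool flushBeforeDelete) {
    const std::string key = "toy-rib-in-key";  // made-up key, not a real schema
    ToyStore store;
    ToyWriter writer(store);
    writer.set(key, "toy-value");
    if (flushBeforeDelete) {
        writer.removeAfterFlush(key);
    } else {
        writer.removeWithoutFlush(key);
    }
    writer.flush();  // stands in for the next message-boundary flush
    return store.count(key) != 0;
}
```

In this model `keySurvives(false)` returns true (the stale entry reappears) and `keySurvives(true)` returns false.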

Expected impact

For a BMP UPDATE carrying N route entries, redis round-trips drop from N synchronous HSETs to ~1 pipelined batch. In the test_sessions_flapping[500] profile this is the change targeting the ~15% openbmpd OnCPU share; combined with a separate effort to tighten the default of BMP|table|bgp_rib_out_table for deployments that do not consume the outbound RIB, that share is expected to drop substantially.
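
As a back-of-the-envelope check on "~1 pipelined batch": if each flush costs one round-trip and the buffer auto-flushes every 1024 entries, an UPDATE with N routes needs about ceil(N / 1024) round-trips. The helper below is a hypothetical illustration of that arithmetic only; it is not part of openbmpd or swss-common, and it ignores any other traffic on the connection.

```cpp
#include <cassert>
#include <cstddef>

// Estimated round-trips for `routes` buffered writes when the buffer
// auto-flushes at `capacity` entries and is flushed once at the end.
// Assumes one round-trip per non-empty flush.
std::size_t pipelinedRoundTrips(std::size_t routes, std::size_t capacity) {
    assert(capacity > 0);
    return (routes + capacity - 1) / capacity;  // integer ceil(routes / capacity)
}
```

Under these assumptions, any UPDATE with up to 1024 routes is a single round-trip, and a 5000-route burst is five.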

Test plan

* Build openbmpd against current swss-common (which already provides swss::RedisPipeline and the buffered swss::Table ctor used here).
* Run an existing BMP scale scenario; verify BGP_NEIGHBOR_TABLE / BGP_RIB_IN_TABLE / BGP_RIB_OUT_TABLE entries appear in BMP_STATE_DB exactly as before (same keys, same fields), and that DEL events still remove the corresponding entries.
* Verify that on FRR disconnect + reconnect, ResetAllTables still clears state.
* Re-profile test_sessions_flapping[500]; expect the openbmpd -> libswsscommon -> libhiredis stack to shrink substantially.

Profiling test_sessions_flapping[500] on a SONiC DUT showed openbmpd consuming ~15% of OnCPU during BGP scale events. The flame graph stack was dominated by:

openbmpd -> libswsscommon -> libhiredis -> libc -> vmlinux syscall

Root cause is in Server/src/RedisManager.cpp::WriteBMPTable: every BMP route update went through a freshly constructed swss::Table and a synchronous HSET round-trip to redis. With ~500 BGP sessions flapping simultaneously and thousands of routes per session, this translated to millions of small synchronous redis round-trips during a single convergence window, all on the BMP message processing thread.

This commit changes RedisManager to use a swss::RedisPipeline plus buffered swss::Table writers cached per-table-name:

* Setup() builds a single shared RedisPipeline (size 1024) bound to stateDb_.
* WriteBMPTable() looks up (or lazily creates) a buffered Table for the table name and calls set() into the pipeline buffer instead of constructing a new Table per call.
* A new FlushBMPTables() method drains the pipeline; callers invoke it at natural batch boundaries.

Flush points in MsgBusImpl_redis:

* update_unicastPrefix(): after processing all rib[i] entries, flush once so that a BMP UPDATE carrying N routes results in ~1 pipelined round-trip instead of N synchronous HSETs. The DEL path (RemoveEntityFromBMPTable) already batched its keys; it now flushes the SET pipeline before issuing the DEL so SET-then-DEL ordering on the same key is preserved.
* update_Peer(): flush immediately after the single neighbor write because peer up/down events are infrequent and consumers expect them to be visible right away.
* ExitRedisManager(): drain any still-buffered updates so observed state isn't lost on shutdown.

Behavioral notes:

* Thread safety: each MsgBusImpl_redis (and therefore each RedisManager + pipeline) is constructed per BMP client connection in openbmpd's per-client-thread model. The shared pipeline is not used across threads, so no additional locking is introduced.
* Visibility: BMP state writes that previously hit redis synchronously now wait until the enclosing BMP message finishes processing before being flushed. In the absence of new BMP messages, the pipeline still auto-flushes when its 1024-entry buffer fills.
* The CONFIG_DB-driven enabledTables_ gate is preserved; disabled tables short-circuit before any Table object is touched.

This is the openbmpd-side half of the work to reduce DUT-side observer cost in BGP scale convergence tests; a follow-up will look at tightening the default of BMP|table|bgp_rib_out_table for deployments that do not consume the outbound RIB.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mssonicbld · 2026-05-27T03:02:10Z

/azp run

azure-pipelines · 2026-05-27T03:02:20Z

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft · 2026-05-27T05:47:19Z

/azpw run

mssonicbld · 2026-05-27T05:47:22Z

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipeline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed (or canceled) jobs...

mssonicbld · 2026-05-27T05:47:23Z

No Azure DevOps builds found for #36.

Address self-review findings on the RedisPipeline batching change:

1. ResetBMPTable now flushes the pipeline before reading keys via stateDb_. getKeys() runs on stateDb_'s connection and cannot see SETs still buffered in pipeline_ (which is on an independent connection), so without the flush those keys are missed by the DEL list and survive the reset. The buffered SETs then land on redis after the DEL, leaving stale entries. Flushing first makes the reset see the full key set and guarantees SET-before-DEL ordering for any key that gets re-added afterwards.
2. ExitRedisManager wraps FlushBMPTables in try/catch. This path is reached from ~MsgBusImpl_redis; letting a redis I/O error escape into the destructor chain could call std::terminate during unwinding. swss::RedisPipeline::~RedisPipeline already swallows exceptions for the same reason - mirror that here.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
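
The second finding can be sketched in isolation. Since C++11 a destructor is implicitly `noexcept` unless declared otherwise, so an exception that escapes it ends in `std::terminate`. The toy below shows the shape of a shutdown path that swallows a flush failure; `FlakyPipeline`, `ToyManager`, `exitManager`, and `shutdownReportsFailure` are hypothetical names, and the real code may log or handle the error differently.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pipeline whose flush can fail, e.g. a lost redis connection.
class FlakyPipeline {
public:
    explicit FlakyPipeline(bool failOnFlush) : failOnFlush_(failOnFlush) {}

    void push(const std::string& command) { pending_.push_back(command); }

    void flush() {
        if (failOnFlush_) throw std::runtime_error("simulated redis I/O error");
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    bool failOnFlush_;
    std::vector<std::string> pending_;
};

class ToyManager {
public:
    ToyManager(FlakyPipeline& pipeline, bool& flushFailed)
        : pipeline_(pipeline), flushFailed_(flushFailed) {}

    // Destructor funnels into the shutdown path, as in the scenario above.
    ~ToyManager() { exitManager(); }

    // Drain buffered writes, but never let an exception leave this function.
    void exitManager() noexcept {
        try {
            pipeline_.flush();
        } catch (...) {
            flushFailed_ = true;  // record the failure instead of propagating it
        }
    }

private:
    FlakyPipeline& pipeline_;
    bool& flushFailed_;
};

// Builds a manager, destroys it, and reports whether the flush failed.
bool shutdownReportsFailure(bool failOnFlush) {
    FlakyPipeline pipeline(failOnFlush);
    pipeline.push("HSET toy-key field value");
    bool flushFailed = false;
    {
        ToyManager manager(pipeline, flushFailed);
    }  // destructor runs here and must not throw
    assert(failOnFlush || pipeline.pending() == 0);
    return flushFailed;
}
```

With the try/catch in place, both the failing and the succeeding case return normally; without it, the failing case would terminate the process.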

mssonicbld · 2026-05-27T05:51:46Z

/azp run

azure-pipelines · 2026-05-27T05:51:56Z

Azure Pipelines successfully started running 1 pipeline(s).

bufferedTables_ stores swss::Table writers that hold raw pointers into pipeline_ (obtained via pipeline_.get()). Today Setup() is only invoked once per RedisManager - from MsgBusImpl_redis's constructor - so this isn't a live bug. But if a future caller ever re-invokes Setup(), the shared_ptr reassignment of pipeline_ destroys the old pipeline while the existing Table entries still reference it, turning the next WriteBMPTable into a use-after-free. Clear bufferedTables_ at the top of Setup() so the contract "Setup rebuilds all redis state" actually holds.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
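
The lifetime hazard described in this commit can be modelled with a few lines. The sketch assumes nothing about swss-common: `ToyPipeline`, `ToyTable`, and `ToyManager` are hypothetical, and a generation counter is used only to make it observable which pipeline a writer points at.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical pipeline tagged with a generation number for observability.
struct ToyPipeline {
    explicit ToyPipeline(int gen) : generation(gen) {}
    int generation;
};

// Hypothetical writer holding a raw, non-owning pointer to its pipeline.
struct ToyTable {
    explicit ToyTable(ToyPipeline* p) : pipeline(p) {}
    ToyPipeline* pipeline;
};

class ToyManager {
public:
    // Rebuilds all state. The cached writers are dropped first because
    // they point into the pipeline that is about to be destroyed.
    void setup() {
        tables_.clear();
        pipeline_ = std::make_shared<ToyPipeline>(++generation_);
    }

    // Generation of the pipeline that the writer for `name` points at.
    int generationOf(const std::string& name) {
        return getOrCreate(name).pipeline->generation;
    }

    std::size_t cachedTables() const { return tables_.size(); }

private:
    ToyTable& getOrCreate(const std::string& name) {
        assert(pipeline_ != nullptr);  // setup() must have run
        auto it = tables_.find(name);
        if (it == tables_.end()) {
            it = tables_.emplace(name,
                                 std::make_unique<ToyTable>(pipeline_.get())).first;
        }
        return *it->second;
    }

    std::shared_ptr<ToyPipeline> pipeline_;
    std::unordered_map<std::string, std::unique_ptr<ToyTable>> tables_;
    int generation_ = 0;
};
```

Without the `tables_.clear()` call, the writer cached before the second `setup()` would still hold a pointer to the destroyed first pipeline, and dereferencing it would be undefined behaviour.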

mssonicbld · 2026-05-27T05:56:21Z

/azp run

azure-pipelines · 2026-05-27T05:56:30Z

Azure Pipelines successfully started running 1 pipeline(s).

yutongzhang-microsoft wants to merge 3 commits into sonic-net:master from yutongzhang-microsoft:optimize/redis-pipeline-batching
