From 21c4cfd59d987930c7ca59ec3ecfa0c680f98a0a Mon Sep 17 00:00:00 2001 From: Yutong Zhang Date: Wed, 27 May 2026 11:01:16 +0800 Subject: [PATCH 1/3] [bmp] Pipeline and batch redis writes from openbmpd Profiling test_sessions_flapping[500] on a SONiC DUT showed openbmpd consuming ~15% of OnCPU during BGP scale events. The flame graph stack was dominated by: openbmpd -> libswsscommon -> libhiredis -> libc -> vmlinux syscall Root cause is in Server/src/RedisManager.cpp::WriteBMPTable: every BMP route update went through a freshly constructed swss::Table and a synchronous HSET round-trip to redis. With ~500 BGP sessions flapping simultaneously and thousands of routes per session, this translated to millions of small synchronous redis round-trips during a single convergence window, all on the BMP message processing thread. This commit changes RedisManager to use a swss::RedisPipeline plus buffered swss::Table writers cached per-table-name: * Setup() builds a single shared RedisPipeline (size 1024) bound to stateDb_. * WriteBMPTable() looks up (or lazily creates) a buffered Table for the table name and calls set() into the pipeline buffer instead of constructing a new Table per call. * A new FlushBMPTables() method drains the pipeline; callers invoke it at natural batch boundaries. Flush points in MsgBusImpl_redis: * update_unicastPrefix(): after processing all rib[i] entries, flush once so that a BMP UPDATE carrying N routes results in ~1 pipelined round-trip instead of N synchronous HSETs. The DEL path (RemoveEntityFromBMPTable) already batched its keys; it now flushes the SET pipeline before issuing the DEL so SET-then-DEL ordering on the same key is preserved. * update_Peer(): flush immediately after the single neighbor write because peer up/down events are infrequent and consumers expect them to be visible right away. * ExitRedisManager(): drain any still-buffered updates so observed state isn't lost on shutdown. Behavioral notes: * Thread safety: each MsgBusImpl_redis (and therefore each RedisManager + pipeline) is constructed per BMP client connection in openbmpd's per-client-thread model. The shared pipeline is not used across threads, so no additional locking is introduced. * Visibility: BMP state writes that previously hit redis synchronously now wait until the enclosing BMP message finishes processing before being flushed. In the absence of new BMP messages, the pipeline still auto-flushes when its 1024-entry buffer fills. * The CONFIG_DB-driven enabledTables_ gate is preserved; disabled tables short-circuit before any Table object is touched. This is the openbmpd-side half of the work to reduce DUT-side observer cost in BGP scale convergence tests; a follow-up will look at tightening the default of BMP|table|bgp_rib_out_table for deployments that do not consume the outbound RIB. Signed-off-by: Yutong Zhang Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- Server/src/RedisManager.cpp | 63 +++++++++++++++++++++++++-- Server/src/RedisManager.h | 22 ++++++++++ Server/src/redis/MsgBusImpl_redis.cpp | 14 ++++++ 3 files changed, 96 insertions(+), 3 deletions(-) diff --git a/Server/src/RedisManager.cpp b/Server/src/RedisManager.cpp index 0f7e2dd..41c537d 100644 --- a/Server/src/RedisManager.cpp +++ b/Server/src/RedisManager.cpp @@ -9,6 +9,15 @@ #include "RedisManager.h" +/** + * Default size of the shared RedisPipeline buffer used for BMP state + * writes. Set high enough that a typical BMP UPDATE worth of route + * entries fits in a single round-trip, but bounded so that the pipeline + * auto-flushes if a producer goes quiet without the message-level + * flush hook ever firing. + */ +static constexpr size_t BMP_PIPELINE_SIZE = 1024; + /*********************************************************************//** * Constructor for class @@ -37,8 +46,13 @@ void RedisManager::Setup(Logger *logPtr) { stateDb_ = std::make_shared(BMP_DB_NAME, 0, false); separator_ = swss::SonicDBConfig::getSeparator(BMP_DB_NAME); -} + // Build a single shared pipeline backed by the same logical DB. All + // per-table buffered Table writers below share this pipeline so that a + // flush() sends every queued HSET in a single batch instead of one + // round-trip per route entry. + pipeline_ = std::make_shared(stateDb_.get(), BMP_PIPELINE_SIZE); +} /** @@ -51,6 +65,31 @@ std::string RedisManager::GetKeySeparator() { } +/** + * Lookup-or-create a buffered Table writer for the given table name. + * + * The Table is constructed with buffered=true and bound to pipeline_, so + * each set() call enqueues the underlying HSET on the pipeline rather + * than synchronously talking to redis. The actual round-trip happens on + * FlushBMPTables() (or when the pipeline buffer fills). + * + * Returns nullptr if the table is not enabled via CONFIG_DB. + */ +swss::Table* RedisManager::GetOrCreateBufferedTable(const std::string& table) { + if (enabledTables_.find(table) == enabledTables_.end()) { + return nullptr; + } + auto it = bufferedTables_.find(table); + if (it == bufferedTables_.end()) { + auto t = std::make_unique(pipeline_.get(), table, /*buffered=*/true); + auto raw = t.get(); + bufferedTables_.emplace(table, std::move(t)); + return raw; + } + return it->second.get(); +} + + /** * WriteBMPTable * @@ -60,11 +99,11 @@ std::string RedisManager::GetKeySeparator() { */ bool RedisManager::WriteBMPTable(const std::string& table, const std::vector& keys, const std::vector fieldValues) { - if (enabledTables_.find(table) == enabledTables_.end()) { + swss::Table* stateBMPTable = GetOrCreateBufferedTable(table); + if (stateBMPTable == nullptr) { DEBUG("RedisManager %s is disabled", table.c_str()); return false; } - std::unique_ptr stateBMPTable = std::make_unique(stateDb_.get(), table); std::ostringstream oss; for (const auto& key : keys) { oss << key << separator_; @@ -89,17 +128,35 @@ bool RedisManager::RemoveEntityFromBMPTable(const std::vector& keys for (const auto& key : keys) { DEBUG("RedisManager RemoveEntityFromBMPTable key = %s", key.c_str()); } + // Flush any buffered SETs first so that a later DEL on the same key + // cannot be reordered before its preceding SET. + FlushBMPTables(); stateDb_->del(keys); return true; } +/** + * FlushBMPTables - flush all buffered table writers' pending operations. + */ +void RedisManager::FlushBMPTables() { + for (auto& kv : bufferedTables_) { + if (kv.second) { + kv.second->flush(); + } + } +} + + /** * ExitRedisManager * * \param [in] N/A */ void RedisManager::ExitRedisManager() { + // Best-effort: drain anything still buffered so we don't lose state + // updates that have already been observed by openbmpd. + FlushBMPTables(); exit_ = true; } diff --git a/Server/src/RedisManager.h b/Server/src/RedisManager.h index 05a437a..80211bd 100644 --- a/Server/src/RedisManager.h +++ b/Server/src/RedisManager.h @@ -13,11 +13,14 @@ #include #include #include +#include #include #include #include +#include #include +#include #include #include #include @@ -118,6 +121,16 @@ class RedisManager { */ bool RemoveEntityFromBMPTable(const std::vector& keys); + /** + * FlushBMPTables - flush any pending buffered HSET operations on the + * pipelined per-table writers. Should be called at natural batch + * boundaries (e.g. after a full BMP UPDATE message is processed) so + * that buffered redis writes are actually delivered. + * + * \param [in] N/A + */ + void FlushBMPTables(); + /** * Get Key separator for deletion * @@ -126,7 +139,16 @@ class RedisManager { std::string GetKeySeparator(); private: + /** + * Lookup (or lazily create) a buffered swss::Table writer for the + * given table name, backed by pipeline_. Returns nullptr if the table + * is not in enabledTables_. + */ + swss::Table* GetOrCreateBufferedTable(const std::string& table); + std::shared_ptr stateDb_; + std::shared_ptr pipeline_; + std::unordered_map> bufferedTables_; std::string separator_; Logger *logger; std::unordered_set enabledTables_; diff --git a/Server/src/redis/MsgBusImpl_redis.cpp b/Server/src/redis/MsgBusImpl_redis.cpp index 1e00203..c1cb977 100644 --- a/Server/src/redis/MsgBusImpl_redis.cpp +++ b/Server/src/redis/MsgBusImpl_redis.cpp @@ -96,6 +96,10 @@ void MsgBusImpl_redis::update_Peer(obj_bgp_peer &peer, obj_peer_up_event *up, ob } redisMgr_.WriteBMPTable(BMP_TABLE_NEI, keys, fieldValues); + // Peer state transitions are infrequent and BMP consumers expect to + // see them immediately; flush right away rather than waiting for the + // pipeline buffer to fill. + redisMgr_.FlushBMPTables(); } @@ -183,7 +187,17 @@ void MsgBusImpl_redis::update_unicastPrefix(obj_bgp_peer &peer, vector } if (!del_keys.empty()) { + // RemoveEntityFromBMPTable already flushes any buffered SETs + // before issuing the DEL, so SET-then-DEL ordering on the same + // key is preserved. redisMgr_.RemoveEntityFromBMPTable(del_keys); + } else { + // ADD path accumulated buffered set() calls across all rib[i] + // entries above; emit them in a single pipelined round-trip + // instead of one HSET-per-route. Without this flush the buffered + // updates would only be sent when the pipeline buffer fills, + // which can delay BMP state visibility for slow producers. + redisMgr_.FlushBMPTables(); } } From e8aea09a2a96ef1b1069aace90b3d938a80254f7 Mon Sep 17 00:00:00 2001 From: Yutong Zhang Date: Wed, 27 May 2026 05:51:37 +0000 Subject: [PATCH 2/3] Fix Reset paths to flush pipeline and harden shutdown flush Address self-review findings on the RedisPipeline batching change: 1. ResetBMPTable now flushes the pipeline before reading keys via stateDb_. getKeys() runs on stateDb_'s connection and cannot see SETs still buffered in pipeline_ (which is on an independent connection), so without the flush those keys are missed by the DEL list and survive the reset. The buffered SETs then land on redis after the DEL, leaving stale entries. Flushing first makes the reset see the full key set and guarantees SET-before-DEL ordering for any key that gets re-added afterwards. 2. ExitRedisManager wraps FlushBMPTables in try/catch. This path is reached from ~MsgBusImpl_redis; letting a redis I/O error escape into the destructor chain could call std::terminate during unwinding. swss::RedisPipeline::~RedisPipeline already swallows exceptions for the same reason - mirror that here. Signed-off-by: Yutong Zhang Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- Server/src/RedisManager.cpp | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/Server/src/RedisManager.cpp b/Server/src/RedisManager.cpp index 41c537d..8372f99 100644 --- a/Server/src/RedisManager.cpp +++ b/Server/src/RedisManager.cpp @@ -155,8 +155,17 @@ void RedisManager::FlushBMPTables() { */ void RedisManager::ExitRedisManager() { // Best-effort: drain anything still buffered so we don't lose state - // updates that have already been observed by openbmpd. - FlushBMPTables(); + // updates that have already been observed by openbmpd. This runs from + // ~MsgBusImpl_redis, so we must not let a redis I/O error escape into + // the destructor chain - mirror swss::RedisPipeline's own dtor + // pattern and swallow exceptions here. + try { + FlushBMPTables(); + } catch (const std::exception& e) { + LOG_INFO("RedisManager ExitRedisManager flush failed: %s", e.what()); + } catch (...) { + LOG_INFO("RedisManager ExitRedisManager flush failed with unknown exception"); + } exit_ = true; } @@ -198,6 +207,12 @@ bool RedisManager::InitBMPConfig() { */ void RedisManager::ResetBMPTable(const std::string & table) { + // Drain the pipeline before reading the current key set: getKeys() runs + // on stateDb_'s connection and would otherwise miss any SETs that are + // still buffered in pipeline_ (which uses an independent connection), + // leaving stale entries in redis after the reset completes. + FlushBMPTables(); + std::unique_ptr stateBMPTable = std::make_unique(stateDb_.get(), table); std::vector keys; stateBMPTable->getKeys(keys); From d6c1a269be025328fe06b34bfd405504cc1f9fb0 Mon Sep 17 00:00:00 2001 From: Yutong Zhang Date: Wed, 27 May 2026 05:56:12 +0000 Subject: [PATCH 3/3] Defensively clear bufferedTables_ before rebuilding pipeline in Setup bufferedTables_ stores swss::Table writers that hold raw pointers into pipeline_ (obtained via pipeline_.get()). Today Setup() is only invoked once per RedisManager - from MsgBusImpl_redis's constructor - so this isn't a live bug. But if a future caller ever re-invokes Setup(), the shared_ptr reassignment of pipeline_ destroys the old pipeline while the existing Table entries still reference it, turning the next WriteBMPTable into a use-after-free. Clear bufferedTables_ at the top of Setup() so the contract "Setup rebuilds all redis state" actually holds. Signed-off-by: Yutong Zhang Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- Server/src/RedisManager.cpp | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/Server/src/RedisManager.cpp b/Server/src/RedisManager.cpp index 8372f99..de31d67 100644 --- a/Server/src/RedisManager.cpp +++ b/Server/src/RedisManager.cpp @@ -44,6 +44,13 @@ void RedisManager::Setup(Logger *logPtr) { swss::SonicDBConfig::initialize(); } + // Drop any pre-existing buffered Table writers before we replace the + // pipeline they reference. Today Setup() is only called once from + // MsgBusImpl_redis's constructor, but bufferedTables_ holds raw + // pointers into pipeline_ and would dangle if a second Setup() ever + // reassigned pipeline_ without clearing the map first. + bufferedTables_.clear(); + stateDb_ = std::make_shared(BMP_DB_NAME, 0, false); separator_ = swss::SonicDBConfig::getSeparator(BMP_DB_NAME);