swsscommon: expose batched ZmqProducerStateTable APIs to SWIG#1188
Conversation
The batched overloads of ZmqProducerStateTable::set/del/send that take a std::vector<KeyOpFieldsValuesTuple> are unreachable from Go (and Python) today. SWIG generates the C trampolines and Go methods for them, but the parameter type std::vector<KeyOpFieldsValuesTuple> is not %template'd and SWIG cannot synthesize a default constructor for the underlying tuple element, so callers have no way to build the argument: the type exposes only Swigcptr(). Batched DEL works (vector<string> is templated) but batched SET and the mixed SET/DEL send() do not. Add three %inline shims (zmqProducerBatchedSet/Del/Send) that take parallel vectors of already-wrapped types (VectorString and FieldValuePairsList) and assemble the KeyOpFieldsValuesTuple vector on the C++ side, mirroring the parallel-vector pattern used by Table::pops. Size mismatches throw std::invalid_argument, which the existing SWIG exception bridge surfaces to callers as a recoverable panic. This unblocks bulk gNMI Set in sonic-gnmi (e.g. DASH route programming) which today emits one ZMQ message per key. Add Go tests in goext/swsscommon_test.go that drive each helper through a ZmqConsumerStateTable and verify the entries arrive on the consumer side. Per-entry semantic correctness is already covered by the existing C++ unit tests in tests/zmq_state_ut.cpp; the Go tests prove the SWIG bindings are reachable from Go. Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This pull request makes the batched swss::ZmqProducerStateTable APIs usable from SWIG-generated Go/Python bindings by adding a small C++ shim layer in the SWIG interface, avoiding SWIG’s inability to construct std::vector<KeyOpFieldsValuesTuple> arguments directly.
Changes:
- Added
%inlineshim helpers (zmqProducerBatchedSet/Del/Send) that accept parallel SWIG-friendly vectors and assemblestd::vector<KeyOpFieldsValuesTuple>in C++ before forwarding to existing batched APIs. - Added Go tests that exercise the new helpers end-to-end via
ZmqConsumerStateTable, including a size-mismatch negative case.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
pyext/swsscommon.i |
Adds SWIG-exposed C++ inline helpers to construct and call batched ZMQ producer operations from Go/Python. |
goext/swsscommon_test.go |
Adds Go coverage to verify the new SWIG glue is reachable and functions correctly for set/del/send batches. |
|
@hdwhdw could you check PR checker failures? |
|
/azpw run Azure.sonic-swss-common Re-running to confirm the |
|
Retrying failed(or canceled) jobs... |
|
Retrying failed(or canceled) stages in build 1113708: ✅Stage Build:
|
|
Re-run results: the previous This is independent of this PR. The change is a pure SWIG Cross-check: unrelated PR #1191 ( /azpw run Azure.sonic-swss-common |
prabhataravind
left a comment
There was a problem hiding this comment.
lgtm. @qiluo-msft to sign-off/merge as well.
|
@prsunny for viz as this is needed for 202605. |
sonic-net/sonic-swss-common#1188 merged to master as 3110aaa5. Update the submodule pointer from the PR branch tip d7330786 to the merge commit on master so the bump is reachable from sonic-buildimage master once this PR lands. sonic-gnmi submodule pointer is unchanged for now and will be bumped to the merge commit of sonic-net/sonic-gnmi#675 once that PR lands, at which point this PR can come out of draft. Related: sonic-net#27250 Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
…675) * MixedDbClient: batch ZMQ Set/Del for multi-key gNMI updates Why: The current handleTableData loops DbSetTable/DbDelTable per key, so a single gNMI Set carrying N keys to a DASH_* table generates N ZMQ round-trips even though the messages all target the same producer. At the scale DASH route programming requires (10k-100k entries per update), the per-key overhead dominates wall-clock time. What: * Add DbBatchSetTable(table, []tableBatchEntry) and DbBatchDelTable( table, []string) on MixedDbClient. When the table is ZMQ-backed these collapse the batch into a single ZmqProducerBatchedSet / ZmqProducerBatchedDel call (one ZMQ message). For plain Table or non-ZMQ ProducerStateTable backends — where no batched C++ API exists — they fall back to the existing per-entry loop, so behavior for DASH_HA_* tables and ordinary CONFIG_DB writes is unchanged. * Wire the multi-key code paths in handleTableData to the new helpers: the opRemove branch and the multi-key JSON opAdd branch now build a slice and make one batched call. Single-key paths are untouched (batching one entry is pointless). * Wrap the new SWIG entry points with ZmqProducerBatched{Set,Del}Wrapper to convert C++ exceptions into Go errors, mirroring the existing ProducerStateTable wrappers. Tests: client_test.go adds: * TestZmqBatchedSetEndToEnd / TestZmqBatchedDelEndToEnd — round-trip N entries through a real ZmqProducerStateTable + ZmqConsumerStateTable and verify the consumer drains them all. * TestDbBatchSetTableFallsBackForPlainTable / ...EmptyAndSingle — verify the plainTableMap fallback path and empty/single-entry shortcuts. * TestHandleTableDataBatchesMultiKeyJSON / ...OpRemove — gomonkey- patched proof that handleTableData routes through DbBatchSetTable / DbBatchDelTable exactly once for multi-key payloads instead of N per-key calls. Related: * sonic-net/sonic-buildimage#27250 — tracking issue for the optimization. * Depends on sonic-net/sonic-swss-common#1188, which exposes ZmqProducerBatchedSet/Del/Send to Go via SWIG %inline helpers. Until that PR lands and a fresh libswsscommon artifact is published, sonic-gnmi CI on this branch will fail to build — the Go bindings for the new symbols do not exist in the current master swsscommon artifact. Signed-off-by: Dawei Huang <daweihuang@microsoft.com> * MixedDbClient tests: harden batch tests against global-state ordering Why: Copilot review on PR #675 flagged four genuine test-hygiene issues in the new batched-Set/Del tests. All four come from the tests touching package-level globals (RedisDbMap, zmqClientMap) without going through the existing helpers, which makes them order-dependent and racy with other tests that explicitly nil out RedisDbMap. What: * TestZmqBatchedSetEndToEnd / TestZmqBatchedDelEndToEnd: the cleanup loop freed every SWIG handle in zmqClientMap but left the (now dangling) entries in the map. A later test that ranges zmqClientMap would double-DeleteZmqClient the same pointer, which on CGO is a real crash. Delete the map entry alongside the handle. * TestHandleTableDataBatchesMultiKeyJSON / TestHandleTableDataBatchesOpRemove: replace the ad-hoc RedisDbMap[...] = redis.NewClient(...) with the existing setupMixedDbRedis(t, mapkey) helper, which save/restores RedisDbMap via Target2RedisDb. This stops the tests from panicking with 'assignment to entry in nil map' when they run after any test that sets RedisDbMap = nil (e.g. the TestUseRedis* family at lines ~490/521/537/553). No production code changes; behavior of the batched Set/Del path is unchanged. Signed-off-by: Dawei Huang <daweihuang@microsoft.com> * ci: install libyang3 runtime alongside libyang 1.x sonic-swss-common master switched to libyang3 (>= 3.12.2) in commit 1431c3d3 ('port to libyang3 with libyang1 fallback') and the published libswsscommon Debian artifact now hard-depends on the libyang3 package. The sonic-gnmi pipeline only downloads libyang 1.0 from common_libs, so 'dpkg -i libswsscommon_1.0.0_amd64.deb' fails with: dpkg: dependency problems prevent configuration of libswsscommon: libswsscommon depends on libyang3 (>= 3.12.2); however: Package libyang3 is not installed. This breaks every sonic-gnmi PR build from #1188 onward. Add libyang3_*.deb to the download pattern. libyang 1.x is still needed because sonic-mgmt-common/cvl/internal/util/util.go cgo-links '-lyang' against the v1 headers; the two runtimes coexist as distinct SONAMEs. Carried in this PR because it is also the unblock for #675's own CI. Signed-off-by: Dawei Huang <daweihuang@microsoft.com> * MixedDbClient tests: drop bogus zmqClientMap nuke loop from new batch tests The new TestZmqBatched{Set,Del}EndToEnd tests construct their ZmqClient directly with swsscommon.NewZmqClient(zmqAddr), so they do not register anything in the package-level zmqClientMap (only getZmqClientByAddress does). The cleanup loop that ranged over zmqClientMap and called DeleteZmqClient was therefore freeing pointers that belong to *other* tests. In source order, TestZmqReconnect (~line 657) and TestRemoveZmqClient (~line 823) run first; both already do the same broken 'for _, c := range zmqClientMap { DeleteZmqClient(c) }' pattern but without deleting from the map. By the time the new batch tests run, zmqClientMap still holds entries that point at already-freed C++ objects. Iterating those entries triggered: double free or corruption (out) SIGABRT: abort signal arrived during cgo execution ...DeleteZmqClient... client_test.go:2173 in CI as soon as the build actually linked successfully (build 1125898). The previous commit (5b770fb) added delete(zmqClientMap, k) on Copilot's advice, which fixes the in-loop iterator-mutation hazard but does not help here because the bad entries are inserted by prior tests, not by ours. The right fix for the new tests is to not touch zmqClientMap at all — the explicit DeleteZmqClient(zmqClient) is sufficient. Pre-existing tests are left alone; their map-nuke pattern is latent and only matters if any later test also iterates zmqClientMap. After this change none do. Signed-off-by: Dawei Huang <daweihuang@microsoft.com> * MixedDbClient: log batch size when sending batched ZMQ messages Adds a V(2) log line in DbBatchSetTable and DbBatchDelTable on the ZMQ-backed path so the actual batch size landing in a single ZMQ message is visible when debugging. Matches the verbosity used by the surrounding 'Create ZmqProducerStateTable' logs and stays quiet at default verbosity. Signed-off-by: Dawei Huang <daweihuang@microsoft.com> --------- Signed-off-by: Dawei Huang <daweihuang@microsoft.com>
Why
The batched overloads of
ZmqProducerStateTable::set/del/sendthat take astd::vector<KeyOpFieldsValuesTuple>are unreachable from the Go bindings today (and Python is in the same boat). SWIG happily generates the C trampolines and Go methods (e.g.SwigcptrZmqProducerStateTable.Set__SWIG_3), but the parameter typestd::vector<KeyOpFieldsValuesTuple>is not%template'd and SWIG cannot synthesize a default constructor for the opaque underlying tuple element, so callers have no way to build the argument — the type exposes onlySwigcptr().Verified empirically in a clean trixie + SWIG 4.3 + Go 1.24 environment:
%template?Add/Get/SizeFieldValuePairs(vector<FieldValueTuple>)VectorString(vector<string>)KeyOpFieldsValuesQueue(deque<KeyOpFieldsValuesTuple>)vector<KeyOpFieldsValuesTuple>(parameter of batchedset/send)Swigcptr()Adding a single
%templateis not enough: even with the list templated, the elementstd::tuple<…>itself has no usable constructor —std_vector.idoes not synthesize constructors for opaque tuple elements (std_deque.iis more permissive, which is why the existingKeyOpFieldsValuesQueueworks). So the cleanest fix is a small inline shim layer.Batched
del(vector<string>)already works today becauseVectorStringis templated.What
Add three
%inlinehelpers inpyext/swsscommon.i(also picked up by the goext build via the symlink):zmqProducerBatchedSet(p, keys, fvss)— all entries get op=SETzmqProducerBatchedDel(p, keys)— all entries get op=DELzmqProducerBatchedSend(p, keys, ops, fvss)— caller supplies per-entry op for mixed batchesEach helper takes parallel vectors of already-wrapped types (
VectorStringandFieldValuePairsList) and assembles thevector<KeyOpFieldsValuesTuple>on the C++ side before forwarding to the existing batched API. This mirrors the parallel-vector patternTable::popsalready uses. Size mismatches throwstd::invalid_argument, which the existing SWIG exception bridge surfaces to callers as a recoverable panic.Why we want this now
This unblocks bulk gNMI Set in sonic-gnmi (sonic-net/sonic-buildimage#27250 — bulk DASH route programming). Today
MixedDbClient.handleTableDataemits one ZMQ message per key; with these helpers it can ship N keys in a single message. Follow-up PR in sonic-gnmi to land afterwards.Tests
goext/swsscommon_test.go(TestZmqProducerBatchedSet/Del/Sendand aSetSizeMismatchnegative case) drive each helper through a ZmqConsumerStateTable and verify the entries arrive on the consumer side.tests/zmq_state_ut.cpp; the Go tests exist to prove the SWIG glue is reachable from Go.sonic-slave-trixie:./configure --disable-yangmodules && make -j && make install— clean build, no warnings introduced./tests/tests --gtest_filter='ZmqConsumerStateTable*'— all 3 ZMQ unit tests passcd goext && go test -v -run TestZmqProducer— all 4 new tests passRelated