sonic-swss-common: port to libyang3, fix zmq use-after-free, and CI libnexthopgroup#1199
Closed
securely1g wants to merge 3 commits into
Closed
sonic-swss-common: port to libyang3, fix zmq use-after-free, and CI libnexthopgroup#1199securely1g wants to merge 3 commits into
securely1g wants to merge 3 commits into
Conversation
…te CI
Port the YANG schema-walking code in defaultvalueprovider.{h,cpp} from
libyang1 to libyang3. sonic-swss-common is a submodule whose libyang
dependency is pinned by the parent project (sonic-buildimage), so the
source needs to compile against either version. Use LY_ARRAY_COUNT
(defined only by libyang3) as a preprocessor sentinel:
* Trivial spelling/accessor differences (node types, ->child,
leaf->dflt extraction, module->data layout, ly_ctx_destroy,
ly_ctx_load_module) are hidden behind SWSS_LYS_* macros so the
call sites stay near-identical to the libyang1 source.
* Structurally divergent code (list-key extraction via lysc_is_key,
leaf-list default values, the ly_ctx_new return-by-out-param
signature) still needs explicit #ifdef branches.
SWIG's preprocessor cannot see <libyang/libyang.h>, so when it parses
defaultvalueprovider.h it always takes the libyang1 branch of the
version shim and generates wrappers that fail to compile against
libyang3 headers. `%ignore swss::DefaultValueHelper` in
pyext/swsscommon.i suppresses those wrappers (the helper is internal —
no Python or Go caller uses it) and keeps the generated bindings
version-agnostic.
CI changes wired up for the libyang3 packages:
* Switch the .deb download/install patterns from libyang1
(libyang_1.0*, libyang-*_1.0*, libyang-cpp_*, python3-yang_*) to
libyang3 (libyang3_*, libyang-dev_3*) across azure-pipelines.yml
and the build/sairedis/swss/test-docker templates so CI pulls the
libyang3 artifacts the parent project produces.
* In the amd64/ubuntu-22.04 host job, replace the python3-libyang
.deb install with `pip install libyang==3.3.0`. The bookworm-
built python3-libyang_3.1.0 .deb pins python3 (>= 3.11~,
<< 3.12) so it only installs on systems with Python 3.11; the
vmImage ships Python 3.10. PyPI's libyang ships sdist only, so
pip has to build the CFFI extension — use --no-build-isolation
with apt's python3-cffi to sidestep a cffi version-mismatch
Exception that jammy's pip 22.0.2 build-isolation env triggers
(newer cffi in the overlay vs. the system /usr/lib/python3/
dist-packages/_cffi_backend.so that still wins on sys.path).
3.3.0 is the last 3.x release and includes patches we
upstreamed against 3.1.0.
* Drop the amd64/ubuntu-20.04 job, which has been pinned to
`condition: false` (i.e. never runs) for some time.
Signed-off-by: Brad House <brad@brad-house.com>
ZmqServer dispatches incoming messages on its poll thread by looking up
a ZmqMessageHandler pointer in a (db, table) map populated by
ZmqConsumerStateTable's constructor via registerMessageHandler() and
calling handler->handleReceivedData(). There was no symmetric
removeMessageHandler and no ZmqConsumerStateTable destructor — so when
a consumer went out of scope, the map kept the now-dangling pointer.
That bites in test code that declares its ZmqServer before its
ZmqConsumerStateTable (the natural order — server has to exist to be
passed by reference into the consumer's constructor). Reverse-order
destruction then runs the consumer's destructor first, which destroys
m_selectableEvent and closes the underlying eventfd, while the server's
mqPollThread is still running and may be partway through a dispatch
that ends in m_selectableEvent.notify(). write() on the closed fd
returns EBADF, SelectableEvent::notify() throws
runtime_error("write failed"), the exception escapes the poll thread,
and std::terminate aborts the process — visible in CI as a crash
attributed to whichever test happened to be RUNing next
(ZmqServerLazzyBind in practice, since gtest prints [ RUN ] before the
test body executes).
The fix moves the handler map and its mutex out of ZmqServer into a
small, separately-allocated ZmqHandlerRegistry class. The server and
every registered consumer co-own a std::shared_ptr<ZmqHandlerRegistry>,
so:
* The mutex is held across handler dispatch, so the consumer's
destructor (which calls removeHandler() on the registry) blocks
until any in-flight callback into it has returned — after which
its other members can safely destruct.
* The registry survives as long as either the server OR any
registered handler still holds a reference. Destruction order
between server and consumer no longer matters: whichever goes
away last leaves a live registry behind for the other's
destructor to use without touching freed memory.
The server's public surface — registerMessageHandler() and the new
removeMessageHandler() — is unchanged in signature; both delegate to
the registry. A new (internal) ZmqServer::getHandlerRegistry()
accessor returns the shared_ptr so ZmqMessageHandler implementations
can co-own the registry.
ZmqConsumerStateTable now caches its db name as std::string at
construction and stores std::shared_ptr<ZmqHandlerRegistry> instead of
ZmqServer&, so its destructor never dereferences the DBConnector or
the ZmqServer it was created with — only the shared registry.
Handler callbacks still must not themselves call register/removeHandler
on the registry (would self-deadlock); ZmqConsumerStateTable::
handleReceivedData only enqueues and notifies, so the restriction is
harmless in tree. Documented on the ZmqHandlerRegistry class.
Signed-off-by: Brad House <bhouse@nexthop.ai>
sonic-swss commit e6c482ca ("...") added -lnexthopgroup to fpmsyncd's
link line, expecting libnexthopgroup_*.deb to already be installed
when the swss build runs. sonic-swss-common's downstream "Compile
sonic swss" stage installs everything it needs from the common-lib
artifact, but its pattern list never picked up the new package, so
the link now fails with `cannot find -lnexthopgroup`. Add the
libnexthopgroup pattern next to the other Azure.sonic-buildimage.
common_libs entries in build-swss-template.yml so dpkg picks it up
in the existing install step.
Signed-off-by: Brad House <brad@brad-house.com>
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Combines the following PRs into a single submission:
All original commits and authorship preserved.
Signed-off-by: Baba securely1g@users.noreply.github.com