[swss, swss-common, swss-sairedis] voq system port implementation#1
Open
vganesan-nokia wants to merge 7 commits into
added 7 commits
August 9, 2020 16:08
…, sairedis submodules
vganesan-nokia
pushed a commit
that referenced
this pull request
Dec 10, 2020
This update brings in the following commits.

86c1108 Enable arm architecture to build in addition to amd64 (sonic-net#37)
4acb2c3 fix bugs and enhance Transformer (sonic-net#35)
49e5a22 ygot related enhancements and fixes (sonic-net#34)
51224de Fix ietf yang search path for cvl schema builds (sonic-net#32)
3c6cdb3 CVL Changes #8: 'must' and 'when' expression evaluation (sonic-net#31)
dabf231 CVL Changes #7: 'leafref' evaluation (sonic-net#28)
6f9535f CVL Changes #6: Customized Xpath Engine integration (sonic-net#27)
5e2466b DB-Layer fixes/enhancements (sonic-net#26)
9a27302 CVL Changes #4: Implementation of new CVL APIs (sonic-net#22)
dbf1093 Translib support for authorization, yang versioning and Delete flag (sonic-net#21)
80f369e CVL Changes #5: YParser enhancement (sonic-net#23)
904ce18 CVL Changes #3: Multi-db instance support (sonic-net#20)
9d24a34 CVL Changes #2: YValidator infra changes for evaluating xpath expression (sonic-net#19)
f3fc40f CVL Changes #1: Initial CVL code reorganization and common infra changes (sonic-net#18)
4922601 Bulk and RPC API support in translib (sonic-net#16)
1d730df RFC7895 yang module library implementation (sonic-net#15)
vganesan-nokia
pushed a commit
that referenced
this pull request
Jan 17, 2021
…ebian (sonic-net#6114)

Sonic devices advertise meaningful system description along with Debian package information.

Before the fix:

    admin@sonic:~$ show lldp neighbors
    -------------------------------------------------------------------------------
    LLDP neighbors:
    -------------------------------------------------------------------------------
    Interface: Ethernet0, via: LLDP, RID: 3, Time: 0 day, 16:36:30
    SysName: sonic
    SysDescr: Debian GNU/Linux 9 (stretch) Linux 4.9.0-11-2-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
    -------------------------------------------------------------------------------

After the fix:

    root@sonic:~# show lldp neighbors Ethernet16
    -------------------------------------------------------------------------------
    LLDP neighbors:
    -------------------------------------------------------------------------------
    Interface: Ethernet16, via: LLDP, RID: 10, Time: 0 day, 00:01:00
    SysName: sonic
    SysDescr: SONiC Software Version: SONiC.sonic_upstream_1.0_daily_201130_1501_62-dirty-20201130.203529 - HwSku: Accton-AS7816-64X - Distribution: Debian 10.6 - Kernel: 4.19.0-9-2-amd64
    -------------------------------------------------------------------------------

Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
vganesan-nokia
pushed a commit
that referenced
this pull request
Nov 30, 2021
Allow mellanox platform to build and successfully switch packets in Debian 11

Upgraded
* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches

Adjusted build system to support host system running bullseye and dockers running buster.
vganesan-nokia
pushed a commit
that referenced
this pull request
Nov 30, 2021
* Make necessary changes to mellanox platform code to build on Debian 11
* Revert use of backported kernel to build mft and elect to only build kernel module under bullseye
vganesan-nokia
pushed a commit
that referenced
this pull request
Nov 30, 2021
Submodule update for sonic-linkmgrd

Incorporates:
c11a576 (2021-11-22 09:38:46) [ci]: show code coverage in azure pipeline (sonic-net#4)
4ceb01d (2021-11-18 20:24:20) Fix MUX toggling issue (#1)
d640527 (2021-11-12 22:31:44) [ci]: fix artifact download
b9f247d (2021-11-12 22:31:44) [ci]: use native arm64/armhf build
3059122 (2021-09-27 11:32:23) [linkgrd] Add Missing Apache License Header
vganesan-nokia
pushed a commit
that referenced
this pull request
Nov 29, 2022
…net#10291)

#### Why I did it
Fix issue: Non compliant leaf list in config_db schema: sonic-net#9801

#### How I did it
The basic flow of DPB is:
1. Convert the config DB JSON value to a YANG JSON value, named “yangIn”.
2. Validate “yangIn” with libyang.
3. Generate a YANG JSON value that represents the target configuration, named “yangTarget”.
4. Diff “yangIn” against “yangTarget”.
5. Apply the diff to the CONFIG DB JSON and save it back to the DB.

The fix:
• For step 1, if the value of a leaf-list field is a string, convert it to a list by splitting it on “,” so that the validation in step 2 passes. Also save <table_name>.<key>.<field_name> to a set named “leaf_list_with_string_value_set”.
• For step 5, loop over “leaf_list_with_string_value_set” and change those fields back to strings.

#### How to verify it
1. Manual test
2. Changed sample config DB and unit test passed
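The leaf-list handling in steps 1 and 5 of that flow can be sketched roughly as follows. This is only an illustration, not the code from the fix: the function names `to_yang_leaf_lists` and `restore_string_leaf_lists`, the `leaf_list_fields` argument, and the sample table in the usage note are hypothetical. Only the set name `leaf_list_with_string_value_set` and the `<table_name>.<key>.<field_name>` key format come from the description.

```python
def to_yang_leaf_lists(config, leaf_list_fields):
    """Step 1 sketch: split comma-separated strings in leaf-list fields.

    `config` maps table -> key -> field -> value. `leaf_list_fields` is a
    hypothetical set of (table, field) pairs that the YANG model declares
    as leaf-lists. Returns the set of converted fields.
    """
    leaf_list_with_string_value_set = set()
    for table, entries in config.items():
        for key, fields in entries.items():
            for field, value in fields.items():
                if (table, field) in leaf_list_fields and isinstance(value, str):
                    fields[field] = value.split(",")
                    # Remember the field as <table_name>.<key>.<field_name>.
                    leaf_list_with_string_value_set.add(f"{table}.{key}.{field}")
    return leaf_list_with_string_value_set


def restore_string_leaf_lists(config, leaf_list_with_string_value_set):
    """Step 5 sketch: turn the remembered fields back into strings."""
    for table, entries in config.items():
        for key, fields in entries.items():
            for field, value in fields.items():
                name = f"{table}.{key}.{field}"
                if name in leaf_list_with_string_value_set and isinstance(value, list):
                    fields[field] = ",".join(value)
```

For example, with a made-up `VLAN` entry whose `dhcp_servers` field holds `"10.0.0.1,10.0.0.2"`, the first call turns the value into a two-element list for validation and the second call restores the original string before the result is written back.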
vganesan-nokia
pushed a commit
that referenced
this pull request
Jul 24, 2023
- Why I did it
To improve ASIC FW upgrade logging and have information about the cause of FW update failure in the log.

- How I did it
Added syslog logger support. If the FW update fails, the update tool prints the cause of the failure on the last line of its output, starting with "Fail". When running the tool, in case of a failed update, we parse the output to retrieve the cause and log it.

    Device #1:
    ----------
    Device Type: ConnectX6DX
    Part Number: MCX623106AN-CDA_Ax
    Description: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
    PSID: MT_0000000359
    PCI Device Name: /dev/mst/mt4125_pciconf0
    Base GUID: 0c42a103007d22d4
    Base MAC: 0c42a17d22d4
    Versions: Current Available
    FW 22.32.0498 22.32.0498
    PXE 3.6.0500 3.6.0500
    UEFI 14.25.0015 14.25.0015
    Status: Forced update required
    ---------
    Found 1 device(s) requiring firmware update...
    Device #1: Updating FW ...
    FSMST_INITIALIZE - OK
    Writing Boot image component - OK
    Fail : The Digest in the signature is wrong

- How to verify it
mlnx-fw-upgrade.sh --upgrade
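The parsing idea can be illustrated with a short sketch. The change itself is made in a shell script, so this Python version is not the actual implementation; the function names `extract_failure_cause` and `log_fw_update_failure` and the logger name are hypothetical, and the only behavior taken from the description is that the cause is on the last output line, which starts with "Fail".

```python
import logging


def extract_failure_cause(tool_output):
    """Return the failure cause from the last non-empty line, or None.

    Assumes the last line looks like "Fail : <cause>", as in the sample
    output quoted in the commit message.
    """
    lines = [line.strip() for line in tool_output.splitlines() if line.strip()]
    if not lines or not lines[-1].startswith("Fail"):
        return None
    _, separator, cause = lines[-1].partition(":")
    # Fall back to the whole line if there is no ":" separator.
    return cause.strip() if separator else lines[-1]


def log_fw_update_failure(tool_output, logger=None):
    """Log the failure cause, if any, and return it."""
    logger = logger or logging.getLogger("fw-upgrade")  # hypothetical name
    cause = extract_failure_cause(tool_output)
    if cause is not None:
        logger.error("FW update failed: %s", cause)
    return cause
```

Run against the sample output above, `extract_failure_cause` would pick out "The Digest in the signature is wrong"; output that does not end in a "Fail" line yields `None` and nothing is logged.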
vganesan-nokia
pushed a commit
that referenced
this pull request
Dec 17, 2025
#### Why I did it
If a Python wheel is already installed inside the slave container, it is not installed again. Below is a sample log:
```
sed: -e expression #1, char 11: extra characters after command
WARNING: The directory '/var/user/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Processing ./target/python-wheels/bookworm/sonic_yang_models-1.0-py3-none-any.whl
sonic-yang-models is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python3 -m pip install --upgrade pip
Build end time: Wed Dec 3 22:53:07 UTC 2025
Elapsed time: 0h 0m 1s
```
However, we expect the Python wheel to be reinstalled for the target `$(PYTHON_WHEELS_PATH)/%-install`.

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Update slave.mk to ensure the Python wheel is force-installed.

#### How to verify it
After this change, a local build successfully force-installs the Python wheel. See the new logs:
```
sed: -e expression #1, char 11: extra characters after command
WARNING: The directory '/var/qiluo/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Processing ./target/python-wheels/bookworm/sonic_yang_models-1.0-py3-none-any.whl
Installing collected packages: sonic-yang-models
Attempting uninstall: sonic-yang-models
Found existing installation: sonic-yang-models 1.0
Uninstalling sonic-yang-models-1.0:
Successfully uninstalled sonic-yang-models-1.0
Successfully installed sonic-yang-models-1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python3 -m pip install --upgrade pip
Build end time: Wed Dec 3 23:59:31 UTC 2025
```
vganesan-nokia
pushed a commit
that referenced
this pull request
Dec 17, 2025
…logs

The `imklog` plugin of rsyslog collects the kernel logs from `/dev/kmsg` and enqueues them to the syslog. With `CONFIG_PRINTK_TIME`, kernel messages are by default prefixed with the elapsed time since boot. When parsing these messages, the `imklog` plugin has a few options, such as keeping the timestamps as they are or interpreting them and adjusting the syslog's reported time accordingly.

The rsyslog release `8.2312.0` has fixes in interpreting these timestamps, leading to the change in behavior observed in sonic-net#24386.
https://salsa.debian.org/debian/rsyslog/-/blob/debian/8.2504.0-1/ChangeLog?ref_type=tags#L619

To restore the earlier behavior of retaining the kernel-reported elapsed time, disable `KlogParseKernelTimestamp`, as this option leads to removal of the timestamp from kernel messages, and enable `KlogKeepKernelTimestamp` explicitly. The latter is required because the default is now to discard the kernel timestamp.

With this change, the logs retain the kernel timestamp:

    root@sonic:~# cat /var/log/syslog | grep "sonic.*kernel:" | head -n 3
    2025 Nov 4 05:15:14.918946 sonic NOTICE kernel: [ 0.000000] Linux version 6.12.41+deb13-sonic-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.41-1 (2025-08-12)
    2025 Nov 4 05:15:14.919533 sonic INFO kernel: [ 0.000000] Command line: BOOT_IMAGE=/image-trixie.0-dirty-20251102.122837/boot/vmlinuz-6.12.41+deb13-sonic-amd64 root=UUID=ac0b6826-f8a3-461f-a8ff-701df60d90b6 rw console=tty0 console=ttyS0,115200n8 quiet processor.max_cstate=1 intel_idle.max_cstate=0 net.ifnames=0 biosdevname=0 loop=image-trixie.0-dirty-20251102.122837/fs.squashfs loopfstype=squashfs apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 intel_iommu=off modprobe.blacklist=gpio_ich,i2c-ismt,i2c_ismt,i2c-i801,i2c_i801 crashkernel=0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M acpi_no_watchdog
    2025 Nov 4 05:15:14.919536 sonic INFO kernel: [ 0.000000] BIOS-provided physical RAM map:
    root@sonic:~# cat /var/log/syslog | grep "sonic.*kernel:" | tail -n 3
    2025 Nov 4 05:17:26.831607 sonic WARNING kernel: [ 143.527486] PDDF_LED set_status_led: Set [FANTRAY_LED;1] color[green]
    2025 Nov 4 05:17:26.912442 sonic WARNING kernel: [ 143.607086] PDDF_LED set_status_led: Set [FANTRAY_LED;2] color[green]
    2025 Nov 4 05:20:32.499634 sonic WARNING kernel: [ 329.195319] PDDF_LED set_status_led: Set [SYS_LED;0] color[amber]
    root@sonic:~#

Signed-off-by: Ramasamy Chandramouli <rachandr@celestica.com>
Co-authored-by: Ramasamy Chandramouli <rachandr@celestica.com>
vganesan-nokia
pushed a commit
that referenced
this pull request
Apr 17, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:
- scripts/build-timing-report.sh: Parse per-package timing from build logs (HEADER/FOOTER timestamps), generate sorted duration table, phase breakdown, parallelism timeline, and CSV export.
- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph, compute critical path, fan-out/fan-in bottleneck analysis, and generate DOT/JSON output for visualization.
- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O, and Docker container count during builds for resource utilization analysis.

Add "make build-report" target to slave.mk that runs the timing report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

---------

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>
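To make the "critical path" and "cycle detection" items concrete, here is a minimal sketch of a longest-chain search over a dependency graph. It is not the code in scripts/build-dep-graph.py: the function name `critical_path`, the `deps` mapping, and the sample graph in the usage note are assumptions for illustration, and depth is measured in number of packages rather than build time.

```python
def critical_path(deps):
    """Return the longest dependency chain, prerequisites first.

    `deps` maps each package to the list of packages it depends on.
    Raises ValueError if the graph contains a dependency cycle.
    """
    best = {}            # package -> longest chain ending at that package
    in_progress = set()  # packages on the current depth-first search stack

    def visit(pkg):
        if pkg in best:
            return best[pkg]
        if pkg in in_progress:
            # Reaching a package that is still being visited means a cycle.
            raise ValueError(f"dependency cycle through {pkg}")
        in_progress.add(pkg)
        longest = []
        for dep in deps.get(pkg, []):
            chain = visit(dep)
            if len(chain) > len(longest):
                longest = chain
        in_progress.discard(pkg)
        best[pkg] = longest + [pkg]
        return best[pkg]

    result = []
    for pkg in deps:
        chain = visit(pkg)
        if len(chain) > len(result):
            result = chain
    return result
```

With a made-up graph such as `{"utilities": ["libswsscommon"], "libswsscommon": ["libnl"], "libnl": []}`, the function returns the chain from `libnl` through `libswsscommon` to `utilities`. Memoizing finished packages in `best` keeps the search linear in the number of edges, while the `in_progress` set is what turns an accidental cycle into an error instead of unbounded recursion.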
vganesan-nokia
pushed a commit
that referenced
this pull request
Apr 17, 2026
…dating udevd rules (sonic-net#26343)

- Why I did it
On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow.

During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view.

When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers a systemd issue, systemd/systemd#29559, which the systemd maintainers do not consider a bug on their side, so this fix is raised for our use case.

    Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
    Program terminated with signal SIGABRT, Aborted.
    #0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    (gdb) bt
    #0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
    #2 0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
    #3 0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #4 0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #5 0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #6 0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #7 0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #8 0x0000559f295519cf in ?? ()
    #9 0x0000559f29553a77 in ?? ()
    #10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
    #14 0x0000559f29545820 in ?? ()
    #15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    #17 0x0000559f29545c51 in ?? ()

- How I did it
Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match; these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.

- How to verify it
Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before the firmware install step.
Reboot the switch.
Verify no new systemd-udevd coredumps in /var/core/.
Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID ).
Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running).
Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules.

Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
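The selection logic behind kill_stale_udevd() can be sketched as a pure function. The real function is shell code in dpu-udev-manager.sh, so this Python version is only an illustration under assumptions: the name `find_stale_udevd_pids`, the `(pid, cmdline)` input format, and the PIDs in the usage note are all hypothetical, and the sketch only decides which PIDs are stale without killing anything.

```python
def find_stale_udevd_pids(managed_pid, processes):
    """Return PIDs of `systemd-udevd --daemon` processes other than `managed_pid`.

    `managed_pid` is the PID of the systemd-managed udevd (the commit obtains
    it via `systemctl show`). `processes` is an iterable of (pid, cmdline)
    pairs, a hypothetical stand-in for a process listing.
    """
    stale = []
    for pid, cmdline in processes:
        args = cmdline.split()
        is_udevd_daemon = (
            bool(args)
            and args[0].endswith("systemd-udevd")
            and "--daemon" in args
        )
        if is_udevd_daemon and pid != managed_pid:
            stale.append(pid)
    return stale
```

Given a listing in which PID 312 runs `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never` and the managed udevd is PID 845, the function returns `[312]`. When no leftover daemon exists it returns an empty list, which matches the no-op case described above.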
vganesan-nokia
pushed a commit
that referenced
this pull request
May 20, 2026
…26549)

- Why I Did It
Bug: Powering DPUs off at runtime via dpuctl leaves stale MST PCI device entries on the host. Subsequent mst status / mlxfwmanager calls then hang for ~50 seconds while MST tries to talk to the now-absent DPU PCI devices.

Root cause: MST was started globally at boot via the mlnx-fw-manager.service systemd unit (ExecStartPre=/usr/bin/mst start --with_i2cdev) and left running. So MST always saw the DPU PCI devices, even after they were powered off, and any tool that walked them stalled.

This was already fixed on 202511; see sonic-net#25575 and sonic-net#26131. This PR ports the same fix to master/202605.

- How I Did It
Move MST lifecycle out of the global systemd unit and into the firmware-update paths that actually need MST. The host then runs without MST loaded, so DPU power-off no longer leaves stale entries that hang mst/mlxfwmanager.

mlnx-fw-manager.service: Drop ExecStartPre=/usr/bin/mst start --with_i2cdev and ExecStop=/usr/bin/mst stop, so the unit no longer leaves MST running globally. Add an ExecCondition that skips the unit during SONIC_BOOT_TYPE=fastfast.

mellanox_fw_manager (ASIC firmware upgrade): FirmwareCoordinator now owns the MST lifecycle: _start_mst() runs mst start --with_i2cdev before the upgrade. _stop_mst() runs mst stop in a finally block, so MST is always stopped on exit. New ignore_mst_start_failure flag (CLI: -m / --ignore-mst-start-failure in main.py) lets callers continue when MST can't start; it is used by the BlueField installer, where DPUs may be off.

BlueField installer (install.sh.j2): Remove the explicit chroot ... mst start and call mlnx-fw-manager -m --nosyslog --verbose so install doesn't abort if MST start fails.

DPU FPGA upgrade (sonic_platform/component.py): ComponentFPGADPU._install_firmware() now runs cpldupdate inside a new _mst_context() context manager that does mst start before and mst stop after, even on failure. ComponentCPLD.__get_mst_device() is also rewritten to use asic_detect.sh -p instead of scanning the MST device path, so it no longer depends on MST being globally loaded.

Unit tests: Updated for the new __get_mst_device contract, the new ignore_mst_start_failure plumbing, and _start_mst/_stop_mst patching in coordinator tests.

- How to Verify It
1. MST no longer loaded globally; DPU power-off doesn't hang host tools ✅

    # dpuctl dpu-power-off --all
    ...
    # time mst status
    MST modules:
    ------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
    PCI Devices:
    ------------
    06:00.0
    real 0m0.091s
    user 0m0.041s
    sys 0m0.068s
    # time mlxfwmanager
    Querying Mellanox devices firmware ...
    Device #1:
    ----------
    Device Type: Spectrum3
    ...
    real 0m0.176s
    user 0m0.013s
    sys 0m0.134s

2. ASIC firmware upgrade at boot still works ✅
mlnx-fw-manager starts MST, performs the upgrade, and stops MST. Verified end-to-end at boot time.

3. DPU FPGA upgrade still works ✅
cpldupdate runs inside the new _mst_context; MST is started/stopped only for the duration of the update.

4. BlueField installer ✅
Installer-time firmware upgrade succeeds with mlnx-fw-manager -m, including when DPUs are powered off and MST can't start.

5. Unit tests ✅

    pytest platform/mellanox/mlnx-platform-api/tests/test_component.py
    pytest platform/mellanox/platform-utils/tests/

Signed-off-by: Yizhen Zhang <evazha@nvidia.com>
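The start-before, stop-after-even-on-failure pattern described for `_mst_context()` can be sketched with `contextlib.contextmanager`. This is an assumption-laden illustration rather than the code in sonic_platform/component.py: the name `mst_context`, the injectable `run` parameter, and the exact argument lists are hypothetical, chosen so the sketch can be exercised on a machine without MST installed.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def mst_context(run=subprocess.run):
    """Start MST on entry and always stop it on exit.

    `run` defaults to subprocess.run and is injectable (an assumption of
    this sketch) so the behavior can be checked with a fake runner.
    """
    run(["mst", "start"], check=True)
    try:
        yield
    finally:
        # Executes on success and on failure, so MST is never left loaded.
        run(["mst", "stop"], check=False)
```

A caller would wrap only the step that needs MST, for example `with mst_context(): ...` around the firmware update command. If the body raises, the `finally` clause still issues `mst stop` before the exception propagates, which is the property the commit message relies on.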
(1) Pick up the submodule changes made for the VOQ system port implementation.