
[Mellanox] Add phcsync warm reboot gate to avoid sync during warm reboot #11

Open
zili11720 wants to merge 6 commits into master from avoid_clock_sync_during_warm_reboot


zili11720 commented Mar 10, 2026

This PR is related to:
zili11720/sonic-sairedis#2
zili11720/sonic-swss-common#2

Together, these PRs introduce an event-driven mechanism that pauses phcsync.sh when a warm reboot starts and resumes it once the warm reboot is done.

Order of merge:

  1. [Mellanox] Add waitWarmBootStarted to restartWaiter sonic-swss-common#2
  2. [Mellanox] Add phcsync warm reboot gate to avoid sync during warm reboot #11
  3. [Mellanox] Activate phcsync gate to prevent clock sync during warm reboot sonic-sairedis#2

Why I did it

This fixes an issue where clock synchronization accessed the ASIC clock concurrently with the warm reboot ISSU process.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Add phcsync_warm_reboot_gate.py, which pauses phcsync.sh (SIGSTOP) when a warm reboot starts and resumes it (SIGCONT) once the warm reboot is done.
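A minimal sketch of the pause/resume cycle, for illustration only. It assumes `wait_started` and `wait_done` are callables that block up to a timeout and return True when the event fires; they stand in for the RestartWaiter calls (`waitWarmBootStarted` from sonic-swss-common#2 and `waitWarmBootDone`), whose exact signatures are not shown here. `gate_loop`, `max_cycles`, and the injected `kill` parameter are hypothetical names. What the real script does when `waitWarmBootDone` times out is not visible in this excerpt; the sketch simply keeps waiting with phcsync paused.

```python
import os
import signal
from typing import Callable, Optional


def gate_loop(pid: int,
              wait_started: Callable[[], bool],
              wait_done: Callable[[], bool],
              kill: Callable[[int, int], None] = os.kill,
              max_cycles: Optional[int] = None) -> int:
    """Pause phcsync (pid) for the duration of each warm reboot.

    wait_started/wait_done are hypothetical stand-ins for the RestartWaiter
    calls; each blocks up to its own timeout and returns True if the event fired.
    Returns 2 if the phcsync process disappears, mirroring the PR's exit code.
    max_cycles exists only so the sketch can be exercised without looping forever.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        if not wait_started():
            continue  # timed out without a warm reboot: keep listening
        try:
            kill(pid, signal.SIGSTOP)  # freeze phcsync so it stops accessing the ASIC clock
        except ProcessLookupError:
            return 2
        while not wait_done():
            pass  # warm reboot still in progress (assumed): keep phcsync paused, wait again
        try:
            kill(pid, signal.SIGCONT)  # warm reboot finished: let phcsync run again
        except ProcessLookupError:
            return 2
    return 0
```

Injecting `kill` keeps the sketch testable without signalling a real process; the real script presumably calls `os.kill` directly.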

How to verify it

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Zili Bombach <zbombach@nvidia.com>
#!/usr/bin/env python3
#
# SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES
# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

this is a new file, please change to 2026 only

Owner Author

Done

Comment thread platform/mellanox/docker-syncd-mlnx/supervisord.conf.j2
sys.exit(1)

PHCSYNC_SCRIPT = os.path.join(os.path.dirname(os.path.abspath(__file__)), "phcsync.sh")
WARM_BOOT_STARTED_TIMEOUT_SEC = 86400  # 24h

please add a comment that this number is the default and will be used anyway

Owner Author

Done

def main():
    syslog.openlog("phcsync-warm-reboot-gate", syslog.LOG_PID)

    if not os.path.isfile(PHCSYNC_SCRIPT) or not os.access(PHCSYNC_SCRIPT, os.X_OK):

If the file does not exist, the script will be restarted over and over again. Check whether the autorestart parameter is relevant here or could cause an issue.

Owner Author

autorestart = unexpected is bounded by startretries, which defaults to 3 retries
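To make the restart discussion concrete, here is a simplified model of supervisord's documented restart rules; it is not supervisord code. A process that exits before `startsecs` (default 1 second) counts as a failed start and is retried up to `startretries` times (default 3) before entering FATAL. `autorestart=unexpected` only governs processes that stayed up past `startsecs`, restarting them when the exit code is not listed in `exitcodes`. The function name and return labels are hypothetical, and `exitcodes=(0,)` is an assumption about the program's config.

```python
from typing import Tuple


def supervisor_outcome(exit_code: int,
                       uptime_sec: float,
                       failed_starts: int = 0,
                       autorestart: str = "unexpected",
                       exitcodes: Tuple[int, ...] = (0,),
                       startsecs: float = 1.0,
                       startretries: int = 3) -> str:
    """Classify one process exit under a simplified supervisord model.

    failed_starts is the number of failed starts that preceded this exit.
    Returns 'retry', 'fatal', 'restart' or 'exited'.
    """
    if uptime_sec < startsecs:
        # Exited while still STARTING: a failed start, independent of autorestart.
        return "retry" if failed_starts < startretries else "fatal"
    if autorestart == "true":
        return "restart"
    if autorestart == "unexpected" and exit_code not in exitcodes:
        return "restart"
    return "exited"
```

Under this model, the missing-script case (`sys.exit(1)` right after startup) lands in the failed-start branch, so it is bounded by `startretries` rather than looping forever, while a non-zero return after a long uptime would be restarted under `autorestart=unexpected` when that code is not in `exitcodes`.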


child_proc = None

def shutdown(signum, frame):

we usually define functions outside of the main func

Owner Author

Moved function

log("Warm reboot started, pausing phcsync")
try:
    os.kill(pid, signal.SIGSTOP)
except ProcessLookupError:

please check whether the line above can fail for another reason. If not, keep it that way.
If so, consider making the except more general.

    log("phcsync process gone, exiting", syslog.LOG_WARNING)
    return 2
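On the reviewer's question of whether `os.kill` can fail for other reasons: besides `ProcessLookupError` (ESRCH), it can raise `PermissionError` (EPERM) when the caller may not signal the target, or a plain `OSError` such as EINVAL for an invalid signal number; both exceptions are `OSError` subclasses. The helper below is a hypothetical illustration of telling these cases apart, not code from the PR.

```python
import errno
import os
from typing import Callable


def try_signal(pid: int, sig: int,
               kill: Callable[[int, int], None] = os.kill) -> str:
    """Send sig to pid and classify the outcome instead of raising."""
    try:
        kill(pid, sig)
    except ProcessLookupError:      # ESRCH: no such process (e.g. phcsync exited)
        return "gone"
    except PermissionError:         # EPERM: not allowed to signal this pid
        return "denied"
    except OSError as exc:          # anything else, e.g. EINVAL for a bad signal number
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"
```

Treating only `gone` as the expected failure and logging the rest with their errno is one way to keep the `except` narrow while still surfacing unexpected errors.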

if not RestartWaiter.waitWarmBootDone(maxWaitSec=WARM_BOOT_DONE_TIMEOUT_SEC):

please add a comment explaining that you want to keep listening for warm reboots

Owner Author

Done

try:
    os.kill(pid, signal.SIGCONT)
except ProcessLookupError:
    log("phcsync process gone while paused, exiting", syslog.LOG_WARNING)

The log here does not reflect the real state. Should it say 'while trying to resume'?

Owner Author

changed log.

    return 2
log("Warm reboot done, resumed phcsync")

return 2

if you return non-zero, it may be worth adding a log indicating that there was an issue

Owner Author

Added a comment


please add a log, not just a comment.

Owner Author

Done



if __name__ == "__main__":
    sys.exit(main())

Why sys.exit?
Why not just call main()?

Owner Author

sys.exit(main()) ensures the program exits with the return code produced by main(). If we just called main(), Python would ignore the returned value and always exit with status 0, which would hide error conditions
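The author's point can be checked directly. The snippet runs two tiny illustrative scripts (not the PR's file) in a child interpreter, one calling `main()` bare and one calling `sys.exit(main())`, and compares their exit statuses.

```python
import subprocess
import sys

BARE = "def main():\n    return 2\n\nmain()\n"
WRAPPED = "import sys\n\ndef main():\n    return 2\n\nsys.exit(main())\n"


def exit_status(source: str) -> int:
    """Run source in a fresh Python interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode


if __name__ == "__main__":
    print("main():", exit_status(BARE))               # prints 0: the return value is discarded
    print("sys.exit(main()):", exit_status(WRAPPED))  # prints 2: the return value becomes the exit status
```

Note that `sys.exit(None)` also exits with status 0, so a `main()` that falls off the end without a `return` still reports success.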

zili11720 requested a review from noaOrMlnx March 15, 2026 14:24
Signed-off-by: Zili Bombach <zbombach@nvidia.com>
zili11720 force-pushed the avoid_clock_sync_during_warm_reboot branch from 9387ebc to 281a64e March 23, 2026 13:32
zili11720 pushed a commit that referenced this pull request Apr 19, 2026
…dating udevd rules (sonic-net#26343)

- Why I did it
On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow. During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view. When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers systemd issue systemd/systemd#29559, which the systemd maintainers do not consider a bug on the systemd side. Raising this fix for our use case.

Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4  0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5  0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6  0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7  0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8  0x0000559f295519cf in ?? ()
#9  0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()

- How I did it
Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match; these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.
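The selection rule in kill_stale_udevd() (keep the PID systemd reports for systemd-udevd.service, treat every other systemd-udevd --daemon process as a leftover initramfs instance) can be sketched in Python for clarity. The real change is a shell function; this version is hypothetical, reads /proc/<pid>/cmdline (Linux only), and takes the service's main PID as an argument. On a live system that PID could come from, for example, `systemctl show --property=MainPID --value systemd-udevd`.

```python
import os
from typing import Dict, List


def read_cmdlines() -> Dict[int, List[str]]:
    """Map each PID to its argv list by reading /proc/<pid>/cmdline (Linux only)."""
    result: Dict[int, List[str]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited meanwhile or is not readable
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args:  # kernel threads have an empty cmdline and are skipped
            result[int(entry)] = args
    return result


def find_stale_udevd(main_pid: int, cmdlines: Dict[int, List[str]]) -> List[int]:
    """Return PIDs of systemd-udevd --daemon processes other than the service's main PID."""
    stale = []
    for pid, args in sorted(cmdlines.items()):
        is_udevd = os.path.basename(args[0]) == "systemd-udevd"
        if is_udevd and "--daemon" in args[1:] and pid != main_pid:
            stale.append(pid)
    return stale
```

An empty result corresponds to the no-op case. Actually signalling the stale PIDs is left out on purpose, since the source does not say which signal the shell function sends.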

- How to verify it
Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before the firmware install step
Reboot the switch
Verify no new systemd-udevd coredumps in /var/core/
Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID )
Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running)
Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules

Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>