fix: ensure systemd service is restarted if nvidia-smi fails #1836

Merged
cdesiniotis merged 1 commit into NVIDIA:main from cdesiniotis:systemd-multiple-execstarts
May 19, 2026

@cdesiniotis
Contributor

We run nvidia-smi in an ExecStart statement instead of an ExecCondition statement so that the unit is restarted if nvidia-smi fails. With ExecCondition, if the command exits with a code between 1 and 254 (inclusive), the remaining commands are skipped and the unit is not marked as failed, so the service is not restarted.
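
For illustration, the difference can be sketched with a unit file along these lines. This is not the unit changed by this PR: the description, the second ExecStart command, the nvidia-smi path, and the RestartSec value are all placeholders chosen for the example. It also assumes Type=oneshot (the only service type that accepts more than one ExecStart line) and a systemd version that permits Restart=on-failure for oneshot units.

```ini
# Hypothetical sketch; names, paths, and timings are assumptions, not taken from this PR.
[Unit]
Description=Example GPU-dependent task (illustrative)

[Service]
# Multiple ExecStart= lines are only allowed for Type=oneshot.
Type=oneshot

# Before: an exit code of 1-254 here skips the remaining commands
# without marking the unit as failed, so Restart= never triggers.
# ExecCondition=/usr/bin/nvidia-smi

# After: a non-zero exit here fails the unit, so Restart=on-failure applies.
ExecStart=/usr/bin/nvidia-smi
ExecStart=/usr/local/bin/example-gpu-task

Restart=on-failure
RestartSec=10
```

With the ExecStart form, ExecStart commands run in order and the unit fails at the first one that exits non-zero, so the placeholder task only runs once nvidia-smi has succeeded.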

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
cdesiniotis requested a review from tariq1890 May 19, 2026 17:52
cdesiniotis self-assigned this May 19, 2026
@cdesiniotis
Contributor Author

/cherry-pick release-1.19

cdesiniotis merged commit d31519d into NVIDIA:main May 19, 2026
15 checks passed
@github-actions

🤖 Backport PR created for release-1.19: #1837
