fix(cli): kill delegated server child on uncatchable dispatcher death…#3378
wuisabel-gif wants to merge 1 commit into
…Hmbown#3259)

Completes the Hmbown#3259 follow-up the maintainer flagged: the Tokio supervisor added earlier handles catchable shutdown (Ctrl+C/SIGTERM/SIGHUP), but an uncatchable dispatcher death (SIGKILL or a hard crash) can't run that path and would orphan the delegated `serve`/`app-server` listener.

Add two OS-level safety nets to the server-delegation path, mirroring the idioms already in crates/tui/src/tools/shell.rs:

- Linux: set `PR_SET_PDEATHSIG` (SIGTERM) on the child via `pre_exec`, so the kernel signals it when the dispatcher dies for any reason.
- Windows: place the child in a kill-on-job-close Job Object; the OS closes the dispatcher's job handle on process death and terminates the child.

macOS has no equivalent parent-death primitive, so an uncatchable dispatcher death there can still orphan the child; this residual gap is documented inline.

Adds target-gated `libc` (Linux) and `windows` (Windows, Job Object features) deps to the cli crate.

Verified: native build/clippy/tests on macOS, plus isolated cross-compile checks of the new FFI for x86_64-unknown-linux-gnu and x86_64-pc-windows-gnu.

Refs Hmbown#3259.
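The Linux safety net above can be sketched in std-only Rust as follows. This is a minimal illustration, not the PR's code: `spawn_with_parent_death_signal` is a hypothetical name (the shipped helper is `install_server_parent_death_signal`, whose body is not shown on this page), and the constants and FFI declarations are redeclared by hand instead of coming from the `libc` crate. It treats a failed `prctl` as non-fatal, as the review below recommends, and adds one common idiom that the PR text does not mention: re-checking `getppid()` after `prctl`, to catch a dispatcher that died between `fork()` and `prctl()`. Also note that prctl(2) ties the signal to the *thread* that created the child, not the whole process, so spawning from a short-lived thread could deliver SIGTERM early.

```rust
use std::io;
use std::process::{Child, Command};

#[cfg(target_os = "linux")]
use std::os::raw::{c_int, c_ulong};

// Linux values from <sys/prctl.h>, <signal.h> and <errno.h>; redeclared here
// only to keep the sketch std-only (the PR goes through the `libc` crate).
#[cfg(target_os = "linux")]
const PR_SET_PDEATHSIG: c_int = 1;
#[cfg(target_os = "linux")]
const SIGTERM: c_ulong = 15;
#[cfg(target_os = "linux")]
const ESRCH: i32 = 3;

#[cfg(target_os = "linux")]
unsafe extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
    fn getppid() -> c_int; // pid_t is a 32-bit int on Linux
}

/// Hypothetical helper: spawn `cmd` with a best-effort parent-death signal.
#[cfg(target_os = "linux")]
fn spawn_with_parent_death_signal(cmd: &mut Command) -> io::Result<Child> {
    use std::os::unix::process::CommandExt;

    // Captured before fork() so the child can detect the fork/prctl race.
    let parent_pid = std::process::id() as c_int;
    // Runs in the child between fork() and exec(): no allocation, only
    // async-signal-safe syscalls.
    let hook = move || -> io::Result<()> {
        unsafe {
            // Best effort: if seccomp rejects prctl, keep spawning without
            // the safety net instead of aborting the spawn.
            let _ = prctl(PR_SET_PDEATHSIG, SIGTERM, 0 as c_ulong, 0 as c_ulong, 0 as c_ulong);
            // The parent already died before prctl ran, so the signal will
            // never arrive: refuse to exec rather than leave an orphan.
            if getppid() != parent_pid {
                return Err(io::Error::from_raw_os_error(ESRCH));
            }
        }
        Ok(())
    };
    // SAFETY: the hook only makes the two raw syscalls above.
    unsafe {
        cmd.pre_exec(hook);
    }
    cmd.spawn()
}

/// Other platforms: no parent-death primitive in this sketch, plain spawn.
#[cfg(not(target_os = "linux"))]
fn spawn_with_parent_death_signal(cmd: &mut Command) -> io::Result<Child> {
    cmd.spawn()
}

fn main() -> io::Result<()> {
    // `true` stands in for the delegated `serve` / `app-server` binary.
    let mut child = spawn_with_parent_death_signal(&mut Command::new("true"))?;
    println!("child exited successfully: {}", child.wait()?.success());
    Ok(())
}
```

The `getppid()` comparison works because an orphaned child is reparented to init (or a subreaper), so its parent pid no longer matches the pid captured before the fork.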
Thanks @wuisabel-gif for taking the time to contribute. This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered. Please read
Code Review
This pull request implements parent-death cleanup for delegated server children on Linux and Windows to ensure they are terminated if the dispatcher process dies uncatchably. On Linux, this is achieved using PR_SET_PDEATHSIG via libc::prctl in a pre_exec closure, while on Windows, it utilizes a kill-on-job-close Job Object. A review comment points out that returning an error from the pre_exec closure when prctl fails will abort the process spawning, making the failure fatal instead of non-fatal. It is recommended to ignore the error so that the child process can still spawn in environments where prctl might be restricted.
```rust
cmd.pre_exec(|| {
    if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM, 0, 0, 0) == -1 {
        // Non-fatal: the child only loses the parent-death safety net.
        Err(std::io::Error::last_os_error())
    } else {
        Ok(())
    }
});
```
The comment states that the failure of prctl is non-fatal, but returning Err(std::io::Error::last_os_error()) from the pre_exec closure actually aborts the process spawning, making it fatal. In environments with strict seccomp filters (like some Docker/Kubernetes setups) where prctl might be restricted, this will prevent the server from starting entirely. To make it truly non-fatal, the error should be ignored and the closure should always return Ok(()).
```rust
cmd.pre_exec(|| {
    // Non-fatal: ignore errors so the child still spawns even if prctl is restricted (e.g. in some sandboxes/containers).
    let _ = libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM, 0, 0, 0);
    Ok(())
});
```
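The review's claim can be checked in isolation with the std-only sketch below (Unix only). `spawn_with_hook` is a hypothetical helper, and errno 1 (EPERM) merely stands in for a seccomp filter rejecting `prctl`. It shows that an `Err` returned from the `pre_exec` hook makes `spawn()` itself fail, while the `Ok(())` path from the suggestion spawns normally.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Command, ExitStatus};

/// Hypothetical helper: spawn `true` with a pre_exec hook that either fails
/// (mimicking a prctl blocked by seccomp) or succeeds.
fn spawn_with_hook(fail: bool) -> io::Result<ExitStatus> {
    let mut cmd = Command::new("true");
    let hook = move || -> io::Result<()> {
        if fail {
            // What the original closure did on prctl failure: propagate errno.
            Err(io::Error::from_raw_os_error(1)) // EPERM
        } else {
            // What the suggested fix does: swallow the error and continue.
            Ok(())
        }
    };
    // SAFETY: the hook neither allocates nor touches shared state.
    unsafe {
        cmd.pre_exec(hook);
    }
    // spawn() reports the hook's error; the child never reaches exec.
    cmd.spawn()?.wait()
}

fn main() {
    match spawn_with_hook(true) {
        Ok(status) => println!("failing hook: spawned anyway ({})", status),
        Err(e) => println!("failing hook: spawn aborted ({})", e),
    }
    match spawn_with_hook(false) {
        Ok(status) => println!("ignoring hook: child ran ({})", status),
        Err(e) => println!("ignoring hook: spawn failed ({})", e),
    }
}
```

Because the hook runs in the forked child before `exec`, std can only report its failure back to the parent as a spawn error; there is no way to "warn and continue" other than returning `Ok(())`.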
Thanks @wuisabel-gif — I carried this into the v0.8.64 integration branch with attribution:
While resolving the release-branch conflict I kept the existing Linux helper name and folded in the review hardening so a …
Verified with:
Appreciate the fast follow-through on the delegated-server cleanup.
Follow-up to the delegated-server teardown harvested from PR #3378 by @wuisabel-gif: on Windows, Tokio child handles expose `raw_handle()`, which returns `None` if the child has already exited before the job object can be attached.
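The note above only states that `raw_handle()` can return `None`. One way to keep that case non-fatal is sketched below, platform-neutrally and under the assumption that a child which has already exited needs no job at all. `attach_if_running` and its `attach` callback are hypothetical stand-ins, not the real `attach_server_child_job` or the Windows Job Object call, and a `u32` stands in for `RawHandle` so the sketch compiles anywhere.

```rust
use std::io;

/// Hypothetical stand-in for the job-attachment step. `raw_handle` plays the
/// role of Tokio's Windows `Child::raw_handle()`, which is `None` once the
/// child has exited; `attach` stands in for the Job Object assignment.
fn attach_if_running<H, F>(raw_handle: Option<H>, attach: F) -> io::Result<bool>
where
    F: FnOnce(H) -> io::Result<()>,
{
    match raw_handle {
        // Still running: attach it so closing the dispatcher's job kills it.
        Some(handle) => attach(handle).map(|()| true),
        // Already exited: nothing can be orphaned, so skip without erroring.
        None => Ok(false),
    }
}

fn main() {
    // u32 stands in for a Windows RawHandle in this platform-neutral sketch.
    let attached = attach_if_running(Some(7u32), |_h| Ok(())).unwrap();
    let skipped = attach_if_running(None::<u32>, |_h| Ok(())).unwrap();
    println!("attached={} attached_after_exit={}", attached, skipped);
}
```

Returning a `bool` instead of `()` lets the caller log whether the safety net was actually installed, while a genuine attach failure still surfaces as an error.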
Thank you @wuisabel-gif, this fix landed in the v0.8.64 release branch (`install_server_parent_death_signal` + `attach_server_child_job` in `crates/cli/src/lib.rs`). Your PR shaped the implementation; tracked via #3259 (now closed as fixed). Appreciate the contribution!
Summary
Testing
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets --all-features`
- `cargo test --workspace --all-features`

Checklist