Skip to content

fix(cli): kill delegated server child on uncatchable dispatcher death…#3378

Open
wuisabel-gif wants to merge 1 commit into
Hmbown:mainfrom
wuisabel-gif:fix/3259-uncatchable-dispatcher-death
Open

fix(cli): kill delegated server child on uncatchable dispatcher death…#3378
wuisabel-gif wants to merge 1 commit into
Hmbown:mainfrom
wuisabel-gif:fix/3259-uncatchable-dispatcher-death

Conversation

@wuisabel-gif

Copy link
Copy Markdown
Contributor

… (#3259)

Completes the #3259 follow-up the maintainer flagged: the Tokio supervisor added earlier handles catchable shutdown (Ctrl+C/SIGTERM/SIGHUP), but an uncatchable dispatcher death (SIGKILL or a hard crash) can't run that path and would orphan the delegated serve/app-server listener.

Add two OS-level safety nets to the server-delegation path, mirroring the idioms already in crates/tui/src/tools/shell.rs:

  • Linux: set PR_SET_PDEATHSIG (SIGTERM) on the child via pre_exec, so the kernel signals it when the dispatcher dies for any reason.
  • Windows: place the child in a kill-on-job-close Job Object; the OS closes the dispatcher's job handle on process death and terminates the child.

macOS has no equivalent parent-death primitive, so an uncatchable dispatcher death there can still orphan the child; documented inline as the residual gap.

Adds target-gated libc (Linux) and windows (Windows, Job Object features) deps to the cli crate. Verified: native build/clippy/tests on macOS, plus isolated cross-compile checks of the new FFI for x86_64-unknown-linux-gnu and x86_64-pc-windows-gnu.

Refs #3259.

Summary

Testing

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features
  • cargo test --workspace --all-features

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes
  • Harvested/co-authored credit uses a GitHub numeric noreply address

…Hmbown#3259)

Completes the Hmbown#3259 follow-up the maintainer flagged: the Tokio supervisor
added earlier handles catchable shutdown (Ctrl+C/SIGTERM/SIGHUP), but an
uncatchable dispatcher death (SIGKILL or a hard crash) can't run that path and
would orphan the delegated `serve`/`app-server` listener.

Add two OS-level safety nets to the server-delegation path, mirroring the
idioms already in crates/tui/src/tools/shell.rs:

- Linux: set `PR_SET_PDEATHSIG` (SIGTERM) on the child via `pre_exec`, so the
  kernel signals it when the dispatcher dies for any reason.
- Windows: place the child in a kill-on-job-close Job Object; the OS closes the
  dispatcher's job handle on process death and terminates the child.

macOS has no equivalent parent-death primitive, so an uncatchable dispatcher
death there can still orphan the child; documented inline as the residual gap.

Adds target-gated `libc` (Linux) and `windows` (Windows, Job Object features)
deps to the cli crate. Verified: native build/clippy/tests on macOS, plus
isolated cross-compile checks of the new FFI for x86_64-unknown-linux-gnu and
x86_64-pc-windows-gnu.

Refs Hmbown#3259.
@wuisabel-gif wuisabel-gif requested a review from Hmbown as a code owner June 22, 2026 04:32

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@github-actions

Copy link
Copy Markdown

Thanks @wuisabel-gif for taking the time to contribute.

This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant recurring PR access by commenting /lgtm on a pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements parent-death cleanup for delegated server children on Linux and Windows to ensure they are terminated if the dispatcher process dies uncatchably. On Linux, this is achieved using PR_SET_PDEATHSIG via libc::prctl in a pre_exec closure, while on Windows, it utilizes a kill-on-job-close Job Object. A review comment points out that returning an error from the pre_exec closure when prctl fails will abort the process spawning, making the failure fatal instead of non-fatal. It is recommended to ignore the error so that the child process can still spawn in environments where prctl might be restricted.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread crates/cli/src/lib.rs
Comment on lines +1896 to +1903
cmd.pre_exec(|| {
if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM, 0, 0, 0) == -1 {
// Non-fatal: the child only loses the parent-death safety net.
Err(std::io::Error::last_os_error())
} else {
Ok(())
}
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The comment states that the failure of prctl is non-fatal, but returning Err(std::io::Error::last_os_error()) from the pre_exec closure actually aborts the process spawning, making it fatal. In environments with strict seccomp filters (like some Docker/Kubernetes setups) where prctl might be restricted, this will prevent the server from starting entirely. To make it truly non-fatal, the error should be ignored and the closure should always return Ok(()).

        cmd.pre_exec(|| {
            // Non-fatal: ignore errors so the child still spawns even if prctl is restricted (e.g. in some sandboxes/containers).
            let _ = libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM, 0, 0, 0);
            Ok(())
        });

@Hmbown

Hmbown commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Thanks @wuisabel-gif — I carried this into the v0.8.64 integration branch with attribution:

  • c9e887e fix(cli): kill delegated server child on uncatchable dispatcher death (#3259)

While resolving the release-branch conflict I kept the existing Linux helper name and folded in the review hardening so a prctl failure is best-effort instead of aborting the delegated child spawn. The Windows Job Object guard is included.

Verified with:

  • cargo fmt --all
  • cargo test -p codewhale-cli --locked server_teardown_tests
  • RUSTFLAGS=-Dwarnings cargo check -p codewhale-cli --locked
  • git diff --check HEAD~2..HEAD
  • python3 scripts/check-coauthor-trailers.py --author-map .github/AUTHOR_MAP --range HEAD~2..HEAD --check-authors

Appreciate the fast follow-through on the delegated-server cleanup.

Hmbown added a commit that referenced this pull request Jun 22, 2026
Follow up on the delegated server teardown harvest from PR #3378 by @wuisabel-gif: Tokio child handles expose raw_handle() on Windows, returning None if the child has already exited before the job object can be attached.
@Hmbown

Hmbown commented Jun 22, 2026

Copy link
Copy Markdown
Owner

Thank you @wuisabel-gif — this fix landed in the v0.8.64 release branch (install_server_parent_death_signal + attach_server_child_job in crates/cli/src/lib.rs). Your PR shaped the implementation; tracked via #3259 (now closed as fixed). Appreciate the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants