
fix(desktop): surface backend startup failures#932

Merged
wesm merged 8 commits into
mainfrom
desktop-startup-failures
Jun 30, 2026

@wesm

@wesm wesm commented Jun 30, 2026

Member

Desktop startup now renders reportable failure details in the Tauri loading window when the bundled backend fails to launch, errors, or exits before readiness. It also keeps waiting through long daemon migrations and full resyncs instead of treating a five-minute startup as failed, while still reporting daemons that are clearly absent or incompatible.

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (22bb7d0)

Backend startup handling has one medium-risk issue that should be addressed before merge.

Medium

  • desktop/src-tauri/src/lib.rs:1595 - classify_backend_status_output treats every non-empty, non-ready status as Starting, including serve status output like “agentsview process running ... but not responding to health checks”. That means an unhealthy daemon can be shown indefinitely as “still preparing,” especially now that the fixed startup deadline was removed. Classify the “process running ... not responding to health checks” state separately and surface it as a startup failure, either immediately or after a short grace period, while keeping true start-lock states such as “agentsview is starting up.” open-ended.
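
The grace-period option suggested above could look roughly like the following. This is a minimal sketch, not the code in lib.rs: `UnhealthyVerdict`, `judge_unhealthy`, and the `grace` parameter are hypothetical names introduced for illustration, and the review does not specify a grace value.

```rust
use std::time::Duration;

/// Hypothetical outcome of one poll while the daemon reports that the
/// process is running but not responding to health checks.
#[derive(Debug, PartialEq, Eq)]
enum UnhealthyVerdict {
    /// Still inside the grace period: keep polling.
    KeepWaiting,
    /// Grace period exhausted: render a startup failure.
    Fail,
}

/// Decide what to do with an unhealthy daemon.
/// `unhealthy_for` is how long the status has been continuously unhealthy.
/// `grace` is an assumed tunable, not a value taken from the PR; passing
/// `Duration::ZERO` gives the "fail immediately" behavior.
fn judge_unhealthy(unhealthy_for: Duration, grace: Duration) -> UnhealthyVerdict {
    if unhealthy_for >= grace {
        UnhealthyVerdict::Fail
    } else {
        UnhealthyVerdict::KeepWaiting
    }
}
```

Keeping the decision a pure function of elapsed time makes it testable without sleeping; the caller would track when the unhealthy state was first observed.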

Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 4m22s), codex_security (codex/security, done, 1m59s) | Total: 6m32s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (f226036)

Summary verdict: one medium issue needs attention; no high or critical findings were reported.

Medium

  • desktop/src-tauri/src/lib.rs:1623 - Any non-empty, unrecognized serve status output is now classified as Starting, and the background poll loop no longer has an overall timeout. States like a read-only daemon status or a fatal status-probe error on stderr can keep the desktop UI polling forever instead of surfacing a startup failure.

    Fix: Return Starting only for explicit startup/progress states, and classify read-only, nonzero, or unknown status output as a renderable failure, or keep a bounded timeout for unknown non-ready states.


Panel: ci_default_security | Synthesis: codex, 12s | Members: codex_default (codex/default, done, 4m13s), codex_security (codex/security, done, 56s) | Total: 5m21s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (93794eb)

The PR has one actionable medium-severity issue; no high or critical findings were reported.

Medium

  • desktop/src-tauri/src/lib.rs:1623 - Non-ready daemon status can cause an infinite wait. Any non-empty status output that is not ready, not-running, incompatible, or unhealthy is treated as Starting. Read-only daemon status is deliberately rejected by parse_writable_listening_port_from_status, then falls through here; because the new poll loop has no startup timeout, the desktop can wait forever when no writable backend becomes ready.
    • Suggested fix: classify read-only and other non-startup status output as an unusable/failure state, or keep a bounded timeout for non-ready states. Add coverage for read-only serve status output.
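
One way to read the suggested fix is a classifier that returns Starting only for the explicit start-lock line and treats anything unrecognized as a renderable failure. The sketch below assumes the caller has already handled the ready, not-running, and incompatible cases and passes stdout only; `ProbeStatus` and `classify_non_ready` are hypothetical names, and the real enum in lib.rs may differ.

```rust
/// Hypothetical status categories for a non-ready probe.
#[derive(Debug, PartialEq, Eq)]
enum ProbeStatus {
    /// Explicit start-lock marker seen: waiting may stay open-ended.
    Starting,
    /// Process is running but failing health checks.
    Unhealthy,
    /// Recognizably not a startup state (read-only, unknown, ...);
    /// carries the raw output so the UI can render it.
    Unusable(String),
    /// The probe produced no output at all.
    Unavailable,
}

/// Marker quoted in the review comments.
const STARTING_MARKER: &str = "agentsview is starting up.";
/// Fragment of the unhealthy message quoted in the review comments.
const UNHEALTHY_FRAGMENT: &str = "not responding to health checks";

/// Classify status output that is already known not to be "ready".
fn classify_non_ready(stdout: &str) -> ProbeStatus {
    let trimmed = stdout.trim();
    if trimmed.is_empty() {
        return ProbeStatus::Unavailable;
    }
    if trimmed.lines().any(|line| line.trim() == STARTING_MARKER) {
        return ProbeStatus::Starting;
    }
    if trimmed.contains(UNHEALTHY_FRAGMENT) {
        return ProbeStatus::Unhealthy;
    }
    // Everything else falls through to a failure, not to Starting.
    ProbeStatus::Unusable(trimmed.to_string())
}
```

The design point is the final arm: the default is a failure state, so a new or unexpected status string cannot silently extend the wait.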

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 4m54s), codex_security (codex/security, done, 3m1s) | Total: 8m2s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (2d1e499)

Summary verdict: Medium-risk regressions remain in desktop packaging and backend startup handling.

Medium

  • desktop/scripts/run-tauri.sh:59
    The wrapper now injects --target <host> into every build without an explicit target, causing Tauri artifacts to land under src-tauri/target/<triple>/release/.... Existing Makefile packaging targets still look under src-tauri/target/release/..., so native desktop make builds may finish Tauri successfully and then fail to find the bundle.
    Fix: Only pass --target for explicitly requested or cross targets, or update artifact lookup paths to use the resolved target triple.

  • desktop/src-tauri/src/lib.rs:1511
    Treating ReadOnly status as terminal can abort valid desktop startup when a read-only daemon is already running. serve --background can start a writable daemon while serve status still reports the existing read-only runtime before the writable child becomes ready, so slow writable startup is shown as a fatal read-only backend error.
    Fix: Make serve status prefer an active writable startup lock before reporting a read-only daemon, or keep polling from the desktop until startup is known to be finished.

  • desktop/src-tauri/src/lib.rs:1537
    Repeated Unavailable status probes have no terminal path after the startup deadline was removed. If agentsview serve status cannot spawn, exits without useful output, or times out repeatedly, the UI polls forever with a waiting message instead of surfacing a startup failure.
    Fix: Keep a bounded failure threshold for Unavailable probes while allowing the explicit Starting state to remain open-ended.
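
A bounded threshold for Unavailable probes that leaves Starting open-ended could be sketched as below. This is an illustration under stated assumptions: it counts consecutive failures, whereas the actual change may use a time window instead, and `ProbeResult`, `NextStep`, `UnavailableBudget`, and the limit of 5 are all hypothetical.

```rust
/// Hypothetical probe outcomes relevant to this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProbeResult {
    /// The daemon reported an active startup lock.
    Starting,
    /// The status probe could not spawn, timed out, or printed nothing.
    Unavailable,
}

#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    KeepPolling,
    Fail,
}

/// Assumed limit; the PR does not state a number.
const MAX_CONSECUTIVE_UNAVAILABLE: u32 = 5;

/// Tracks consecutive Unavailable probes only.
struct UnavailableBudget {
    consecutive: u32,
}

impl UnavailableBudget {
    fn new() -> Self {
        Self { consecutive: 0 }
    }

    /// Record one probe and decide whether to keep polling.
    fn observe(&mut self, probe: ProbeResult) -> NextStep {
        match probe {
            // Starting never fails on its own and clears the streak.
            ProbeResult::Starting => {
                self.consecutive = 0;
                NextStep::KeepPolling
            }
            ProbeResult::Unavailable => {
                self.consecutive += 1;
                if self.consecutive >= MAX_CONSECUTIVE_UNAVAILABLE {
                    NextStep::Fail
                } else {
                    NextStep::KeepPolling
                }
            }
        }
    }
}
```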


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 4m58s), codex_security (codex/security, done, 3m3s) | Total: 8m12s

Desktop startup needs to keep long-running migrations open-ended when a real start lock is present, but it should not wait forever on status probes that cannot prove a writable backend is coming. This keeps read-only daemon reports behind active startup locks, caps repeated unavailable status checks with a reportable crash window, and preserves the default Tauri artifact layout except where Windows ARM64 needs an explicit target to match the sidecar.
@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (00c07a4)

PR has one medium-risk startup polling issue; no high or critical findings.

Medium

  • desktop/src-tauri/src/lib.rs:1461 - Starting status resets failed_status_probes to 0, while the sleep interval is based only on that counter. During a legitimate long migration or initial sync, this can poll every 125ms indefinitely, spawning a new agentsview serve status sidecar each time and potentially creating thousands of extra processes while the backend is already under startup load.
    • Suggested fix: keep the consecutive-unavailable failure count separate from polling backoff, or back off Starting/Unhealthy probes to the max interval after the first few attempts.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 4m13s), codex_security (codex/security, done, 2m24s) | Total: 6m43s

Long archive migrations intentionally keep the daemon startup state open-ended, but polling serve status every 125ms for that entire period adds avoidable sidecar process churn while the backend is busy. Track poll backoff separately from consecutive unavailable failures so Starting and Unhealthy states back off to the normal maximum interval without losing the bounded failure path for unusable status probes.
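
Separating poll backoff from the failure streak might look like the sketch below. The 125ms starting interval comes from the review comment; the 2s maximum, the doubling schedule, and the names `PollProbe` and `PollState` are assumptions made for illustration.

```rust
use std::time::Duration;

/// Initial interval mentioned in the review.
const INITIAL_POLL: Duration = Duration::from_millis(125);
/// Assumed maximum interval; the PR does not state the real value.
const MAX_POLL: Duration = Duration::from_secs(2);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollProbe {
    Starting,
    Unhealthy,
    Unavailable,
}

/// Two independent counters: one drives backoff, one drives failure.
struct PollState {
    /// Total probes so far; never reset, so the delay only grows.
    poll_attempts: u32,
    /// Consecutive Unavailable probes; reset by any other status.
    consecutive_unavailable: u32,
}

impl PollState {
    fn new() -> Self {
        Self { poll_attempts: 0, consecutive_unavailable: 0 }
    }

    /// Record a probe and return how long to sleep before the next one.
    fn record(&mut self, probe: PollProbe) -> Duration {
        self.poll_attempts = self.poll_attempts.saturating_add(1);
        match probe {
            PollProbe::Unavailable => self.consecutive_unavailable += 1,
            PollProbe::Starting | PollProbe::Unhealthy => self.consecutive_unavailable = 0,
        }
        // Double the delay on every attempt, capped at MAX_POLL.
        let shift = (self.poll_attempts - 1).min(16);
        INITIAL_POLL.saturating_mul(1u32 << shift).min(MAX_POLL)
    }
}
```

With these assumed constants, a long run of Starting probes reaches the maximum interval on the fifth attempt and stays there, while `consecutive_unavailable` remains available for the bounded failure path.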
@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (95468e2)

Medium issue found; no Critical or High findings.

Medium

  • desktop/src-tauri/src/lib.rs:1687 - Startup status classification can falsely mark a valid starting daemon as Unusable. status_output_is_starting checks the entire combined stdout/stderr output for exact equality with “agentsview is starting up.”, so non-fatal stderr diagnostics such as config warnings can override the startup-lock state.
    • Fix: classify the starting state from stdout alone, or make the check line-based so extra stderr output does not break startup detection.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m36s), codex_security (codex/security, done, 2m14s) | Total: 7m58s

serve status can emit benign stderr diagnostics while stdout still reports that the daemon startup lock is active. Treat the startup marker as a status line instead of requiring the entire combined probe output to match exactly, so long migrations remain in the open-ended startup path instead of becoming an unusable backend failure.
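
The difference between the two checks can be sketched as follows. Both functions are hypothetical stand-ins written for illustration, not the body of status_output_is_starting; only the marker string is taken from the review.

```rust
/// Startup marker quoted in the review.
const STARTUP_LINE: &str = "agentsview is starting up.";

/// Whole-output comparison: any extra line in the combined
/// stdout/stderr text makes this return false.
fn is_starting_exact(combined: &str) -> bool {
    combined.trim() == STARTUP_LINE
}

/// Line-based comparison: true if any single line is the marker,
/// regardless of what other diagnostics surround it.
fn is_starting_line_based(combined: &str) -> bool {
    combined.lines().any(|line| line.trim() == STARTUP_LINE)
}
```

Given combined output consisting of one unrelated warning line followed by the marker line, the exact check returns false and the line-based check returns true, which is the case the review describes.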
@roborev-ci

roborev-ci Bot commented Jun 30, 2026

roborev: Combined Review (c396c69)

Summary verdict: no Medium, High, or Critical findings to report.

The only reported issue was Low severity, so it is omitted per the review-combination rules.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 4m8s), codex_security (codex/security, done, 1m48s) | Total: 6m0s

@wesm wesm merged commit 4f240b3 into main Jun 30, 2026
33 of 34 checks passed
@wesm wesm deleted the desktop-startup-failures branch June 30, 2026 22:47

2 participants