Summary
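When the spawner's waitForInit rejects with "Session exited before init", the user sees no information about why the session exited. The error is mechanically the same for several distinct failure modes, for example configuration problems (claudePath, missing env), a provider-side error such as an authentication failure or rate limit, or a SIGTERM from a competing daemon.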
Each of these has a different fix, but the user-visible symptom is identical and gives them nothing to act on. In dogfooding we wasted real time chasing the wrong cause.
Where the gap is
packages/smithy/src/runtime/spawner.ts waitForInit:
// The emitter passes (code, signal), but this handler ignores both.
const onExit = () => {
  if (settled) return;
  settled = true;
  cleanup();
  reject(new Error('Session exited before init'));
};
The exit event upstream carries (code, signal) for the interactive path (spawner.ts:1129 calls onExit(code, signal)). For the headless path, processProviderMessages (spawner.ts:1039) emits exit with a synthetic 0|1 derived from resumeErrorDetected rather than the underlying child process exit. So a naive enrichment that pulls (code, signal) into the message would only help the interactive path; the headless path needs a separate fix.
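A minimal sketch of the naive enrichment for the interactive path, assuming the exit event is emitted with (code, signal) as described above. The names waitForInitSketch and describeExit and the 'init' event name are hypothetical stand-ins, not the actual spawner.ts code.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: turn (code, signal) into a short message suffix.
function describeExit(code: number | null, signal: string | null): string {
  if (signal) return ` (signal ${signal})`;
  if (code !== null) return ` (exit code ${code})`;
  return "";
}

// Sketch only: resolves on a hypothetical 'init' event, rejects on 'exit'.
function waitForInitSketch(events: EventEmitter): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    const cleanup = () => {
      events.off("init", onInit);
      events.off("exit", onExit);
    };
    const onInit = () => {
      if (settled) return;
      settled = true;
      cleanup();
      resolve();
    };
    // Unlike the current handler, this one keeps the arguments it is given.
    const onExit = (code: number | null, signal: string | null) => {
      if (settled) return;
      settled = true;
      cleanup();
      reject(new Error(`Session exited before init${describeExit(code, signal)}`));
    };
    events.on("init", onInit);
    events.on("exit", onExit);
  });
}
```

On the headless path this would still report the synthetic value, which is why the provider-level change is needed as well.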
Proposed enrichment
Two complementary changes:
- Plumb the real provider-level exit signal into the headless path. The ClaudeAgentProvider knows when its child process exits and with what code/signal. Forward that to session.events.emit('exit', code, signal) instead of the synthetic 0/1. Today's synthetic value is what getDispatchHealth-adjacent callers depend on, so verify those before changing semantics, but the underlying child exit is what users need.
- Capture the last few stream-json events (or any stderr lines from the provider) in the session. When waitForInit rejects, include the last event's content if present. Often the actual error text from claude ("Authentication failed", "Rate limit exceeded", etc.) lives in a system or result event that arrives just before the exit.
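The two changes could be sketched roughly as follows. Everything here is illustrative: the ProviderEvent shape, the RecentEvents class, and forwardChildExit are hypothetical names, and the real provider and session types may look different.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical shape of a stream-json event; the real type may differ.
interface ProviderEvent {
  type: string;
  subtype?: string;
  text?: string;
}

// Keeps only the most recent `capacity` events so a failing session can report them.
class RecentEvents {
  private readonly items: ProviderEvent[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 5) {
    this.capacity = capacity;
  }

  push(event: ProviderEvent): void {
    this.items.push(event);
    // Drop the oldest entry once the buffer is over capacity.
    if (this.items.length > this.capacity) this.items.shift();
  }

  last(): ProviderEvent | undefined {
    return this.items[this.items.length - 1];
  }

  toArray(): ProviderEvent[] {
    return [...this.items];
  }
}

// Headless path: forward the child's real (code, signal) instead of a synthetic 0|1.
// `child` stands in for whatever process handle the provider owns.
function forwardChildExit(child: EventEmitter, sessionEvents: EventEmitter): void {
  child.once("exit", (code: number | null, signal: string | null) => {
    sessionEvents.emit("exit", code, signal);
  });
}
```

A small fixed-size buffer keeps memory bounded for long sessions while still retaining the event that usually explains the exit.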
Resulting message shape, e.g.:
Session exited before init (signal SIGTERM)
Last provider message: result/error_during_execution: "Authentication failed: token expired"
vs. today's:
Session exited before init
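One way the message shape above could be assembled, as a sketch. The formatInitExitMessage function and the LastProviderMessage shape are assumptions made for illustration and are not part of the existing code.

```typescript
// Hypothetical shape for the last captured provider event.
interface LastProviderMessage {
  type: string;
  subtype?: string;
  text: string;
}

// Builds the enriched message; falls back to today's text when nothing is known.
function formatInitExitMessage(
  code: number | null,
  signal: string | null,
  last?: LastProviderMessage
): string {
  let headline = "Session exited before init";
  if (signal) headline += ` (signal ${signal})`;
  else if (code !== null) headline += ` (exit code ${code})`;
  if (!last) return headline;

  const kind = last.subtype ? `${last.type}/${last.subtype}` : last.type;
  // JSON.stringify quotes the text and escapes embedded newlines.
  return `${headline}\nLast provider message: ${kind}: ${JSON.stringify(last.text)}`;
}
```

Called with no code, no signal, and no captured event, it returns the same string as today, so callers matching on the existing message would keep working in that case.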
Why this matters
This was originally going to be Task 2 of #61 / PR #100, but I scoped it out because the headless synthetic-exit-code issue meant a naive fix would print "(exit code 0)" for the exact case the issue was about — useless. A proper fix needs both halves above.
#101 (singleton daemon enforcement) is the most acute current trigger of this missing diagnostic — when two daemons race and one SIGTERMs the other's spawn, the SIGTERM signal is the smoking gun and currently invisible.
Environment
- stoneforge: master @ 0a7052a
- Affects: packages/smithy/src/runtime/spawner.ts plus the active AgentProvider implementations (ClaudeAgentProvider, possibly CodexAgentProvider).
Happy to send a PR if there is interest.