Make fresh-agent turn completion server-authoritative#475
Merged
danshapiro merged 8 commits intoJun 23, 2026
Conversation
Fresh-agent panes derived the green/needs-attention highlight and idle chime by differentiating the client-side busy level (useAgentSessionTurnCompletion busy->idle), which produced premature (flicker), missed (fast-turn), and wrong-color chimes. Terminals already use a server-authoritative discrete completion event; fresh-agent panes had none. Each provider adapter now emits a discrete freshAgent.turn.complete edge ONLY on a positive completion: - freshclaude/kilroy: SDK result with subtype === 'success' - freshopencode: the success-only emitStatus(idle) path (not catch/SSE relay) - freshcodex: turn/completed only when params.turn.status === 'completed' (the notification fires on interrupt too; status is carried inline) The client routes it through applyFreshAgentCompletion into the existing GREEN/SOUND pipeline using the at-monotonic dedupe regime (wall-clock at, no per-session counter) so a resumed durable session cannot swallow completions across a server restart and a reconnect cannot re-green. The fragile busy->idle derivation is removed; useAgentSessionTurnCompletion now only handles the waiting-for-approval edge. Verified empirically against real claude/codex binaries. Unit + e2e coverage added. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Two blocking correctness issues found by an independent (GPT) review of the server-authoritative fresh-agent turn-completion change: 1. Restored Claude/kilroy completions were dropped. The server keys the completion event by the runtime handle it subscribed with (provider:content.sessionId), but findFreshAgentPaneBySessionKey matched only via resolveFreshAgentSessionKey, which PREFERS the durable Claude UUID in content.sessionRef. On a restored Claude session the runtime nanoid and the durable UUID differ, so the lookup missed and the chime was silently dropped. Now match the runtime handle as well as the resolved key. (OpenCode and Codex keep sessionId === sessionRef.sessionId, so they were unaffected -- which is why the original OpenCode-only test passed.) 2. Raw Date.now() is not a per-turn monotonic identity. Two completions in the same millisecond, or a backward clock step (NTP), would make a real later completion look <= last and be dropped as a replay -- recreating the missed-chime class. Each emit site (sdk-bridge, opencode, codex) now stamps a per-session strictly-monotonic `at` via the shared nextMonotonicTurnCompleteAt helper, preserving restart-safety while guaranteeing distinct turns never collide or regress. Adds unit coverage for the helper, the Claude identity match, and a strictly-increasing `at` at all three emit sites. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Three more blocking correctness issues from an independent (GPT) review:
1. OpenCode false-chimed on interrupt. interrupt() aborts the turn, and the
sidecar then emits session.idle, which RESOLVES onceIdle (it does not
reject) -- so the success path fired a turn-complete for an interrupted
turn. interrupt() now sets a per-session turnAborted flag before aborting;
the in-flight send suppresses its chime when the abort-triggered idle
resolves, and each new turn resets the flag.
2. Freshcodex ignored the flat completion shape. The app-server client passes
notification params straight through, and the repo's own client tests model
turn/completed as a flat { threadId, turnId, status } (status at
params.status, not params.turn.status). The adapter now reads
params.turn?.status ?? params.status so both the inline (codex-cli 0.142.0)
and flat shapes are detected and interrupts/failures excluded; the protocol
schema declares both.
3. A waiting-for-approval green could swallow the real server completion. That
edge is stamped with the CLIENT clock but shared the server completion's
provider:sessionId dedupe entry for opencode/codex, so an approval stamped
ahead of the server clock (common on a remote client) suppressed the real
turn-complete via the monotonic at<=last guard. The approval edge now dedupes
under a distinct provider:sessionId#waiting namespace.
Adds unit coverage for all three (opencode interrupt + resume-after-interrupt,
codex flat status, approval-vs-completion dedupe independence).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Fresheyes round 3: the codex monotonic `at` clamp lived in the per-subscribe() closure, so it reset on every WS reconnect (subscriptions are torn down and recreated). The client store's dedupe survives reconnects, so a same-ms or backward-clock codex completion right after a reconnect could still be dropped as a replay -- the exact class the clamp was meant to prevent. Claude and OpenCode keep this on session state; codex now keeps it on a per-thread lastTurnCompleteAtByThread map (cleared with the rest of the thread state). Adds coverage that the clamp stays monotonic across a re-subscribe. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
1. OpenCode interrupt flag was sticky on abort failure. interrupt() set turnAborted before the abort POST; if that POST threw, the turn was NOT actually stopped and could complete normally, yet the stale flag suppressed its chime. interrupt() now clears turnAborted when the abort fails (and rethrows), so a genuine completion still greens/sounds. 2. The wall-clock `at` dedupe was not truly restart-safe. The per-session monotonic clamp can push `at` above real wall time (notably after a backward clock step), and a fresh process then stamps a lower `at` that the client drops as a stale replay. The client now clears the per-terminal `at` baselines on a real server restart (bootId change) via resetCompletionDedupeBaselines -- safe because a fresh process replays nothing and the edge is never re-derived from a snapshot. Plain reconnects keep the baselines, preserving replay dedupe. The plan's overstated "inherently monotonic across restart" claim is corrected accordingly. Adds coverage: opencode chimes after a failed-abort interrupt that then completes; resetCompletionDedupeBaselines clears the at baseline (lower at re-fires) while preserving unacknowledged attention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Removing the client busy->idle derivation left OpenCode /compact -- a user-visible command -- with no way to green/chime on completion (it POSTed /summarize and returned). Claude /compact is a normal turn and already chimes; codex /compact uses the startTurn fallback (compactThread is unimplemented in the real runtime) which also chimes. OpenCode compact now takes the same await-idle + emit path as a send (gated on turnAborted, with the existing onceIdle timeout as a safety net), restoring the prior chime behavior. Also corrects a stale sentence in the plan doc that still claimed wall-clock `at` is inherently monotonic across a restart, contradicting the corrected explanation (the client baseline reset is what closes that gap). Adds coverage that a successful OpenCode compact emits exactly one completion edge. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
…ent (fresheyes round 6) Round 6 of the independent review PASSED with no blocking findings. Two minor cleanups it flagged: - codex adapter `shutdown()` cleared every other per-thread map but omitted the new `lastTurnCompleteAtByThread`, so a reused-in-process adapter could clamp a fresh completion against a stale pre-shutdown timestamp. Add the missing `.clear()` to match the established cleanup pattern. Covered by a RED→GREEN test: a post-shutdown resubscribe with an earlier wall clock now stamps the fresh time instead of clamping to last+1. (A plain reconnect still keeps the clock, per the existing WS-reconnect contract.) - The `turn-complete-clock` helper comment overstated the clamp as preserving cross-restart monotonicity. It does not — the clamp can push `at` ahead of real wall time, and a fresh process may stamp a lower value; the client baseline reset (`resetCompletionDedupeBaselines`) is what closes that gap. Comment now states this accurately. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
…rmed completion (fresheyes round 7) Round 7 of the independent review found a major gap round 6 missed. major — OpenCode could chime/green a FAILED turn as a successful completion. The success path gated only on `turnAborted`, but OpenCode surfaces a failed turn out-of-band as a `session.error` SSE event and then goes idle. `onceIdle` resolves on that post-error idle without inspecting the error, so the turn looked "complete". Add a per-session `turnErrored` flag (mirroring `turnAborted`): set in `bindServeStream` when an `sdk.error` is relayed, reset at the start of every send/compact turn, and required-falsy before emitting the completion edge. This makes OpenCode's positive-completion check the true analogue of Claude's `subtype === 'success'` and Codex's `status === 'completed'`. Covered by RED→GREEN tests: error-then-idle does not chime (but still surfaces the error and returns to idle), and the next clean turn chimes again (flag resets per turn). minor — the client `freshAgent.turn.complete` handler fabricated `Date.now()` when `at` was missing/non-numeric. The server always stamps a monotonic `at`, so a fabricated client timestamp could collide with or regress against the server clock and swallow a real later completion. Drop + log malformed events instead. Covered by a RED→GREEN test (missing and NaN `at`). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
danshapiro
pushed a commit
that referenced
this pull request
Jun 24, 2026
…completion) Merged PR #475's server-authoritative turn-completion pipeline with this branch's transcript auto-render fix. The onTurnCompleted handler now serves both purposes: - Emits an idle sdk.session.snapshot on every turn completion (including interrupts) so the client re-fetches the committed transcript. - Emits a sdk.turn.complete edge only for positive completions (params.turn.status === 'completed') with monotonic-at dedup, for the green/sound pipeline. Took main's version of codex-adapter.test.ts (which includes the full sdk.turn.complete test suite) and re-added the transcript auto-render test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fresh-agent panes (freshclaude, kilroy, freshcodex, freshopencode) previously drove the GREEN / needs-attention highlight and the idle chime by re-deriving the turn-complete edge on the client — watching the Redux busy level and firing on a busy→idle transition. Differentiating a level to recover an edge is inherently fragile and produced premature/flicker chimes, missed chimes, and wrong colors (an interrupt looked just like a completion).
This change gives fresh-agent panes the same server-authoritative discrete completion event that terminal-mode CLIs already use. Each provider adapter emits a
freshAgent.turn.completeedge only on a positive completion, the client folds it into the existing GREEN/SOUND pipeline, and the client-side busy→idle derivation is deleted.Positive-completion predicates (validated against the real binaries)
resultwithsubtype === 'success'.turn/completedonly whenstatus === 'completed'(the notification also fires on interrupt/failure).await idle, gated on per-sessionturnAbortedandturnErrored(OpenCode surfaces a failed turn as an out-of-bandsession.errorthen goes idle, so reaching idle alone is not success).Dedupe & restart safety
at(no per-session counter, which would reset to 0 on restart and swallow completions for a resumed durable session).atstrictly-monotonic per session (nextMonotonicTurnCompleteAt) so same-ms / backward-clock completions never collide or regress within a process.atbaselines on a real server restart (bootId change) to close the residual cross-restart gap.atare dropped + logged, never fabricated withDate.now().What was deleted
useAgentSessionTurnCompletionno longer derives completion from the busy level; it retains only the waiting-for-approval edge, recorded under a distinct dedupe namespace so an approval can't swallow a real server completion.Testing
Review
Hardened across multiple independent (fresh-eyes, GPT-5.5 xhigh adversarial) review rounds; final round returned PASSED with no findings.
Design notes:
docs/superpowers/plans/2026-06-23-fresh-agent-server-authoritative-turn-complete.md.🤖 Generated with Claude Code
https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3