Skip to content

Make fresh-agent turn completion server-authoritative#475

Merged
danshapiro merged 8 commits into
mainfrom
fix/fresh-agent-server-authoritative-completion
Jun 23, 2026
Merged

Make fresh-agent turn completion server-authoritative#475
danshapiro merged 8 commits into
mainfrom
fix/fresh-agent-server-authoritative-completion

Conversation

@danshapiro

Copy link
Copy Markdown
Owner

Summary

Fresh-agent panes (freshclaude, kilroy, freshcodex, freshopencode) previously drove the GREEN / needs-attention highlight and the idle chime by re-deriving the turn-complete edge on the client — watching the Redux busy level and firing on a busy→idle transition. Differentiating a level to recover an edge is inherently fragile and produced premature/flicker chimes, missed chimes, and wrong colors (an interrupt looked just like a completion).

This change gives fresh-agent panes the same server-authoritative discrete completion event that terminal-mode CLIs already use. Each provider adapter emits a freshAgent.turn.complete edge only on a positive completion, the client folds it into the existing GREEN/SOUND pipeline, and the client-side busy→idle derivation is deleted.

Positive-completion predicates (validated against the real binaries)

  • freshclaude / kilroy: SDK result with subtype === 'success'.
  • freshcodex: app-server turn/completed only when status === 'completed' (the notification also fires on interrupt/failure).
  • freshopencode: the success-only idle path after await idle, gated on per-session turnAborted and turnErrored (OpenCode surfaces a failed turn as an out-of-band session.error then goes idle, so reaching idle alone is not success).

Dedupe & restart safety

  • Client dedupes on a wall-clock at (no per-session counter, which would reset to 0 on restart and swallow completions for a resumed durable session).
  • Each server emit site clamps at strictly-monotonic per session (nextMonotonicTurnCompleteAt) so same-ms / backward-clock completions never collide or regress within a process.
  • The client clears its per-terminal at baselines on a real server restart (bootId change) to close the residual cross-restart gap.
  • Malformed completions without a finite numeric at are dropped + logged, never fabricated with Date.now().

What was deleted

useAgentSessionTurnCompletion no longer derives completion from the busy level; it retains only the waiting-for-approval edge, recorded under a distinct dedupe namespace so an approval can't swallow a real server completion.

Testing

  • Coordinated full suite green: client 3839 passed, server 4105 passed (19 skipped), electron 325 passed; typecheck + lint clean.
  • New unit coverage for every emit predicate, the monotonic clamp, identity matching, error suppression, malformed-event drop, and restart baseline reset; plus an e2e for the full WS → chime/highlight → replay-ignore → re-chime flow.

Review

Hardened across multiple independent (fresh-eyes, GPT-5.5 xhigh adversarial) review rounds; final round returned PASSED with no findings.

Design notes: docs/superpowers/plans/2026-06-23-fresh-agent-server-authoritative-turn-complete.md.

🤖 Generated with Claude Code

https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3

codex and others added 8 commits June 23, 2026 01:37
Fresh-agent panes derived the green/needs-attention highlight and idle
chime by differentiating the client-side busy level
(useAgentSessionTurnCompletion busy->idle), which produced premature
(flicker), missed (fast-turn), and wrong-color chimes. Terminals already
use a server-authoritative discrete completion event; fresh-agent panes
had none.

Each provider adapter now emits a discrete freshAgent.turn.complete edge
ONLY on a positive completion:
- freshclaude/kilroy: SDK result with subtype === 'success'
- freshopencode: the success-only emitStatus(idle) path (not catch/SSE relay)
- freshcodex: turn/completed only when params.turn.status === 'completed'
  (the notification fires on interrupt too; status is carried inline)

The client routes it through applyFreshAgentCompletion into the existing
GREEN/SOUND pipeline using the at-monotonic dedupe regime (wall-clock at,
no per-session counter) so a resumed durable session cannot swallow
completions across a server restart and a reconnect cannot re-green. The
fragile busy->idle derivation is removed; useAgentSessionTurnCompletion
now only handles the waiting-for-approval edge.

Verified empirically against real claude/codex binaries. Unit + e2e
coverage added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Two blocking correctness issues found by an independent (GPT) review of the
server-authoritative fresh-agent turn-completion change:

1. Restored Claude/kilroy completions were dropped. The server keys the
   completion event by the runtime handle it subscribed with
   (provider:content.sessionId), but findFreshAgentPaneBySessionKey matched
   only via resolveFreshAgentSessionKey, which PREFERS the durable Claude UUID
   in content.sessionRef. On a restored Claude session the runtime nanoid and
   the durable UUID differ, so the lookup missed and the chime was silently
   dropped. Now match the runtime handle as well as the resolved key. (OpenCode
   and Codex keep sessionId === sessionRef.sessionId, so they were unaffected --
   which is why the original OpenCode-only test passed.)

2. Raw Date.now() is not a per-turn monotonic identity. Two completions in the
   same millisecond, or a backward clock step (NTP), would make a real later
   completion look <= last and be dropped as a replay -- recreating the
   missed-chime class. Each emit site (sdk-bridge, opencode, codex) now stamps a
   per-session strictly-monotonic `at` via the shared nextMonotonicTurnCompleteAt
   helper, preserving restart-safety while guaranteeing distinct turns never
   collide or regress.

Adds unit coverage for the helper, the Claude identity match, and a
strictly-increasing `at` at all three emit sites.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Three more blocking correctness issues from an independent (GPT) review:

1. OpenCode false-chimed on interrupt. interrupt() aborts the turn, and the
   sidecar then emits session.idle, which RESOLVES onceIdle (it does not
   reject) -- so the success path fired a turn-complete for an interrupted
   turn. interrupt() now sets a per-session turnAborted flag before aborting;
   the in-flight send suppresses its chime when the abort-triggered idle
   resolves, and each new turn resets the flag.

2. Freshcodex ignored the flat completion shape. The app-server client passes
   notification params straight through, and the repo's own client tests model
   turn/completed as a flat { threadId, turnId, status } (status at
   params.status, not params.turn.status). The adapter now reads
   params.turn?.status ?? params.status so both the inline (codex-cli 0.142.0)
   and flat shapes are detected and interrupts/failures excluded; the protocol
   schema declares both.

3. A waiting-for-approval green could swallow the real server completion. That
   edge is stamped with the CLIENT clock but shared the server completion's
   provider:sessionId dedupe entry for opencode/codex, so an approval stamped
   ahead of the server clock (common on a remote client) suppressed the real
   turn-complete via the monotonic at<=last guard. The approval edge now dedupes
   under a distinct provider:sessionId#waiting namespace.

Adds unit coverage for all three (opencode interrupt + resume-after-interrupt,
codex flat status, approval-vs-completion dedupe independence).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Fresheyes round 3: the codex monotonic `at` clamp lived in the per-subscribe()
closure, so it reset on every WS reconnect (subscriptions are torn down and
recreated). The client store's dedupe survives reconnects, so a same-ms or
backward-clock codex completion right after a reconnect could still be dropped
as a replay -- the exact class the clamp was meant to prevent. Claude and
OpenCode keep this on session state; codex now keeps it on a per-thread
lastTurnCompleteAtByThread map (cleared with the rest of the thread state).

Adds coverage that the clamp stays monotonic across a re-subscribe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
1. OpenCode interrupt flag was sticky on abort failure. interrupt() set
   turnAborted before the abort POST; if that POST threw, the turn was NOT
   actually stopped and could complete normally, yet the stale flag suppressed
   its chime. interrupt() now clears turnAborted when the abort fails (and
   rethrows), so a genuine completion still greens/sounds.

2. The wall-clock `at` dedupe was not truly restart-safe. The per-session
   monotonic clamp can push `at` above real wall time (notably after a backward
   clock step), and a fresh process then stamps a lower `at` that the client
   drops as a stale replay. The client now clears the per-terminal `at`
   baselines on a real server restart (bootId change) via
   resetCompletionDedupeBaselines -- safe because a fresh process replays
   nothing and the edge is never re-derived from a snapshot. Plain reconnects
   keep the baselines, preserving replay dedupe. The plan's overstated
   "inherently monotonic across restart" claim is corrected accordingly.

Adds coverage: opencode chimes after a failed-abort interrupt that then
completes; resetCompletionDedupeBaselines clears the at baseline (lower at
re-fires) while preserving unacknowledged attention.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
Removing the client busy->idle derivation left OpenCode /compact -- a
user-visible command -- with no way to green/chime on completion (it POSTed
/summarize and returned). Claude /compact is a normal turn and already chimes;
codex /compact uses the startTurn fallback (compactThread is unimplemented in
the real runtime) which also chimes. OpenCode compact now takes the same
await-idle + emit path as a send (gated on turnAborted, with the existing
onceIdle timeout as a safety net), restoring the prior chime behavior.

Also corrects a stale sentence in the plan doc that still claimed wall-clock
`at` is inherently monotonic across a restart, contradicting the corrected
explanation (the client baseline reset is what closes that gap).

Adds coverage that a successful OpenCode compact emits exactly one completion
edge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
…ent (fresheyes round 6)

Round 6 of the independent review PASSED with no blocking findings. Two minor
cleanups it flagged:

- codex adapter `shutdown()` cleared every other per-thread map but omitted the
  new `lastTurnCompleteAtByThread`, so a reused-in-process adapter could clamp a
  fresh completion against a stale pre-shutdown timestamp. Add the missing
  `.clear()` to match the established cleanup pattern. Covered by a RED→GREEN
  test: a post-shutdown resubscribe with an earlier wall clock now stamps the
  fresh time instead of clamping to last+1. (A plain reconnect still keeps the
  clock, per the existing WS-reconnect contract.)
- The `turn-complete-clock` helper comment overstated the clamp as preserving
  cross-restart monotonicity. It does not — the clamp can push `at` ahead of real
  wall time, and a fresh process may stamp a lower value; the client baseline
  reset (`resetCompletionDedupeBaselines`) is what closes that gap. Comment now
  states this accurately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
…rmed completion (fresheyes round 7)

Round 7 of the independent review found a major gap round 6 missed.

major — OpenCode could chime/green a FAILED turn as a successful completion. The
success path gated only on `turnAborted`, but OpenCode surfaces a failed turn
out-of-band as a `session.error` SSE event and then goes idle. `onceIdle`
resolves on that post-error idle without inspecting the error, so the turn looked
"complete". Add a per-session `turnErrored` flag (mirroring `turnAborted`): set in
`bindServeStream` when an `sdk.error` is relayed, reset at the start of every
send/compact turn, and required-falsy before emitting the completion edge. This
makes OpenCode's positive-completion check the true analogue of Claude's
`subtype === 'success'` and Codex's `status === 'completed'`. Covered by RED→GREEN
tests: error-then-idle does not chime (but still surfaces the error and returns to
idle), and the next clean turn chimes again (flag resets per turn).

minor — the client `freshAgent.turn.complete` handler fabricated `Date.now()` when
`at` was missing/non-numeric. The server always stamps a monotonic `at`, so a
fabricated client timestamp could collide with or regress against the server clock
and swallow a real later completion. Drop + log malformed events instead. Covered
by a RED→GREEN test (missing and NaN `at`).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EbrRaaAzqeXP6wnYER4Xw3
@danshapiro danshapiro merged commit 8545697 into main Jun 23, 2026
1 check passed
@danshapiro danshapiro deleted the fix/fresh-agent-server-authoritative-completion branch June 23, 2026 17:44
danshapiro pushed a commit that referenced this pull request Jun 24, 2026
…completion)

Merged PR #475's server-authoritative turn-completion pipeline with this
branch's transcript auto-render fix. The onTurnCompleted handler now
serves both purposes:
- Emits an idle sdk.session.snapshot on every turn completion (including
  interrupts) so the client re-fetches the committed transcript.
- Emits a sdk.turn.complete edge only for positive completions
  (params.turn.status === 'completed') with monotonic-at dedup, for the
  green/sound pipeline.

Took main's version of codex-adapter.test.ts (which includes the full
sdk.turn.complete test suite) and re-added the transcript auto-render test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants