feat(runner): one-shot retry on opencode null-with-no-errorKind + retry chip visibility#85

Closed
chorus-codes wants to merge 1 commit into main from feat/retry-visibility-and-opencode-null

@chorus-codes
Owner

Summary

Victor caught this on the PR #83 audit screenshot. The opencode-go qwen3.6-plus reviewer exited with an empty answer (no `errorKind` classified) and went straight to the claude-sonnet-4-6 fallback. The PR #79 retry never fired because `isRetryableErrorKind(undefined)` always returns false.

Fix

Extended the classifier signature to `isRetryableErrorKind(kind, lineage?)`. The new rule: `kind === undefined && lineage === 'opencode'` → retry once. Other lineages are unchanged.

Why opencode specifically:

  • opencode-go's gateway has known transport flakes where the subprocess exits 0 with empty stdout but a second attempt succeeds with the same prompt
  • For codex/claude/gemini, a null result with no kind almost always means the model genuinely produced nothing, so a retry would produce the same nothing
  • An explicit non-retryable kind (auth/quota/db-corrupt/no_output) still wins; the lineage hint never overrides it
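
A minimal sketch of the rule described above, not the actual implementation in this PR. The kind names `quota_exhausted`, `opencode_db_corrupt`, and `no_output` come from the test list below; `auth_failed`, `transport_error`, the `Lineage` union, and the `RETRYABLE_KINDS` set are assumptions made up for illustration, since the real set of retryable kinds from PR #79 is not shown here.

```typescript
// Sketch only: kind names marked "hypothetical" are not taken from the PR.
type ErrorKind =
  | "quota_exhausted"
  | "opencode_db_corrupt"
  | "no_output"
  | "auth_failed" // hypothetical name for the auth kind
  | "transport_error"; // hypothetical example of a retryable kind

// Hypothetical union; the PR only names these four lineages in prose.
type Lineage = "opencode" | "codex" | "claude" | "gemini";

// Assumed allow-list of kinds that were already retryable before this change.
const RETRYABLE_KINDS: ReadonlySet<ErrorKind> = new Set<ErrorKind>([
  "transport_error",
]);

function isRetryableErrorKind(
  kind: ErrorKind | undefined,
  lineage?: Lineage,
): boolean {
  // An explicit kind always decides the outcome; the lineage hint
  // is never consulted, so terminal kinds stay terminal on opencode.
  if (kind !== undefined) {
    return RETRYABLE_KINDS.has(kind);
  }
  // No kind classified: only opencode gets the benefit of the doubt.
  return lineage === "opencode";
}
```

Checking the explicit kind first is what keeps the lineage hint from overriding a terminal classification: the `lineage` argument is only read on the `undefined` branch.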

Retry visibility (no UI work needed)

The `transient_retry` cli_warning already renders as an amber chip on the participant card via the existing `participant.warnings` block in `participant-card.tsx`. With this PR, opencode failures that previously went silent-to-fallback now fire the warning, so users see "TRANSIENT_RETRY — Transient failure on opencode/opencode-go/qwen3.6-plus — retrying once before advancing fallback." before the swap banner appears.
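
The ordering described above (warning first, then one retry, then fallback) can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver code: `runWithOneRetry`, `AttemptResult`, `CliWarning`, `ChainEntry`, and the inline `isRetryable` stub are hypothetical names, the attempt is modelled as synchronous for brevity, and the message wording only approximates the chip text quoted above.

```typescript
// Sketch only: all type and function names here are hypothetical.
type Lineage = "opencode" | "codex" | "claude" | "gemini";

interface AttemptResult {
  answer: string | null; // null models the "empty answer" case
  errorKind?: string;
}

interface CliWarning {
  code: "transient_retry";
  message: string;
}

interface ChainEntry {
  lineage: Lineage;
  label: string; // e.g. "opencode/opencode-go/qwen3.6-plus"
}

// Stand-in for the real classifier; "transport_error" is an assumed kind.
function isRetryable(kind: string | undefined, lineage: Lineage): boolean {
  return kind === undefined ? lineage === "opencode" : kind === "transport_error";
}

function runWithOneRetry(
  entry: ChainEntry,
  attempt: () => AttemptResult,
  warnings: CliWarning[],
): AttemptResult {
  const first = attempt();
  if (first.answer !== null) return first;
  // Non-retryable failure: hand the result back so the caller can
  // advance the fallback chain without a second attempt.
  if (!isRetryable(first.errorKind, entry.lineage)) return first;
  // Record the warning before retrying so the chip can render
  // ahead of any later swap banner.
  warnings.push({
    code: "transient_retry",
    message: `Transient failure on ${entry.label}; retrying once before advancing fallback.`,
  });
  // Exactly one retry; its result is final either way.
  return attempt();
}
```

Pushing the warning before the second attempt, rather than after it, is what makes the retry visible even when the retry itself succeeds.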

Tests

974 → 975 passing. New tests cover:

  • `undefined kind` + `opencode` lineage → retryable
  • `undefined kind` + any other lineage → not retryable (conservative default preserved)
  • Explicit terminal kinds (`quota_exhausted`, `opencode_db_corrupt`, `no_output`) stay terminal even on opencode

Test plan

  • `pnpm typecheck` clean
  • `pnpm test` 975/975 passing
  • Chorus self-audit

Victor caught this on the PR #83 audit: qwen3.6-plus reviewer ran on
opencode-go, exited cleanly with empty output (no errorKind, no
message), and went straight to fallback chain advance — no retry
attempt. The PR #79 retry classifier returned false because
isRetryableErrorKind(undefined) was hard-false. That's the right
default for codex/claude/gemini (a null with no kind usually means
the model genuinely produced nothing — retry would produce nothing
again), but opencode-go's gateway has known transport flakes where
a second attempt succeeds with the same prompt.

Fix: extend isRetryableErrorKind to accept an optional `lineage` hint.
When `kind` is undefined AND `lineage === 'opencode'`, treat as
retryable. Other lineages keep the conservative default. The lineage
hint does NOT override an explicit non-retryable kind — auth/quota/
db-corrupt are still terminal regardless of lineage.

Both reviewer-driver and doer-driver call sites now pass
`entry.lineage` so the chain step picks up the new behavior.

Retry visibility: no UI work needed — the existing `transient_retry`
cli_warning already renders as an amber chip on the participant card
via participant-card.tsx's `participant.warnings` block. The chip
appears the moment retry fires; the message reads "Transient X
failure on Y/Z — retrying once before advancing fallback."

Tests: 974 -> 975 passing (+5 new cases on the lineage hint behavior).
@chorus-codes
Owner Author

Replaced by #87 (rebased onto post-#83 main with conflict resolved + lineage hint applied to the new tryClaim-wrapped retry loop).
