feat(foreman): coder-tier escalation, re-dispatch a failed coder to a larger model#964
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
joryirving
left a comment
There was a problem hiding this comment.
Reviewing as the operator running the outer-layer equivalent of this in production (our dispatch-bridge re-lanes attempt-exhausted issues to a frontier coder), so I checked the trigger design against our live fleet's task envelopes rather than just the diff.
Field validation of the truth table — it holds on real CRs:
wl-misospace-dispatch-525-code-525: verdictINCOMPLETE, top-levelextra.outcome: MODEL-DECIDED, nestedmodelExtra.outcome: STUCK-LOOP-DETECTED. This is a live instance of exactly the trap the PR calls out — a naive top-level read escalates it; your discriminator correctly refuses.wl-misospace-dispatch-528-code-528:NO-GOwith topNO-CHANGES— correctly excluded.wl-misospace-kubetix-152-code-152and...-action-365-code-365:NO-GO/MODEL-DECIDED— correctly escalate. ThecoderResultEnvelopeshape matches productionstatus.resultverbatim.
Also confirmed no interaction hazard with #959: its iteration fires on a review NO-GO (coder GO'd), yours on a coder-terminal NO-GO (reviews never ran) — mutually exclusive by construction, and the code-<n>-r<k> step names can't collide with your exact-match code-<n> lookup.
Blocking question — who reviews the -esc branch? The tier emits code-<N>-esc + verify-<N>-esc only. On a Workload with reviewers (our fleet's default shape), the escalated coder can GO and gate-pass, but no review-<N>-esc exists — so no reviewer verdict, and post-#956 no openPullRequest carrier either. The escalated fix ends as a pushed, unreviewed branch with no PR, on the path where the operator has explicitly declared they want reviews. Related: what does rollup make of that issue — the base review-<N> tasks presumably cascade-failed when code-<N> failed, so does the Workload still roll up Failed even after a successful escalation? If skipping reviews on the esc tier is intentional scoping, a doc note + follow-up issue works for me and I'll approve; if not, mirroring the base review steps (with the #937 stamp) onto the esc branch looks mechanical.
Non-blocking, from the same field data: 2 of our 3 real NO-GO/MODEL-DECIDED cases were "this issue is already resolved by " honest bails. Under this design both burn an escalation run for the larger model to re-confirm staleness. Not this PR's problem — but it suggests a future dedicated outcome (e.g. ALREADY-RESOLVED) so the honest-bail class can split into "couldn't do it" (escalate) vs "nothing to do" (close/flag). Happy to file it upstream with the CR evidence if useful.
We'll also adopt this taxonomy in our bridge's outer tier — attempt-counting escalated our PUSH-FAILED/harness failures to the big model, which your trigger correctly never would. The two layers compose nicely: in-Workload capability escalation first, lane-level re-dispatch as the outer loop.
Groundwork for the coder escalation tier: an opt-in larger-model coder that re-attempts an issue when the base coder fails, and a prompt-prefix field to carry the prior model's diagnosis alongside the issue body. Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
Escalate only capability failures (model NO-GO or CODER-GATE-FAILED), never a model-decided INCOMPLETE / stuck-loop / ERROR. Reads the typed verdict plus the nested modelExtra.outcome, since the top-level extra.outcome is MODEL-DECIDED for all model-terminated runs alike. Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
… failure Second-pass hook mirroring emitEscalations, wired before reviewer escalation. Emits code-<N>-esc/verify-<N>-esc at EscalationCoderAgentRef with the prior model's diagnosis as a prompt prefix; bounded to one tier, idempotent, MaxTasks- and sovereignty-gated. Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
The topOutcome param is kept configurable to mirror the executor envelope shape even though current cases all pass MODEL-DECIDED, matching the existing pressure_test.go convention. Clears the make lint gate. Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
A base coder capability-failure re-dispatched to the escalation tier emitted only code-<N>-esc and verify-<N>-esc, so the escalated coder could GO and gate-pass but nothing produced a reviewer verdict or the openPullRequest carrier: the fix ended as a pushed, unreviewed branch with no PR. coderEscalationSteps now also appends one review-<N>-esc-<i> per ReviewerAgentRef, dependent on verify-<N>-esc, carrying the esc branch and mirroring the base review step's openPullRequest computation. A Workload with no reviewers is unchanged (code+verify only). Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
…ists When a base code-<N> capability-failure was escalated, the base verify-<N>/review-<N>-<i> that cascade-failed stayed in the rollup slice, so a SUCCESSFUL escalation still rolled the Workload up Failed: the issue was judged by the dead base attempt, not the esc attempt. activeChildren now drops the base-attempt synthesized steps for an issue once a code-<N>-esc task exists, alongside the existing fix-iteration supersession. The esc steps themselves never parse as base issue steps, so the escalation attempt is never dropped by its own rule; unescalated issues are untouched. emitCoderEscalations labels its in-flight placeholders so the supersession applies in the same reconcile pass, matching emitReviewIterations. The envtest drives a base NO-GO plus its cascade-failed verify/review through a full escalation to on-target success and asserts the Workload reaches Completed rather than Failed. Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
…on scan The coder-escalation tier emits review-<N>-esc-<i> steps, which carry the review-<N>- prefix that the reviewer-escalation tier (defilantech#546) scans as base reviews. When both tiers are enabled on a Workload, a NO-GO on an escalated review would fan escalate-<N>-<j> steps against the base branch instead of the escalated one. Exclude review-<N>-esc-<i> from the base-review scan so the two tiers compose. Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>
b8bc040 to
b5a9c82
Compare
|
Thanks for the field-validation against live CRs, that is exactly the confidence I wanted on the trigger, and the #959 mutual-exclusivity check is reassuring. Addressed your blocking point (full mirror, not scoped out) plus one interaction I found along the way. Rebased onto current main (your #959/#967/#956 all landed). Reviewers on the escalated branch. Rollup. You were right to ask. I verified the semantics rather than assume: a base Cross-tier guard (found while wiring the above). All green: unit + envtest, On the non-blocking Ready for another look. |
joryirving
left a comment
There was a problem hiding this comment.
Approve. Traced the three changes against the parsers rather than the descriptions; all correct.
Reviewer fan-out. review-<N>-esc-<i> per ref, DependsOn: verify-<N>-esc, openPullRequest computed identically to the base — resolves the unreviewed-branch blocker. Code+verify-only when no reviewers, unchanged.
Supersession. Verified the matcher does exactly what's needed and nothing more: issueStepIteration matches every base shape (code-N/verify-N via the step==base arm, review-N-<i> via the digit-checked review parser) so escalated[n] drops the base coder's NO-GO and its cascade-failed verify/review — but rejects all three -esc shapes (verify-N-esc misses the -r prefix; review-N-esc-<i> fails reviewIterationOf's digit loop on the e of esc), so the escalation attempt's own verify/review stay in the rollup. The same-pass labeled placeholders keep the Workload in-flight while esc runs, so no premature Completed. Confirmed the placeholder labelStep (TrimPrefix(name, w.Name+"-")) equals the real task's step.Name under absoluteTaskName.
Cross-tier guard. Switch order puts review-N-esc- before review-N-, so esc reviews are skipped by the base reviewer-escalation scan while review-N-<i> still counts. Correct.
One non-blocking edge (inherited, not introduced here). coderEscalationSteps gates the whole issue on existingEsc[code-<N>-esc], so if a reconcile creates code-<N>-esc but a transient API error aborts before review-<N>-esc-<i> (renderAndCreate creates in order, returns on the first non-AlreadyExists error), the next pass sees code-<N>-esc exists → continue → the missing review-esc is never re-proposed, and the branch lands unreviewed again — the exact class this commit closes. Low probability, and it's the same issue-level dedup escalationSteps (#546) uses. Since renderAndCreate already skips AlreadyExists, keying the skip per-step (or emitting all esc steps every pass and letting the create dedup) would make it self-healing. Fine as a follow-up.
Filing the ALREADY-RESOLVED outcome issue upstream with the CR evidence as agreed.
…27 ➔ 0.8.28) (#1405) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/defilantech/charts/llmkube](https://github.com/defilantech/LLMKube) | patch | `0.8.27` → `0.8.28` | --- ### Release Notes <details> <summary>defilantech/LLMKube (ghcr.io/defilantech/charts/llmkube)</summary> ### [`v0.8.28`](https://github.com/defilantech/LLMKube/blob/HEAD/CHANGELOG.md#0828-2026-07-04) [Compare Source](defilantech/LLMKube@v0.8.27...v0.8.28) ##### Features - **foreman:** bounded fix iteration on reviewer NO-GO instead of terminal failure ([#​959](defilantech/LLMKube#959)) ([d820fff](defilantech/LLMKube@d820fff)) - **foreman:** coder-tier escalation, re-dispatch a failed coder to a larger model ([#​964](defilantech/LLMKube#964)) ([ce8f655](defilantech/LLMKube@ce8f655)) - **foreman:** executor-owned revise-from-branch restore for revision tasks ([#​967](defilantech/LLMKube#967)) ([b76051c](defilantech/LLMKube@b76051c)) - **foreman:** open the pull request on review GO ([#​956](defilantech/LLMKube#956)) ([fd852e1](defilantech/LLMKube@fd852e1)) - **inference:** add spec.modelCache.claimName for user-owned cache PVCs ([#​960](defilantech/LLMKube#960)) ([aab5a58](defilantech/LLMKube@aab5a58)) ##### Bug Fixes - **foreman:** accept workspace-internal absolute paths in resolveInside ([#​957](defilantech/LLMKube#957)) ([34b126c](defilantech/LLMKube@34b126c)) - **foreman:** defer generic self-gate when runtime is missing from coder image ([#​958](defilantech/LLMKube#958)) ([df185ec](defilantech/LLMKube@df185ec)) - **foreman:** reject no-op str\_replace where old\_string equals new\_string ([#​969](defilantech/LLMKube#969)) ([c71f38b](defilantech/LLMKube@c71f38b)) - **foreman:** scope-overlap check catches Go files in new directories ([#​962](defilantech/LLMKube#962)) ([486a944](defilantech/LLMKube@486a944)) - **inference:** warn when modelCache.claimName is silently ignored ([#​966](defilantech/LLMKube#966)) ([d49cd22](defilantech/LLMKube@d49cd22)) ##### Documentation - register the Karpenter GPU autoscaling guide in nav.yaml ([#​954](defilantech/LLMKube#954)) ([88b9c7d](defilantech/LLMKube@88b9c7d)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL3BhdGNoIl19--> Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/1405
What
Adds an opt-in coder escalation tier to Foreman's Workload controller. When a base coder task fails at its model's ceiling, the WorkloadReconciler re-dispatches that issue once to a larger coder model, carrying the failed model's own diagnosis as a prompt hint. Off by default; enabled per-Workload via a new
escalationCoderAgentRef. This is the coder-side twin of the existing escalation-reviewer tier (#546).Why
Fixes #963
A single coder model is a fixed capability ceiling. In a four-issue batch on one MoE coder, the model cleanly resolved the tractable bug but failed three others at its ceiling in distinct ways. The clearest case: the model produced a correct diagnosis it could not itself execute and honestly returned NO-GO. That is exactly the case a larger, denser model should take over.
Density is not a blanket upgrade, though. It helps the reasoning-limited failures but hurts convergence-heavy refactors, where a slower dense model exhausts its turn budget sooner. So this is a cascade, not a swap: keep the cheap, fast model for the tractable majority and pay for the larger model only on the specific failure modes where its extra reasoning is worth the cost.
How
WorkloadSpec.EscalationCoderAgentRef(singular, unlike the pluralescalationReviewerAgentRefs): coders are sequential, so N parallel escalation coders would produce N competing branches. One tier. Unset means the feature is off, fully backward compatible.NO-GOwith top-level outcomeMODEL-DECIDED, or nestedmodelExtra.outcome == CODER-GATE-FAILED. Do not escalate on a model-decidedINCOMPLETE(the model gave up / ran out of turns),STUCK-LOOP-DETECTED, a trivial no-changes NO-GO, orERROR: those are scope/harness failures a larger, slower model will not fix and may worsen by timing out.modelExtra.outcome, not the top-levelextra.outcome, which isMODEL-DECIDEDfor the NO-GO, gate-fail, and gave-up cases alike. A naive top-level read would wrongly escalate the gave-up case. The unit truth table pins this against three real batch outcomes: a NO-GO /MODEL-DECIDED(escalate), anINCOMPLETEwith nestedCODER-GATE-FAILED(escalate), and anINCOMPLETEwith no gate outcome (do not escalate).code-<N>-esc+verify-<N>-escpair at the escalation Agent on a fresh branchforeman/<w>/issue-<N>-esc, carrying the prior model's summary. A fresh branch (not a restore of the failed attempt) keeps the high-signal insight without anchoring the strong model on the weak model's broken code, and makes the feature independent of the branch-restore machinery that is still in flight elsewhere.AgenticTaskPayload.PromptPrefix, rendered before the fetched issue body, so the escalated coder sees both the diagnosis and the issue's acceptance criteria (leavingPromptempty so the issue-body fetch still runs).emitCoderEscalations, a structural twin ofemitEscalations, wired into Reconcile before reviewer escalation (a coder that did not GO has no branch to review). One tier deep (-esctasks are never re-scanned as base tasks), idempotent (permanent step names, skip when the-escchild already exists), MaxTasks- and sovereignty-gated.AI-assisted contribution (band 3). Implemented with AI assistance under human review; human-accountable. Commits are kept clean per the project's no-attribution-in-commits convention, with the disclosure here in the PR body per CONTRIBUTING.md.
Checklist
make testpasses locally (new envtest green; controller coverage 80.2%)make lintpasses locally (0 issues; alsoGOOS=linux golangci-lint run ./...0 issues)git commit -s) per DCOdocs/site/foreman/README.md: coder escalation tier + dual-box serving)