feat(foreman): coder-tier escalation, re-dispatch a failed coder to a larger model by Defilan · Pull Request #964 · defilantech/LLMKube

Defilan · 2026-07-03T07:45:40Z

What

Adds an opt-in coder escalation tier to Foreman's Workload controller. When a base coder task fails at its model's ceiling, the WorkloadReconciler re-dispatches that issue once to a larger coder model, carrying the failed model's own diagnosis as a prompt hint. Off by default; enabled per-Workload via a new escalationCoderAgentRef. This is the coder-side twin of the existing escalation-reviewer tier (#546).

Why

Fixes #963

A single coder model is a fixed capability ceiling. In a four-issue batch on one MoE coder, the model cleanly resolved the tractable bug but failed three others at its ceiling in distinct ways. The clearest case: the model produced a correct diagnosis it could not itself execute and honestly returned NO-GO. That is exactly the case a larger, denser model should take over.

Density is not a blanket upgrade, though. It helps the reasoning-limited failures but hurts convergence-heavy refactors, where a slower dense model exhausts its turn budget sooner. So this is a cascade, not a swap: keep the cheap, fast model for the tractable majority and pay for the larger model only on the specific failure modes where its extra reasoning is worth the cost.

How

New WorkloadSpec.EscalationCoderAgentRef (singular, unlike the plural escalationReviewerAgentRefs): coders are sequential, so N parallel escalation coders would produce N competing branches. One tier. Unset means the feature is off, fully backward compatible.
Trigger is capability failures only. Escalate iff the terminal base coder's verdict is NO-GO with top-level outcome MODEL-DECIDED, or its nested modelExtra.outcome == CODER-GATE-FAILED. Do not escalate on a model-decided INCOMPLETE (the model gave up / ran out of turns), STUCK-LOOP-DETECTED, a trivial no-changes NO-GO, or ERROR: those are scope/harness failures that a larger, slower model will not fix and may worsen by timing out.
- The discriminator is the nested modelExtra.outcome, not the top-level extra.outcome, which is MODEL-DECIDED for the NO-GO, gate-fail, and gave-up cases alike. A naive top-level read would wrongly escalate the gave-up case. The unit truth table pins this against three real batch outcomes: a NO-GO / MODEL-DECIDED (escalate), an INCOMPLETE with nested CODER-GATE-FAILED (escalate), and an INCOMPLETE with no gate outcome (do not escalate).
Context is fresh branch plus diagnosis. On a triggering failure the reconciler emits a code-<N>-esc + verify-<N>-esc pair at the escalation Agent on a fresh branch foreman/<w>/issue-<N>-esc, carrying the prior model's summary. A fresh branch (not a restore of the failed attempt) keeps the high-signal insight without anchoring the strong model on the weak model's broken code, and makes the feature independent of the branch-restore machinery that is still in flight elsewhere.
The hint reaches the coder via a new AgenticTaskPayload.PromptPrefix, rendered before the fetched issue body, so the escalated coder sees both the diagnosis and the issue's acceptance criteria (leaving Prompt empty so the issue-body fetch still runs).
Second-pass emission hook emitCoderEscalations, a structural twin of emitEscalations, wired into Reconcile before reviewer escalation (a coder that did not GO has no branch to review). One tier deep (-esc tasks are never re-scanned as base tasks), idempotent (permanent step names, skip when the -esc child already exists), MaxTasks- and sovereignty-gated.
The controller does not manage serving. The escalation Agent must point at an already-reachable model, exactly as escalation reviewers do. The recommended deployment is dual-box with both models hot so escalation is a routing hop, not a cold model load. Documented in the Foreman runbook.
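The naming and payload conventions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller's code: the `Step` type, the trimmed-down `AgenticTaskPayload`, and the helpers `coderEscalationPair` and `renderPrompt` are hypothetical stand-ins. Only the step names, the branch pattern, and the PromptPrefix-with-empty-Prompt behaviour come from the description.

```go
package main

import "fmt"

// AgenticTaskPayload is a simplified stand-in; only PromptPrefix and the
// empty-Prompt convention are taken from the PR description.
type AgenticTaskPayload struct {
	Prompt       string // left empty so the issue-body fetch still runs
	PromptPrefix string // prior model's diagnosis, rendered before the issue body
	Branch       string
}

// Step is a hypothetical minimal view of a synthesized Workload step.
type Step struct {
	Name      string
	DependsOn []string
	Payload   AgenticTaskPayload
}

// coderEscalationPair builds the code/verify pair for one escalated issue on
// a fresh -esc branch. Which Agent runs each step is not modelled here.
func coderEscalationPair(workload string, issue int, diagnosis string) []Step {
	code := fmt.Sprintf("code-%d-esc", issue)
	verify := fmt.Sprintf("verify-%d-esc", issue)
	branch := fmt.Sprintf("foreman/%s/issue-%d-esc", workload, issue)
	return []Step{
		{Name: code, Payload: AgenticTaskPayload{PromptPrefix: diagnosis, Branch: branch}},
		// Assumption: the verify step waits on the escalated coder.
		{Name: verify, DependsOn: []string{code}, Payload: AgenticTaskPayload{Branch: branch}},
	}
}

// renderPrompt illustrates the ordering only: prefix first, then issue body.
func renderPrompt(p AgenticTaskPayload, issueBody string) string {
	if p.PromptPrefix == "" {
		return issueBody
	}
	return p.PromptPrefix + "\n\n" + issueBody
}

func main() {
	// The diagnosis and issue text below are made-up sample inputs.
	steps := coderEscalationPair("batch", 42, "Diagnosis: nil map write in the cache path.")
	for _, s := range steps {
		fmt.Println(s.Name, s.Payload.Branch, s.DependsOn)
	}
	fmt.Println(renderPrompt(steps[0].Payload, "Issue #42: panic on startup"))
}
```

Leaving `Prompt` empty in the sketch mirrors the stated design choice: the escalated coder receives the diagnosis as a prefix while the normal issue-body fetch still supplies the acceptance criteria.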

AI-assisted contribution (band 3). Implemented with AI assistance under human review; human-accountable. Commits are kept clean per the project's no-attribution-in-commits convention, with the disclosure here in the PR body per CONTRIBUTING.md.

Checklist

Tests added/updated (unit trigger truth table + step synthesis; envtest for emission, non-escalation of gave-up/stuck, backward-compat unset, idempotency)
make test passes locally (new envtest green; controller coverage 80.2%)
make lint passes locally (0 issues; also GOOS=linux golangci-lint run ./... 0 issues)
Commit messages follow conventional commits
All commits are signed off (git commit -s) per DCO
AI assistance disclosed above, per CONTRIBUTING.md
Documentation updated (docs/site/foreman/README.md: coder escalation tier + dual-box serving)

codecov · 2026-07-03T07:53:42Z

Codecov Report

❌ Patch coverage is 69.89796% with 59 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...al/foreman/controller/workload_coder_escalation.go	72.15%	33 Missing and 16 partials ⚠️
api/foreman/v1alpha1/zz_generated.deepcopy.go	0.00%	8 Missing ⚠️
internal/foreman/controller/workload_controller.go	33.33%	1 Missing and 1 partial ⚠️

joryirving

Reviewing as the operator running the outer-layer equivalent of this in production (our dispatch-bridge re-lanes attempt-exhausted issues to a frontier coder), so I checked the trigger design against our live fleet's task envelopes rather than just the diff.

Field validation of the truth table — it holds on real CRs:

wl-misospace-dispatch-525-code-525: verdict INCOMPLETE, top-level extra.outcome: MODEL-DECIDED, nested modelExtra.outcome: STUCK-LOOP-DETECTED. This is a live instance of exactly the trap the PR calls out — a naive top-level read escalates it; your discriminator correctly refuses.
wl-misospace-dispatch-528-code-528: NO-GO with top NO-CHANGES — correctly excluded.
wl-misospace-kubetix-152-code-152 and ...-action-365-code-365: NO-GO / MODEL-DECIDED — correctly escalate. The coderResultEnvelope shape matches production status.result verbatim.
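The trigger rule these envelopes were checked against can be sketched as a small predicate. This is an illustration under assumptions: the `coderResult` struct and `shouldEscalate` function are hypothetical names, and the rule is a literal reading of the PR description (NO-GO with top-level MODEL-DECIDED, or nested CODER-GATE-FAILED), not the project's actual implementation.

```go
package main

import "fmt"

// coderResult is a hypothetical flattening of the coder result envelope.
type coderResult struct {
	Verdict      string // e.g. "NO-GO", "INCOMPLETE"
	TopOutcome   string // top-level extra.outcome
	ModelOutcome string // nested modelExtra.outcome; empty when absent
}

// shouldEscalate returns true only for capability failures.
func shouldEscalate(r coderResult) bool {
	// A failed coder gate is escalated regardless of the typed verdict.
	if r.ModelOutcome == "CODER-GATE-FAILED" {
		return true
	}
	// Otherwise only an honest model NO-GO qualifies. MODEL-DECIDED alone is
	// not enough, because gave-up and stuck-loop runs carry it too.
	return r.Verdict == "NO-GO" && r.TopOutcome == "MODEL-DECIDED"
}

func main() {
	cases := []struct {
		name string
		r    coderResult
	}{
		{"model NO-GO", coderResult{"NO-GO", "MODEL-DECIDED", ""}},
		{"gate failed", coderResult{"INCOMPLETE", "MODEL-DECIDED", "CODER-GATE-FAILED"}},
		{"gave up", coderResult{"INCOMPLETE", "MODEL-DECIDED", ""}},
		{"stuck loop", coderResult{"INCOMPLETE", "MODEL-DECIDED", "STUCK-LOOP-DETECTED"}},
		{"no changes", coderResult{"NO-GO", "NO-CHANGES", ""}},
	}
	for _, c := range cases {
		fmt.Printf("%-12s escalate=%v\n", c.name, shouldEscalate(c.r))
	}
}
```

The stuck-loop and no-changes rows correspond to the shapes of the 525 and 528 tasks listed above; a check on `TopOutcome` alone would wrongly escalate the stuck-loop case.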

Also confirmed no interaction hazard with #959: its iteration fires on a review NO-GO (coder GO'd), yours on a coder-terminal NO-GO (reviews never ran) — mutually exclusive by construction, and the code-<n>-r<k> step names can't collide with your exact-match code-<n> lookup.

Blocking question — who reviews the -esc branch? The tier emits code-<N>-esc + verify-<N>-esc only. On a Workload with reviewers (our fleet's default shape), the escalated coder can GO and gate-pass, but no review-<N>-esc exists — so no reviewer verdict, and post-#956 no openPullRequest carrier either. The escalated fix ends as a pushed, unreviewed branch with no PR, on the path where the operator has explicitly declared they want reviews. Related: what does rollup make of that issue — the base review-<N> tasks presumably cascade-failed when code-<N> failed, so does the Workload still roll up Failed even after a successful escalation? If skipping reviews on the esc tier is intentional scoping, a doc note + follow-up issue works for me and I'll approve; if not, mirroring the base review steps (with the #937 stamp) onto the esc branch looks mechanical.

Non-blocking, from the same field data: 2 of our 3 real NO-GO/MODEL-DECIDED cases were "this issue is already resolved by " honest bails. Under this design both burn an escalation run for the larger model to re-confirm staleness. Not this PR's problem — but it suggests a future dedicated outcome (e.g. ALREADY-RESOLVED) so the honest-bail class can split into "couldn't do it" (escalate) vs "nothing to do" (close/flag). Happy to file it upstream with the CR evidence if useful.

We'll also adopt this taxonomy in our bridge's outer tier — attempt-counting escalated our PUSH-FAILED/harness failures to the big model, which your trigger correctly never would. The two layers compose nicely: in-Workload capability escalation first, lane-level re-dispatch as the outer loop.

Groundwork for the coder escalation tier: an opt-in larger-model coder that re-attempts an issue when the base coder fails, and a prompt-prefix field to carry the prior model's diagnosis alongside the issue body.
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

Escalate only capability failures (model NO-GO or CODER-GATE-FAILED), never a model-decided INCOMPLETE / stuck-loop / ERROR. Reads the typed verdict plus the nested modelExtra.outcome, since the top-level extra.outcome is MODEL-DECIDED for all model-terminated runs alike.
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

… failure
Second-pass hook mirroring emitEscalations, wired before reviewer escalation. Emits code-<N>-esc/verify-<N>-esc at EscalationCoderAgentRef with the prior model's diagnosis as a prompt prefix; bounded to one tier, idempotent, MaxTasks- and sovereignty-gated.
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

The topOutcome param is kept configurable to mirror the executor envelope shape even though current cases all pass MODEL-DECIDED, matching the existing pressure_test.go convention. Clears the make lint gate.
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

A base coder capability-failure re-dispatched to the escalation tier emitted only code-<N>-esc and verify-<N>-esc, so the escalated coder could GO and gate-pass but nothing produced a reviewer verdict or the openPullRequest carrier: the fix ended as a pushed, unreviewed branch with no PR. coderEscalationSteps now also appends one review-<N>-esc- per ReviewerAgentRef, dependent on verify-<N>-esc, carrying the esc branch and mirroring the base review step's openPullRequest computation. A Workload with no reviewers is unchanged (code+verify only).
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

…ists
When a base code-<N> capability-failure was escalated, the base verify-<N>/review-<N>- that cascade-failed stayed in the rollup slice, so a SUCCESSFUL escalation still rolled the Workload up Failed: the issue was judged by the dead base attempt, not the esc attempt. activeChildren now drops the base-attempt synthesized steps for an issue once a code-<N>-esc task exists, alongside the existing fix-iteration supersession. The esc steps themselves never parse as base issue steps, so the escalation attempt is never dropped by its own rule; unescalated issues are untouched. emitCoderEscalations labels its in-flight placeholders so the supersession applies in the same reconcile pass, matching emitReviewIterations. The envtest drives a base NO-GO plus its cascade-failed verify/review through a full escalation to on-target success and asserts the Workload reaches Completed rather than Failed.
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

…on scan
The coder-escalation tier emits review-<N>-esc- steps, which carry the review-<N>- prefix that the reviewer-escalation tier (defilantech#546) scans as base reviews. When both tiers are enabled on a Workload, a NO-GO on an escalated review would fan escalate-<N>-<j> steps against the base branch instead of the escalated one. Exclude review-<N>-esc- from the base-review scan so the two tiers compose.
Refs defilantech#963
Signed-off-by: Christopher Maher <chris@mahercode.io>

Defilan · 2026-07-03T20:24:41Z

Thanks for the field-validation against live CRs, that is exactly the confidence I wanted on the trigger, and the #959 mutual-exclusivity check is reassuring.

Addressed your blocking point (full mirror, not scoped out) plus one interaction I found along the way. Rebased onto current main (your #959/#967/#956 all landed).

Reviewers on the escalated branch. coderEscalationSteps now emits review-<N>-esc- per ReviewerAgentRef, DependsOn: verify-<N>-esc, on the esc branch, with the openPullRequest stamp computed the same way the base does. No reviewers configured means code+verify only, unchanged.
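A rough sketch of the reviewer fan-out described here, under loud assumptions: `Step` and `escReviewSteps` are hypothetical, the numeric suffix after `review-<N>-esc-` is a guess (the thread does not show what follows the trailing dash), and `openPR` is a placeholder for whatever computation the base review steps use.

```go
package main

import "fmt"

// Step is a hypothetical minimal view of a synthesized review step.
type Step struct {
	Name            string
	AgentRef        string
	DependsOn       []string
	Branch          string
	OpenPullRequest bool
}

// escReviewSteps emits one review step per reviewer ref on the esc branch,
// each gated on verify-<N>-esc. No reviewers means no review steps.
func escReviewSteps(branch string, issue int, reviewers []string, openPR func(i int) bool) []Step {
	steps := make([]Step, 0, len(reviewers))
	for i, ref := range reviewers {
		steps = append(steps, Step{
			// Assumption: the reviewer index is used as the suffix.
			Name:            fmt.Sprintf("review-%d-esc-%d", issue, i),
			AgentRef:        ref,
			DependsOn:       []string{fmt.Sprintf("verify-%d-esc", issue)},
			Branch:          branch,
			OpenPullRequest: openPR(i),
		})
	}
	return steps
}

func main() {
	// Placeholder rule for illustration only, not the project's computation.
	firstOnly := func(i int) bool { return i == 0 }
	for _, s := range escReviewSteps("foreman/w/issue-7-esc", 7, []string{"rev-a", "rev-b"}, firstOnly) {
		fmt.Println(s.Name, s.AgentRef, s.DependsOn, s.OpenPullRequest)
	}
}
```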

Rollup. You were right to ask. I verified the semantics rather than assuming: a base code-<N> NO-GO ends Succeeded/NoGo (counts incomplete), and the cascade-failed verify-<N>/review-<N>- count failed, so without intervention a successful escalation still rolls up Failed. Extended activeChildren so that once code-<N>-esc exists, the base attempt for N is superseded and dropped from the rollup slice. The envtest proves it: with the fix the escalated issue reaches Completed, and reverting the supersession forces Failed (leftover base incomplete+failed).
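The supersession rule can be sketched as a filter over step names. This is a simplified illustration, not the real activeChildren: the helper names, the string-slice input, and the assumed base review shape (`review-<N>-` followed by digits) are assumptions, and fix-iteration steps such as `code-<n>-r<k>` are deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// baseIssue reports the issue number for a base-attempt step name
// (code-N, verify-N, review-N-<digits>) and false for anything else,
// including every -esc shape.
func baseIssue(step string) (int, bool) {
	parts := strings.Split(step, "-")
	if len(parts) < 2 {
		return 0, false
	}
	n, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, false
	}
	switch parts[0] {
	case "code", "verify":
		if len(parts) == 2 {
			return n, true
		}
	case "review":
		if len(parts) == 3 {
			if _, err := strconv.Atoi(parts[2]); err == nil {
				return n, true
			}
		}
	}
	return 0, false
}

// escalatedIssue reports the issue number for a code-N-esc step.
func escalatedIssue(step string) (int, bool) {
	rest, ok := strings.CutPrefix(step, "code-")
	if !ok {
		return 0, false
	}
	num, ok := strings.CutSuffix(rest, "-esc")
	if !ok {
		return 0, false
	}
	n, err := strconv.Atoi(num)
	return n, err == nil
}

// activeChildren drops base-attempt steps for issues that have a code-N-esc.
func activeChildren(steps []string) []string {
	escalated := map[int]bool{}
	for _, s := range steps {
		if n, ok := escalatedIssue(s); ok {
			escalated[n] = true
		}
	}
	var out []string
	for _, s := range steps {
		if n, ok := baseIssue(s); ok && escalated[n] {
			continue // superseded by the escalation attempt
		}
		out = append(out, s)
	}
	return out
}

func main() {
	fmt.Println(activeChildren([]string{
		"code-7", "verify-7", "review-7-0",
		"code-7-esc", "verify-7-esc", "review-7-esc-0",
		"code-8", "verify-8",
	}))
}
```

Because the `-esc` names never parse as base steps in this sketch, the escalation attempt cannot be dropped by its own rule, and issue 8, which was never escalated, is left untouched.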

Cross-tier guard (found while wiring the above). escalationSteps scans review-<N>- by prefix, which also matches my new review-<N>-esc-. With both EscalationReviewerAgentRefs and EscalationCoderAgentRef set (your fleet's shape), a NO-GO on an escalated review would have fanned escalate-<N>-<j> against the base branch. Added an exclusion so the esc reviews are skipped by the base scan; unit-tested.

All green: unit + envtest, make test, lint both arches, no CRD drift.

On the non-blocking ALREADY-RESOLVED idea: agreed, the "nothing to do" honest-bail is a distinct class from "couldn't do it," and splitting it would stop escalation burning a run to re-confirm staleness. Please do file it upstream with the CR evidence, that is a clean follow-up and your live data makes the case. And yes, the two-layer composition (in-Workload capability escalation, then your lane-level re-dispatch) is exactly the shape I was hoping for.

Ready for another look.

joryirving

Approve. Traced the three changes against the parsers rather than the descriptions; all correct.

Reviewer fan-out. review-<N>-esc- per ref, DependsOn: verify-<N>-esc, openPullRequest computed identically to the base — resolves the unreviewed-branch blocker. Code+verify-only when no reviewers, unchanged.

Supersession. Verified the matcher does exactly what's needed and nothing more: issueStepIteration matches every base shape (code-N/verify-N via the step==base arm, review-N- via the digit-checked review parser) so escalated[n] drops the base coder's NO-GO and its cascade-failed verify/review — but rejects all three -esc shapes (verify-N-esc misses the -r prefix; review-N-esc- fails reviewIterationOf's digit loop on the e of esc), so the escalation attempt's own verify/review stay in the rollup. The same-pass labeled placeholders keep the Workload in-flight while esc runs, so no premature Completed. Confirmed the placeholder labelStep (TrimPrefix(name, w.Name+"-")) equals the real task's step.Name under absoluteTaskName.

Cross-tier guard. Switch order puts review-N-esc- before review-N-, so esc reviews are skipped by the base reviewer-escalation scan while review-N- still counts. Correct.
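Why the case order matters can be shown with a small prefix classifier. This is an illustrative sketch only: `classifyReview` and its return labels are hypothetical, and the real scan works on task objects, not bare strings.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyReview labels a step name relative to one issue. The more specific
// -esc prefix must be tested first, because every review-N-esc- name also
// starts with review-N-.
func classifyReview(step string, issue int) string {
	base := fmt.Sprintf("review-%d-", issue)
	switch {
	case strings.HasPrefix(step, base+"esc-"):
		return "esc-review" // skipped by the base reviewer-escalation scan
	case strings.HasPrefix(step, base):
		return "base-review" // still counted by the base scan
	default:
		return "other"
	}
}

func main() {
	for _, s := range []string{"review-3-0", "review-3-esc-0", "verify-3-esc", "review-31-0"} {
		fmt.Printf("%-16s %s\n", s, classifyReview(s, 3))
	}
}
```

Swapping the two cases would classify every escalated review as a base review, which is the fan-out against the base branch that the guard prevents.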

One non-blocking edge (inherited, not introduced here). coderEscalationSteps gates the whole issue on existingEsc[code-<N>-esc], so if a reconcile creates code-<N>-esc but a transient API error aborts before review-<N>-esc- (renderAndCreate creates in order, returns on the first non-AlreadyExists error), the next pass sees code-<N>-esc exists → continue → the missing review-esc is never re-proposed, and the branch lands unreviewed again — the exact class this commit closes. Low probability, and it's the same issue-level dedup escalationSteps (#546) uses. Since renderAndCreate already skips AlreadyExists, keying the skip per-step (or emitting all esc steps every pass and letting the create dedup) would make it self-healing. Fine as a follow-up.
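The self-healing variant suggested here can be sketched against an in-memory stand-in for the API. Everything below is hypothetical (`fakeAPI`, `createAll`, the sentinel errors, the review step suffix); it only demonstrates the idea of proposing every esc step on every pass and letting already-exists act as the dedup.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errAlreadyExists = errors.New("already exists")
	errTransient     = errors.New("transient API error")
)

// fakeAPI is an in-memory stand-in for task creation.
type fakeAPI struct {
	tasks  map[string]bool
	failOn string // step name that fails once with a transient error
}

func (a *fakeAPI) create(name string) error {
	if a.tasks[name] {
		return errAlreadyExists
	}
	if name == a.failOn {
		a.failOn = "" // fail only once
		return errTransient
	}
	a.tasks[name] = true
	return nil
}

// createAll proposes every step on every pass, skipping ones that already
// exist and stopping on the first other error, so a later pass fills the gap.
func createAll(api *fakeAPI, steps []string) error {
	for _, s := range steps {
		if err := api.create(s); err != nil && !errors.Is(err, errAlreadyExists) {
			return err
		}
	}
	return nil
}

func main() {
	api := &fakeAPI{tasks: map[string]bool{}, failOn: "review-9-esc-0"}
	steps := []string{"code-9-esc", "verify-9-esc", "review-9-esc-0"}
	fmt.Println("pass 1:", createAll(api, steps), "review created:", api.tasks["review-9-esc-0"])
	fmt.Println("pass 2:", createAll(api, steps), "review created:", api.tasks["review-9-esc-0"])
}
```

With issue-level dedup, the second pass would see `code-9-esc` and skip the whole issue. Keying the skip per step lets the missing review step be created on the retry.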

Filing the ALREADY-RESOLVED outcome issue upstream with the CR evidence as agreed.

…27 ➔ 0.8.28) (#1405)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/defilantech/charts/llmkube](https://github.com/defilantech/LLMKube) | patch | `0.8.27` → `0.8.28` |

---

### Release Notes

<details>
<summary>defilantech/LLMKube (ghcr.io/defilantech/charts/llmkube)</summary>

### [`v0.8.28`](https://github.com/defilantech/LLMKube/blob/HEAD/CHANGELOG.md#0828-2026-07-04)

[Compare Source](defilantech/LLMKube@v0.8.27...v0.8.28)

##### Features

- **foreman:** bounded fix iteration on reviewer NO-GO instead of terminal failure ([#959](defilantech/LLMKube#959)) ([d820fff](defilantech/LLMKube@d820fff))
- **foreman:** coder-tier escalation, re-dispatch a failed coder to a larger model ([#964](defilantech/LLMKube#964)) ([ce8f655](defilantech/LLMKube@ce8f655))
- **foreman:** executor-owned revise-from-branch restore for revision tasks ([#967](defilantech/LLMKube#967)) ([b76051c](defilantech/LLMKube@b76051c))
- **foreman:** open the pull request on review GO ([#956](defilantech/LLMKube#956)) ([fd852e1](defilantech/LLMKube@fd852e1))
- **inference:** add spec.modelCache.claimName for user-owned cache PVCs ([#960](defilantech/LLMKube#960)) ([aab5a58](defilantech/LLMKube@aab5a58))

##### Bug Fixes

- **foreman:** accept workspace-internal absolute paths in resolveInside ([#957](defilantech/LLMKube#957)) ([34b126c](defilantech/LLMKube@34b126c))
- **foreman:** defer generic self-gate when runtime is missing from coder image ([#958](defilantech/LLMKube#958)) ([df185ec](defilantech/LLMKube@df185ec))
- **foreman:** reject no-op str\_replace where old\_string equals new\_string ([#969](defilantech/LLMKube#969)) ([c71f38b](defilantech/LLMKube@c71f38b))
- **foreman:** scope-overlap check catches Go files in new directories ([#962](defilantech/LLMKube#962)) ([486a944](defilantech/LLMKube@486a944))
- **inference:** warn when modelCache.claimName is silently ignored ([#966](defilantech/LLMKube#966)) ([d49cd22](defilantech/LLMKube@d49cd22))

##### Documentation

- register the Karpenter GPU autoscaling guide in nav.yaml ([#954](defilantech/LLMKube#954)) ([88b9c7d](defilantech/LLMKube@88b9c7d))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/1405

Defilan mentioned this pull request Jul 3, 2026

[BUG] Foreman str_replace reports success on a no-op edit (old_string == new_string), driving a RepeatedToolCall loop #968

Closed

Defilan requested a review from joryirving July 3, 2026 19:31

joryirving suggested changes Jul 3, 2026

Defilan added 9 commits July 3, 2026 12:58

feat(foreman): render payload PromptPrefix before the issue body

a883be2

Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>

docs(foreman): document escalationCoderAgentRef + dual-box serving

68a78b7

Refs defilantech#963 Signed-off-by: Christopher Maher <chris@mahercode.io>

Defilan force-pushed the feat/coder-escalation-tier branch from b8bc040 to b5a9c82 Compare July 3, 2026 20:24

Defilan requested a review from joryirving July 3, 2026 20:24

joryirving approved these changes Jul 3, 2026

joryirving mentioned this pull request Jul 3, 2026

[FEATURE] Foreman: distinct ALREADY-RESOLVED coder outcome (don't escalate the honest 'already done' bail) #970

Open

Defilan merged commit ce8f655 into defilantech:main Jul 3, 2026
24 checks passed

github-actions Bot mentioned this pull request Jul 3, 2026

chore: release 0.8.28 #955

Merged

Defilan merged 9 commits into defilantech:main from Defilan:feat/coder-escalation-tier
