feat(goal): cheap Haiku inner-review tier + selectable model + optional Copilot#78
Merged
Merged
Conversation
…al Copilot
Step 5b now runs a cheap inner-review pass over the working-tree diff
(default Haiku tier) before /ship's heavy gate2 cascade, keeping
implementation on the normal model and only downshifting the reviewer.
gate2 remains the authoritative gate; the Haiku pass is a fast filter.
Also tightens the 5b TDD wording (Red-Green-Refactor as an invariant, not
a forced march) and adds cost/selectability controls for /goal, reusing
/ship's existing contract rather than inventing new keys:
- goal.inner_review.{enabled,model,max_rounds} config (default haiku, 2 rounds)
- review-model=<tier> per-run flag to re-tier the inner review
- no-copilot per-run flag, forwarded to /ship, to skip the expensive
Copilot loop (review.provider stays the persistent lever)
Docs: config.md goal.* table + cost-control note, README/README.ja flag
parity. No CHANGELOG/version bump (tag-time, owned by /tag).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the /gh-issue-driven:goal milestone driver to add a low-cost “inner review” pass before delegating to /ship, along with new configuration defaults and README flag documentation to control review cost and model tier selection.
Changes:
- Add an inner-review loop in
/goalstep 5b (default Haiku tier) with max-rounds and per-run model override. - Make PR review cost controllable by forwarding
no-copilotto/shipand documenting inheritedreview.provider. - Document new
goal.inner_review.*defaults in/configand keep README flag tables in sync (EN/JA).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
README.md |
Updates /goal command synopsis to document inner-review and new flags. |
README.ja.md |
Japanese parity update for /goal synopsis and flags. |
commands/goal.md |
Adds inner-review tier, tighter TDD guidance, new flags (no-copilot, review-model=), and forwarding behavior to /ship. |
commands/config.md |
Introduces goal.inner_review.* defaults and adds cost-control documentation for /goal runs. |
| @@ -1,8 +1,8 @@ | |||
| --- | |||
| description: Phase G of gh-issue-driven — drive a whole milestone to PR. For each open issue it runs start → implement → /code-review → ship (gate2 + PR + Copilot review loop + session-summary), checkpointing to resumable state between issues. Gates on red verdicts (HITL); green/yellow auto-continue. (Full green/yellow unattended — suppressing the delegated start/ship prompts — is wired in #74.) | |||
| description: Phase G of gh-issue-driven — drive a whole milestone to PR. For each open issue it runs start → implement (TDD + a cheap Haiku inner-review pass over the diff) → ship (gate2 + PR + Copilot review loop + session-summary), checkpointing to resumable state between issues. Gates on red verdicts (HITL); green/yellow auto-continue. (Full green/yellow unattended — suppressing the delegated start/ship prompts — is wired in #74.) | |||
|
|
||
| **Inner review — cheap tier first, then fix-and-recheck.** Before handing off to `/ship`'s heavy gate2 cascade (5c), run one fast pass over the **working-tree diff** (`git diff` — no PR exists yet at this stage, so this reviews the diff directly): | ||
| - When `INNER_REVIEW_ENABLED` (default `true`): **dispatch a reviewer subagent on the `INNER_REVIEW_MODEL` tier** (default `"haiku"`; the `review-model=<tier>` flag overrides for this run) to review the diff against the issue's acceptance criteria and the repo's conventions, returning concrete, actionable findings only (no praise, no restating the diff). Fix the valid findings, re-run the relevant checks, and repeat this cheap pass until it returns no new findings — capped at `INNER_REVIEW_MAX_ROUNDS` (default `2`). Leftover findings beyond the cap are not lost: gate2 (5c) is the authoritative gate. Log `goal: inner-review (<model>) — <n> finding(s) applied, <rounds> round(s)` so the recap shows it ran. | ||
| - When `INNER_REVIEW_ENABLED` is `false`, or the requested model tier is unavailable: fall back to **`/code-review`** at an effort level matched to the change (`low` for docs/small, `medium` for features, `high`/`max` for risky or wide changes); apply its findings. |
Comment on lines
+168
to
+170
| **Controlling review cost.** Because `/goal` fans the review machinery across *every* open issue in a milestone, the Copilot loop is the dominant cost. Two independent levers, neither adding a new `goal.*` key: | ||
| - **PR review (the expensive async Copilot loop)** — optional via the inherited `review.provider`: set it to `"none"` (no automated PR review) or `"code-review"` (single-shot, no polling loop) for a persistent, cheaper default; or pass the per-run `no-copilot` flag to `/gh-issue-driven:goal`, which forwards to `/ship` as `no-copilot` for that run only. With Copilot off, each issue's PR is opened and left for manual review (recorded `needs_human`). | ||
| - **Inner review (the cheap pre-PR pass)** — `goal.inner_review.model` (default `"haiku"`) keeps it on a low-cost tier; the `review-model=<tier>` flag re-tiers it per run, and `goal.inner_review.enabled=false` removes it entirely (falling back to `/code-review`). Implementation always runs on the normal session model — only the review subagent is downshifted. |
| arguments: | ||
| - name: target | ||
| description: "The milestone to finish, e.g. 'finish milestone 0.5.0', a bare milestone title ('v0.5.0'), or a milestone number. Optional trailing flags: 'dry-run' (plan the order and print what would run, touch nothing), 'force' (treat red verdicts as yellow — fully unattended, no HITL at all), 'resume' (continue the most recent unfinished run for this milestone)." | ||
| description: "The milestone to finish, e.g. 'finish milestone 0.5.0', a bare milestone title ('v0.5.0'), or a milestone number. Optional trailing flags: 'dry-run' (plan the order and print what would run, touch nothing), 'force' (treat red verdicts as yellow — fully unattended, no HITL at all), 'resume' (continue the most recent unfinished run for this milestone), 'no-copilot' (skip the expensive Copilot PR-review loop for this run — forwarded to /ship as the per-run reviewer override), 'review-model=<tier>' (model for the cheap inner-review pass — a model alias like haiku|sonnet|opus or a full id; overrides goal.inner_review.model for this run)." |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/goalalready implemented the "short invocation, policy in the file, DoD over micro-steps" design well, so this is additive, not a rewrite. Three changes, all scoped to the review pipeline and its cost:/ship's heavy gate2 cascade, a reviewer subagent reviews the working-tree diff (git diff— correct for the pre-PR stage, no PR exists yet) on thegoal.inner_review.modeltier (defaulthaiku). Fix findings → re-check, capped atmax_rounds(default 2). Implementation stays on the normal session model; only the reviewer downshifts. gate2 remains the authoritative gate — the Haiku pass is a fast filter, never a replacement./ship's existing contract — no new Copilot keys):review.provider, or the per-runno-copilotflag (forwarded to/ship).goal.inner_review.modelor the per-runreview-model=<tier>flag.Files
commands/goal.md— step 5b rewrite, flag parsing (no-copilot,review-model=), config resolution, 5c forwarding, plan block, deps table, frontmatter.commands/config.md—goal.inner_review.*defaults +goal.*table rows + cost-control note.README.md/README.ja.md—/goalflag-table parity.Config (new keys, backward-compatible defaults)
Defaults keep Copilot on and inner-review on — the cost levers exist but aren't forced. Flip
review.provider/inner_review.enabledfor a cheaper default posture.Test results
All existing suites green, no new drift:
enum-sync-check— 33/33 in syncjq-sync-check— 5/5 in synctest_state_schema— 60/60 passcopilot-detection,test_verdict_parser— passgoal.inner_reviewblockNotes / follow-ups
/tag(the CHANGELOG is a release index).no-copilotforwarding +review-modeloverride could be added if you want coverage there./code-reviewfallback → gate2 cascade); the inner tier is deliberately the cheap filter so this doesn't become review bloat.🤖 Generated with Claude Code