Skip to content

feat(goal): cheap Haiku inner-review tier + selectable model + optional Copilot#78

Merged
JFK merged 1 commit into
mainfrom
feat/goal-haiku-inner-review-optional-copilot
Jun 4, 2026
Merged

feat(goal): cheap Haiku inner-review tier + selectable model + optional Copilot#78
JFK merged 1 commit into
mainfrom
feat/goal-haiku-inner-review-optional-copilot

Conversation

@JFK

@JFK JFK commented Jun 4, 2026

Copy link
Copy Markdown
Owner

Summary

/goal already implemented the "short invocation, policy in the file, DoD over micro-steps" design well, so this is additive, not a rewrite. Three changes, all scoped to the review pipeline and its cost:

  1. Cheap inner-review tier (step 5b). Before /ship's heavy gate2 cascade, a reviewer subagent reviews the working-tree diff (git diff — correct for the pre-PR stage, no PR exists yet) on the goal.inner_review.model tier (default haiku). Fix findings → re-check, capped at max_rounds (default 2). Implementation stays on the normal session model; only the reviewer downshifts. gate2 remains the authoritative gate — the Haiku pass is a fast filter, never a replacement.
  2. Tighter TDD wording (step 5b). Red→Green→Refactor framed as an invariant, not a forced march: drive test-first where there's a test surface, keep cycles tight, and explicitly skip the ritual (no token tests, no mechanical "refactor") when there's nothing to assert.
  3. Cost/selectability controls (reusing /ship's existing contract — no new Copilot keys):
    • PR review (the expensive Copilot loop) is optional via the inherited review.provider, or the per-run no-copilot flag (forwarded to /ship).
    • Inner-review model is selectable via goal.inner_review.model or the per-run review-model=<tier> flag.

Files

  • commands/goal.md — step 5b rewrite, flag parsing (no-copilot, review-model=), config resolution, 5c forwarding, plan block, deps table, frontmatter.
  • commands/config.mdgoal.inner_review.* defaults + goal.* table rows + cost-control note.
  • README.md / README.ja.md/goal flag-table parity.

Config (new keys, backward-compatible defaults)

"goal": {
  "autonomy": "red-only",
  "max_issues_per_run": 20,
  "inner_review": { "enabled": true, "model": "haiku", "max_rounds": 2 }
}

Defaults keep Copilot on and inner-review on — the cost levers exist but aren't forced. Flip review.provider / inner_review.enabled for a cheaper default posture.

Test results

All existing suites green, no new drift:

  • enum-sync-check — 33/33 in sync
  • jq-sync-check — 5/5 in sync
  • test_state_schema — 60/60 pass
  • copilot-detection, test_verdict_parser — pass
  • Built-in-defaults JSON re-validated with the new goal.inner_review block

Notes / follow-ups

  • No CHANGELOG entry or version bump — those are tag-time, owned by /tag (the CHANGELOG is a release index).
  • No automated test added for the new flags/config (the command files are prose specs; existing tests cover schema/enum sync, which still pass). A fixture exercising no-copilot forwarding + review-model override could be added if you want coverage there.
  • Three review layers now exist (Haiku inner pass → optional /code-review fallback → gate2 cascade); the inner tier is deliberately the cheap filter so this doesn't become review bloat.

🤖 Generated with Claude Code

…al Copilot

Step 5b now runs a cheap inner-review pass over the working-tree diff
(default Haiku tier) before /ship's heavy gate2 cascade, keeping
implementation on the normal model and only downshifting the reviewer.
gate2 remains the authoritative gate; the Haiku pass is a fast filter.

Also tightens the 5b TDD wording (Red-Green-Refactor as an invariant, not
a forced march) and adds cost/selectability controls for /goal, reusing
/ship's existing contract rather than inventing new keys:

- goal.inner_review.{enabled,model,max_rounds} config (default haiku, 2 rounds)
- review-model=<tier> per-run flag to re-tier the inner review
- no-copilot per-run flag, forwarded to /ship, to skip the expensive
  Copilot loop (review.provider stays the persistent lever)

Docs: config.md goal.* table + cost-control note, README/README.ja flag
parity. No CHANGELOG/version bump (tag-time, owned by /tag).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 4, 2026 05:17
@JFK JFK merged commit 70cec0b into main Jun 4, 2026
2 checks passed
@JFK JFK deleted the feat/goal-haiku-inner-review-optional-copilot branch June 4, 2026 05:20

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the /gh-issue-driven:goal milestone driver to add a low-cost “inner review” pass before delegating to /ship, along with new configuration defaults and README flag documentation to control review cost and model tier selection.

Changes:

  • Add an inner-review loop in /goal step 5b (default Haiku tier) with max-rounds and per-run model override.
  • Make PR review cost controllable by forwarding no-copilot to /ship and documenting inherited review.provider.
  • Document new goal.inner_review.* defaults in /config and keep README flag tables in sync (EN/JA).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
README.md Updates /goal command synopsis to document inner-review and new flags.
README.ja.md Japanese parity update for /goal synopsis and flags.
commands/goal.md Adds inner-review tier, tighter TDD guidance, new flags (no-copilot, review-model=), and forwarding behavior to /ship.
commands/config.md Introduces goal.inner_review.* defaults and adds cost-control documentation for /goal runs.

Comment thread commands/goal.md
@@ -1,8 +1,8 @@
---
description: Phase G of gh-issue-driven — drive a whole milestone to PR. For each open issue it runs start → implement → /code-review → ship (gate2 + PR + Copilot review loop + session-summary), checkpointing to resumable state between issues. Gates on red verdicts (HITL); green/yellow auto-continue. (Full green/yellow unattended — suppressing the delegated start/ship prompts — is wired in #74.)
description: Phase G of gh-issue-driven — drive a whole milestone to PR. For each open issue it runs start → implement (TDD + a cheap Haiku inner-review pass over the diff) → ship (gate2 + PR + Copilot review loop + session-summary), checkpointing to resumable state between issues. Gates on red verdicts (HITL); green/yellow auto-continue. (Full green/yellow unattended — suppressing the delegated start/ship prompts — is wired in #74.)
Comment thread commands/goal.md

**Inner review — cheap tier first, then fix-and-recheck.** Before handing off to `/ship`'s heavy gate2 cascade (5c), run one fast pass over the **working-tree diff** (`git diff` — no PR exists yet at this stage, so this reviews the diff directly):
- When `INNER_REVIEW_ENABLED` (default `true`): **dispatch a reviewer subagent on the `INNER_REVIEW_MODEL` tier** (default `"haiku"`; the `review-model=<tier>` flag overrides for this run) to review the diff against the issue's acceptance criteria and the repo's conventions, returning concrete, actionable findings only (no praise, no restating the diff). Fix the valid findings, re-run the relevant checks, and repeat this cheap pass until it returns no new findings — capped at `INNER_REVIEW_MAX_ROUNDS` (default `2`). Leftover findings beyond the cap are not lost: gate2 (5c) is the authoritative gate. Log `goal: inner-review (<model>) — <n> finding(s) applied, <rounds> round(s)` so the recap shows it ran.
- When `INNER_REVIEW_ENABLED` is `false`, or the requested model tier is unavailable: fall back to **`/code-review`** at an effort level matched to the change (`low` for docs/small, `medium` for features, `high`/`max` for risky or wide changes); apply its findings.
Comment thread commands/config.md
Comment on lines +168 to +170
**Controlling review cost.** Because `/goal` fans the review machinery across *every* open issue in a milestone, the Copilot loop is the dominant cost. Two independent levers, neither adding a new `goal.*` key:
- **PR review (the expensive async Copilot loop)** — optional via the inherited `review.provider`: set it to `"none"` (no automated PR review) or `"code-review"` (single-shot, no polling loop) for a persistent, cheaper default; or pass the per-run `no-copilot` flag to `/gh-issue-driven:goal`, which forwards to `/ship` as `no-copilot` for that run only. With Copilot off, each issue's PR is opened and left for manual review (recorded `needs_human`).
- **Inner review (the cheap pre-PR pass)** — `goal.inner_review.model` (default `"haiku"`) keeps it on a low-cost tier; the `review-model=<tier>` flag re-tiers it per run, and `goal.inner_review.enabled=false` removes it entirely (falling back to `/code-review`). Implementation always runs on the normal session model — only the review subagent is downshifted.
Comment thread commands/goal.md
arguments:
- name: target
description: "The milestone to finish, e.g. 'finish milestone 0.5.0', a bare milestone title ('v0.5.0'), or a milestone number. Optional trailing flags: 'dry-run' (plan the order and print what would run, touch nothing), 'force' (treat red verdicts as yellow — fully unattended, no HITL at all), 'resume' (continue the most recent unfinished run for this milestone)."
description: "The milestone to finish, e.g. 'finish milestone 0.5.0', a bare milestone title ('v0.5.0'), or a milestone number. Optional trailing flags: 'dry-run' (plan the order and print what would run, touch nothing), 'force' (treat red verdicts as yellow — fully unattended, no HITL at all), 'resume' (continue the most recent unfinished run for this milestone), 'no-copilot' (skip the expensive Copilot PR-review loop for this run — forwarded to /ship as the per-run reviewer override), 'review-model=<tier>' (model for the cheap inner-review pass — a model alias like haiku|sonnet|opus or a full id; overrides goal.inner_review.model for this run)."
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants