[FORGE-89] feat(skills,orchestrate): rename /codex → /second-opinion; dispatch via IHarness adapter (P2-T21)#205
Merged
Merged
Conversation
… IHarness.runReview (P2-T21) New `forge orchestrate second-opinion` verb is the sole boundary that knows about settings.agents.review_host_cli. Reads diff + prompt from files, calls createHarness(host).runReview, emits a ReviewVerdict JSON envelope. The skill body (renamed in a later step) becomes host-agnostic. Decision: dispatch shape → CLI verb wrapping createHarness per user answer (skill↔verb contract — skills never own state-machine logic). Decision: CLI/env rename → defer per user answer (only printed strings change). - src/schemas/cli-args.ts: SecondOpinionArgsSchema (taskId, diffPath, promptPath, cwd?, timeoutMs?, forgeDir, json). - src/cli/orchestrate/second-opinion.ts: runOrchestrateSecondOpinion + secondOpinionHandler. Tests inject SecondOpinionHarnessFactory to avoid spawning real codex/gemini. - src/cli/orchestrate/index.ts: register handler in VERBS map + HELP_ORDER. - test/unit/cli/orchestrate/second-opinion.test.ts: 11 tests covering arg validation, missing settings/files, REVIEW_DISABLED, codex + gemini happy paths, HarnessError pass-through (BINARY_NOT_FOUND, TIMEOUT retriable, REVIEW_FAILED for non-HarnessError). All 11 tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nion verb (AC #8) Integration test exercising the full pipeline: settings load → createHarness → runReview (real CodexHarness/GeminiHarness) → parseHarnessVerdict (real fenced-JSON path) → envelope emission. Only spawnSubprocess is mocked, returning a fenced ReviewVerdict in stdout. Asserts that the two host variants produce envelopes that differ ONLY in host stamping; verdict outcome, version, and findings shape are equivalent. This is the AC #8 gate (P2-T21 plan §6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lers (P2-T21) New skill skills/second-opinion/SKILL.md is the canonical, host-agnostic interactive surface; dispatch goes through `forge orchestrate second-opinion` (landed in earlier commit). Old skills/codex/SKILL.md becomes a description- only deprecation alias, removed in v0.5.0. Decision: /review trigger added → invokes /second-opinion review-impl when CRITICAL.md paths are touched AND review_host_cli is set (gated to respect adopters who opted out via review_host_cli: null), per user answer. Decision: deprecation alias mechanism → description-only, no chalk-warn, no per-session state, per user answer (AC #10 explicitly dropped; will be flagged in PR body). - skills/second-opinion/SKILL.md: canonical body (6-piece prompt template, CLI verb invocation, envelope shape, /ship + /review integration notes). - skills/codex/SKILL.md: ~15-line DEPRECATED redirect to /second-opinion. - skills/ship/SKILL.md: §Gates line 7 + auto-codex hint text → /second-opinion. - skills/plan-task/SKILL.md: auto-codex hint text → /second-opinion. - skills/review/SKILL.md: new step 5 — CRITICAL.md hit + review_host_cli!==null → invoke /second-opinion review-impl. Findings folded into severity bucketing. - src/cli/codex-suggest.ts: suggestion template `/codex` → `/second-opinion`. CLI verb name `forge codex-suggest`, settings field `auto_codex_enabled`, env var FORGE_AUTO_CODEX intentionally kept (rename deferred — user decision; tracked as v0.5 follow-up in plan §7). - test/unit/cli/codex-suggest.test.ts: 7 assertion strings updated. - test/unit/skills/second-opinion.contract.test.ts: 12 contract tests guarding file presence, frontmatter, deprecation alias shape, and cross- skill references. All 27 tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…P2-T21)
Updates every doc and spec reference that names the user-facing skill, but
leaves the underlying `codex.auto_codex_enabled` settings field, `forge
codex-suggest` CLI verb, and `FORGE_AUTO_CODEX` env var unchanged per the
Step 5 decision (those rename in v0.5 to avoid a settings-schema migration).
- CLAUDE.md: critical-paths comment.
- CRITICAL.md: header comments (auto-review trigger doc string).
- README.md: TASK loop diagram /codex → /second-opinion.
- spec/SPEC.md:
- Auto-codex hooks table: three rows /codex review-* → /second-opinion
review-*; example suggestion-line string updated; added note that the
CLI verb / settings field / env var keep their CODEX-suffixed names in
v0.4 per FORGE-89 plan §7.
- §824: ADR-review optional consult /codex → /second-opinion.
- §1041-1047: §Multi-host review fully rewritten — generalized prose,
names /second-opinion as canonical interactive surface, calls out
Claude exclusion + cursor drop, points at FORGE-89 / P2-T21.
- §1062: FORGE_AUTO_CODEX env doc updated to mention the v0.5 rename.
- spec/PRD.md: 4 references updated (auto-suggest comment, ADR draft
consult, ADR review-decision step, ephemeral-ADR overview).
- spec/CONTEXT.md: drift-workflow paragraph.
spec/BRIEF.md mention is intentionally retained (historical baseline doc
describing the v0.3.x skill surface, where /codex was the actual name).
spec/ORCHESTRATOR.md mentions are intentionally retained — they refer to
Codex-the-vendor-CLI (Codex CLI, Codex review #N, Codex subagent, Codex
sandbox), not the abstract reviewer role.
Build smoke: dist/bin/forge.cjs orchestrate help lists `second-opinion`
verb. All 1302 tests pass; typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2-T21)
Code-reviewer and security-auditor passes surfaced 7 improvements; all fixed
in this commit before /ship.
CODE REVIEW:
- second-opinion.test.ts fakeHarness: drop bogus `summary` field, add
required `version: 1` so the verdict matches ReviewVerdictSchema without
needing the `as ReviewVerdict` escape cast.
- second-opinion.test.ts: add SPAWN_FAILED + NON_ZERO_EXIT retriable
assertions to lock down the verb's HarnessError → envelope mapping.
- Strengthen the existing TIMEOUT test to also assert retriable: true.
- orchestrate-second-opinion.test.ts (integration): kill the dead ternary
(`'claude' : 'claude'`) — both branches were identical; just pin to claude.
- orchestrate-second-opinion.test.ts (integration): replace the misleading
"Reset stdout for the second variant" comment with an honest explanation
of how the second captureStdout(t) call rewires the monkey-patch and
why both t.after restores are idempotent.
- skills/review/SKILL.md: clarify that /second-opinion is a Claude Code
slash command, not a Bash subprocess (prevents future skill authors from
writing Bash(/second-opinion ...) which would fail silently).
SECURITY AUDIT:
- skills/second-opinion/SKILL.md: switch the documented temp-file pattern
from predictable /tmp/forge-${TASK}-*.txt (mode 0644, world-readable) to
`mktemp -t ...` (mode 0600, unguessable name). Add explicit cleanup step
(rm -f) so committed-secret or local-path content doesn't sit in /tmp.
- second-opinion.ts: add MAX_DIFF_BYTES (1MB) + MAX_PROMPT_BYTES (200KB)
caps via statSync before readFileSync, returning DIFF_TOO_LARGE /
PROMPT_TOO_LARGE envelopes. Stops a 1GB diff from being pulled into
memory just to be rejected downstream. Precedent: MAX_STDOUT_BYTES in
src/harnesses/subprocess.ts. Two new unit tests cover the caps.
All 1306 tests pass; typecheck clean; build smoke shows the verb registered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
firatcand
added a commit
that referenced
this pull request
May 20, 2026
Brings in FORGE-22 (gc reconciler + retry policy, PR #203) and FORGE-89 (/codex → /second-opinion rename, PR #205). Resolved conflicts in spec/CONTEXT.md + spec/PRD.md by combining our structural admonitions (deferred-to-v0.5 boxes, Decision 10/11 strikethrough+redirect, settings.yaml decisions-block annotation) with their /codex → /second-opinion renames in the preserved v0.5 prose. No semantic loss either side; both hygiene threads compose cleanly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Renames the
/codexskill to/second-opinionand routes dispatch through a newforge orchestrate second-opinionCLI verb that wrapsIHarness.runReview, sosettings.agents.review_host_cli(codex vs gemini) is honored at runtime. The user-facing surface becomes host-agnostic;/codexlives one release as a description-only deprecation alias removed in v0.5.0.forge orchestrate second-opinion— sole boundary that knows aboutreview_host_cli. Reads diff + prompt from files, callscreateHarness(host).runReview(diff, prompt, opts), emits aReviewVerdictenvelope.skills/second-opinion/SKILL.md— six-piece prompt template, calls the CLI verb, documents the envelope shape and error codes.skills/codex/SKILL.mdrewritten as a ~25-line DEPRECATED alias (description-only, no chalk-warn, no per-session marker — see "AC deviation" below)./reviewgains a step 5 — when CRITICAL.md paths are touched ANDreview_host_cli !== null, the in-process review suggests/second-opinion review-implfor out-of-process, different-model-lineage verification./codex→/second-opinionacross CLAUDE.md, README.md, CRITICAL.md, spec/{SPEC,PRD,CONTEXT}.md; new SPEC §1041 §Multi-host review rewrite naming/second-opinionas canonical surface;forge codex-suggestCLI verb /auto_codex_enabledsettings field /FORGE_AUTO_CODEXenv var intentionally kept (rename deferred to v0.5 — see "Out of scope" below).Why
With FORGE-88 (IHarness adapter) landed in #187, the
/codexskill no longer needs to hardcodecodex exec. The host-agnostic rename + adapter dispatch is the closing piece of the P2-T20–T21 sequence — it makesreview_host_cli=geminiactually do something at the interactive surface, not just in orchestrator REVIEW phase. Mirrors the established P2-T10 (/push-to-linear→/push-to-tracker) rename pattern.Decisions resolved in /plan-task (4 forks)
createHarness(host).runReviewforge codex-suggest/auto_codex_enabled/FORGE_AUTO_CODEX?.forge/settings.yamlrewrite. Filed as v0.5 follow-up./review→/second-opiniontrigger on CRITICAL.md hits?review_host_cli !== nullAC coverage
skills/second-opinion/SKILL.mdexists; names Codex + GeminiIHarness.runReviewvia settingsforge orchestrate second-opinion(skill→verb contract).skills/codex/SKILL.mdkept as deprecation alias/shipCRITICAL trigger →/second-opinion/reviewCRITICAL trigger →/second-opinionreview_host_cli !== nullso adopters who opted out aren't forced.spec/ORCHESTRATOR.mdgeneralizedtest/integration/cli/orchestrate-second-opinion.test.ts— full pipeline (settings → createHarness → real adapter → parseHarnessVerdict → envelope) with mockedspawnSubprocess.review_host_cliSETTINGS_NOT_FOUND,REVIEW_DISABLED,INVALID_SETTINGS./codexinvocation/plan-task: description-only alias, warn-every-time. Track as a follow-up only if a user complains./review findings addressed pre-PR
Code-reviewer + security-auditor caught 7 improvements; all fixed in commit
3834c72before this PR:MAX_DIFF_BYTES = 1MB+MAX_PROMPT_BYTES = 200KBcaps viastatSyncbeforereadFileSync(avoids pulling a 1GB file into memory to be rejected). Two new unit tests.mktemp -t(mode 0600, unguessable names) + explicitrm -fcleanup — eliminates/tmp/forge-${TASK}-*.txtworld-readable race.fakeHarnesstest fixture now matchesReviewVerdictSchema(drops bogussummary, adds requiredversion: 1).SPAWN_FAILED+NON_ZERO_EXIT+TIMEOUTretriable-bit assertions locked down.'claude' : 'claude') replaced + misleading "Reset stdout" comment rewritten./reviewSKILL.md line clarifies/second-opinionis a slash command, not a Bash binary.Security audit also confirmed: no argv injection (positional argv, no shell interpolation), no settings-tampering escalation (gemini-experimental gate enforced at harness ctor), no env-var-driven settings discovery redirect (gitleaks: 0 leaks across 5 commits, 56KB scanned).
Out of scope — v0.5 follow-up to file
Rename
forge codex-suggestCLI verb →forge second-opinion-suggest(or similar). Renamecodex.auto_codex_enabledsettings field →auto_second_opinion_enabled. RenameFORGE_AUTO_CODEXenv var →FORGE_AUTO_SECOND_OPINION. Settings-schema migration + adopter.forge/settings.yamlrewrite. Decision #2 in /plan-task; should be a new FORGE ticket before v0.5 cuts.How to test
npm test— 1306 pass / 0 fail / 19 skipped (was 1302 on main).npm run typecheck— clean.npm run build—node dist/bin/forge.cjs orchestratelistssecond-opinionverb in mutating band.node --test --import tsx 'test/integration/cli/orchestrate-second-opinion.test.ts'— AC [P1-T04] Build core/settings.ts loader + hot-reload #8 gate (codex/gemini equivalence).gitleaks detect --log-opts=main..HEAD— 0 leaks.agents.review_host_cli: codex(noFORGE_GEMINI_EXPERIMENTAL),forge orchestrate second-opinion --task FORGE-89 --diff <path> --prompt <path> --jsonreturns aReviewVerdictenvelope. Withreview_host_cli: nullreturnsREVIEW_DISABLED.Closes #121 (FORGE-89).
🤖 Generated with Claude Code