feat(pst:sweep): v2 tournament scouting -- 3 dry-run strategies, Opus picks pipeline #71

Merged
pstaylor-patrick merged 2 commits into main from worktree-agent-a1a10b6c76bcbb9aa on Jun 21, 2026

@pstaylor-patrick

Owner

Summary

  • Three parallel dry-run Sonnet scouts analyze the PR batch using divergent pipeline strategies (Conservative/Standard/Review-gated) before any real execution
  • Foreground Opus judge scores on Coverage, Precision, and Efficiency using structured result-block fields
  • Winning strategy is then executed for real across all PRs
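
The flow in the summary can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `run_tournament`, `scout`, `judge`, and `execute` are hypothetical names, and the real scouts and judge are model agents rather than Python callables.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Strategy names taken from the PR summary.
STRATEGIES = ["Conservative", "Standard", "Review-gated"]


def run_tournament(
    prs: List[int],
    scout: Callable[[str, List[int]], Dict],
    judge: Callable[[List[Dict]], str],
    execute: Callable[[str, List[int]], None],
) -> str:
    """Scout every strategy in dry-run mode, let the judge pick one, then execute it."""
    # Scouts run in parallel and all receive the same PR batch.
    with ThreadPoolExecutor(max_workers=len(STRATEGIES)) as pool:
        reports = list(pool.map(lambda s: scout(s, prs), STRATEGIES))

    # The judge runs in the foreground: nothing is executed until it returns.
    winner = judge(reports)
    if winner not in STRATEGIES:
        raise ValueError(f"judge returned unknown strategy: {winner!r}")

    # Only the winning strategy is executed for real.
    execute(winner, prs)
    return winner
```

Passing the scout, judge, and executor in as callables keeps the control flow testable with stubs, which is the only purpose of this sketch.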

Key design decisions

  • Scouts use raw gh api only, with no sub-skill calls (this removes the resolve-threads branch requirement and Strategy C's circular dependency on the code-review verdict)
  • Strategy C reads existing reviewDecision from Phase 1 JSON instead of running code-review during scouting
  • Phase 1 PR list injected into each scout prompt so tournament always operates on the same batch
  • Result block extended: PRs_TOTAL/IMPROVED, SIGNAL/NOISE findings counts give the judge real data for all three scoring axes
  • Non-Node guard added to Phase 2 bootstrap
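
As a rough illustration of how the extended result block could feed the judge, the sketch below parses a block and derives two of the three axes. The `KEY: value` layout, the exact key names (`PRs_TOTAL`, `PRs_IMPROVED`, `SIGNAL`, `NOISE`), and both formulas are assumptions made for this example; the PR does not spell them out. Efficiency is omitted because the PR does not say which field feeds it.

```python
from typing import Dict


def parse_result_block(text: str) -> Dict[str, int]:
    """Parse 'KEY: <integer>' lines; any other line is ignored."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            fields[key.strip()] = int(value)
    return fields


def score(fields: Dict[str, int]) -> Dict[str, float]:
    """Assumed scoring: coverage = improved/total, precision = signal/(signal+noise)."""
    total = fields.get("PRs_TOTAL", 0)
    improved = fields.get("PRs_IMPROVED", 0)
    signal = fields.get("SIGNAL", 0)
    noise = fields.get("NOISE", 0)
    findings = signal + noise
    return {
        # Guard against empty batches and scouts that reported no findings.
        "coverage": improved / total if total else 0.0,
        "precision": signal / findings if findings else 0.0,
    }
```

For example, a block reporting 4 PRs with 3 improved, 6 signal findings, and 2 noise findings yields a coverage of 0.75 and a precision of 0.75 under these assumed formulas.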

Adversarial review (2 rounds, 8 findings resolved)

  • Background+await Opus judge contradiction resolved (the judge now runs in the foreground)
  • Sub-skill mutation risk eliminated (raw gh api only in scouts)
  • Judge scoring data gap filled (extended result block)
  • Phase 1 PR list injection (no stale re-discovery)
  • Non-Node repo bootstrap guard
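
The PR list injection and Strategy C's use of the existing `reviewDecision` might look something like the sketch below. The prompt wording, the function names, and the rule that Strategy C gates on `APPROVED` are all assumptions for illustration; only the `reviewDecision` field and the idea of reusing the Phase 1 JSON come from the PR.

```python
import json
from typing import Dict, List


def build_scout_prompt(strategy: str, phase1_prs: List[Dict]) -> str:
    """Embed the Phase 1 PR list verbatim so every scout sees the same batch."""
    batch = json.dumps(phase1_prs, indent=2, sort_keys=True)
    return (
        f"Strategy: {strategy}\n"
        "Dry run only. Do not re-discover PRs; use exactly this batch:\n"
        f"{batch}\n"
    )


def approved_prs(phase1_prs: List[Dict]) -> List[int]:
    """Hypothetical Strategy C gate: read reviewDecision from Phase 1 data only."""
    return [
        pr["number"]
        for pr in phase1_prs
        if pr.get("reviewDecision") == "APPROVED"
    ]
```

Because the gate reads a field that is already present in the Phase 1 data, a scout never has to run a review itself, which is what removes the circular dependency described above.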

🤖 Generated with Claude Code
https://claude.ai/code/session_019BCr5Qxi8jXnG2Rr7zdkw1

pstaylor-patrick and others added 2 commits June 21, 2026 15:58
…ies, Opus picks winner

Before executing the sweep, 3 parallel Sonnet agents scout the PR batch
in dry-run mode using divergent pipeline ordering strategies (Conservative,
Standard, Review-gated). Background Opus judge scores on coverage, precision,
and efficiency. Winning strategy is then applied for real.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Opus judge is foreground (background+await was contradictory)
- Scouts use raw gh api only; no sub-skill calls (fixes circular
  Strategy C, resolve-threads branch requirement, mutation risk)
- Strategy C uses existing reviewDecision from Phase 1 JSON
- Result block extended with PRs_TOTAL/IMPROVED, SIGNAL/NOISE counts
  so judge has real data for all three scoring axes
- Phase 1 PR list injected into each scout prompt (no re-discovery)
- Phase 2 Strategy C uses gh pr view reviewDecision after code-review
- Non-Node repo guard added to Phase 2 bootstrap
- Scout agents restricted to read-only Bash + gh api

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pstaylor-patrick pstaylor-patrick merged commit 403c4d1 into main Jun 21, 2026
1 check passed