From 6e8cd06e2ee94edbfdea6b4e84d8c6112f03bbd8 Mon Sep 17 00:00:00 2001 From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com> Date: Sun, 21 Jun 2026 15:58:35 -0500 Subject: [PATCH 1/2] feat(pst:sweep): v2 tournament scouting -- 3 dry-run pipeline strategies, Opus picks winner Before executing the sweep, 3 parallel Sonnet agents scout the PR batch in dry-run mode using divergent pipeline ordering strategies (Conservative, Standard, Review-gated). Background Opus judge scores on coverage, precision, and efficiency. Winning strategy is then applied for real. Co-Authored-By: Claude Sonnet 4.6 --- skills/pst:sweep/SKILL.md | 536 ++++++++++++-------------------------- 1 file changed, 167 insertions(+), 369 deletions(-) diff --git a/skills/pst:sweep/SKILL.md b/skills/pst:sweep/SKILL.md index df79bec..3869775 100644 --- a/skills/pst:sweep/SKILL.md +++ b/skills/pst:sweep/SKILL.md @@ -1,13 +1,14 @@ --- name: pst:sweep -description: Parallel quality sweep across open PRs -- resolve threads, code review, and QA in isolated worktrees, with optional author/label filtering -argument-hint: "[--author ] [--label ] [--cap N] [--dry-run] [--stages resolve,review,qa]" +description: Parallel quality sweep across open PRs -- tournament scouting picks the best pipeline strategy before executing for real +argument-hint: "[--author ] [--label ] [--cap N] [--dry-run] [--stages resolve,review,qa] [--no-drafts]" allowed-tools: Bash, Read, Edit, Write, Grep, Glob, Agent, AskUserQuestion, Skill --- -# Sweep -- Parallel Quality Pipelines Across Open PRs +# Sweep -- Tournament-Scouted Quality Pipelines Across Open PRs -Orchestrator that discovers open PRs, then fans out parallel quality pipelines (resolve-threads, code-review, QA) across all of them in isolated worktrees. Each PR gets its own agent. Filter by author, label, or any combination. +Discovers open PRs, runs a best-of-3 dry-run tournament to select the optimal pipeline +strategy, then executes the winner for real across all PRs in isolated worktrees. --- @@ -15,393 +16,228 @@ Orchestrator that discovers open PRs, then fans out parallel quality pipelines ( #$ARGUMENTS -**Parse arguments:** - -- `--author ` -- only sweep PRs authored by this GitHub user (case-insensitive match) +- `--author ` -- only sweep PRs by this GitHub user (case-insensitive) - `--label ` -- only sweep PRs with this label -- `--cap N` -- override concurrency cap (max parallel pipelines, default 3) +- `--cap N` -- concurrency cap (default 3, max 10) - `--dry-run` -- discover and classify PRs but do not run pipelines -- `--stages ` -- comma-separated pipeline stages to run (default: `resolve,review,qa`). Valid stages: `resolve`, `review`, `qa` -- `--no-drafts` -- exclude draft PRs (default: drafts ARE included) -- No arguments -- sweep all open PRs with default pipeline - -Multiple filters combine as AND (e.g., `--author pat --label bug` = PRs by pat with label bug). +- `--stages ` -- comma-separated subset of `resolve`, `review`, `qa` (default: all three) +- `--no-drafts` -- exclude draft PRs (default: drafts included) -Examples: - -- `/pst:sweep` -- all open PRs, full pipeline -- `/pst:sweep --author pstaylor-patrick` -- only my PRs -- `/pst:sweep --label "needs review"` -- only PRs with that label -- `/pst:sweep --stages review,qa` -- skip resolve-threads -- `/pst:sweep --cap 2 --dry-run` -- preview what would be swept -- `/pst:sweep --no-drafts` -- skip draft PRs +Multiple filters combine as AND. No arguments sweeps all open PRs. --- ## Phase 0: Guards & Config ```bash -BRANCH=$(git branch --show-current 2>/dev/null) -DEFAULT_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@') -DEFAULT_BRANCH="${DEFAULT_BRANCH:-main}" OWNER_REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner) OWNER=$(echo "$OWNER_REPO" | cut -d/ -f1) REPO=$(echo "$OWNER_REPO" | cut -d/ -f2) +DEFAULT_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@') +DEFAULT_BRANCH="${DEFAULT_BRANCH:-main}" ``` -| Condition | Action | -| ---------------------- | ------------------------------------ | -| `gh` not available | Stop: "GitHub CLI (gh) is required." | -| `gh auth status` fails | Stop: "Run `gh auth login` first." | - -**Parse config:** - -``` -CONCURRENCY_CAP = args --cap N || 3 -STAGES = args --stages || "resolve,review,qa" -AUTHOR_FILTER = args --author || "" -LABEL_FILTER = args --label || "" -INCLUDE_DRAFTS = !(args --no-drafts) -DRY_RUN = args --dry-run -``` - -Validate: +Stop if `gh` is unavailable or `gh auth status` fails. -- `CONCURRENCY_CAP` must be a positive integer, max 10 -- `STAGES` must be a comma-separated subset of: `resolve`, `review`, `qa` +Parse: `CONCURRENCY_CAP` (default 3), `STAGES` (default `resolve,review,qa`), `AUTHOR_FILTER`, +`LABEL_FILTER`, `INCLUDE_DRAFTS`, `DRY_RUN`. Validate cap is 1-10 and stages are a valid subset. --- ## Phase 1: PR Discovery -**1.1 Fetch open PRs** +**Fetch:** ```bash -gh pr list --state open --json number,title,headRefName,baseRefName,url,isDraft,author,labels,reviewDecision --limit 100 +gh pr list --state open \ + --json number,title,headRefName,baseRefName,url,isDraft,author,labels,reviewDecision \ + --limit 100 ``` -Each PR has: `{number, title, headRefName, baseRefName, url, isDraft, author.login, labels[].name, reviewDecision}` - -**1.2 Apply filters** - -Starting from the full list, apply filters sequentially: +**Filter:** Apply `--no-drafts`, `--author`, `--label` as AND. If nothing remains: print +"Nothing to sweep -- no open PRs match the filters." and exit cleanly. -1. **Draft filter:** If `--no-drafts`, remove PRs where `isDraft=true`. Otherwise keep all. -2. **Author filter:** If `--author` provided, keep only PRs where `author.login` matches (case-insensitive). -3. **Label filter:** If `--label` provided, keep only PRs where any label name matches (case-insensitive). +**Classify:** For each PR, if `resolve` is in STAGES query unresolved thread count via GraphQL +`reviewThreads`. Store `unresolvedThreads` and `reviewDecision`. -**1.3 Classify PRs** - -For each remaining PR, classify: - -- **Has unresolved threads:** Query each PR for unresolved review threads (only if `resolve` is in STAGES): - - ```bash - gh api graphql -f query=' - { - repository(owner: "'$OWNER'", name: "'$REPO'") { - pullRequest(number: '$PR_NUMBER') { - reviewThreads(first: 1) { - nodes { isResolved } - totalCount - } - } - } - }' - ``` - - Count unresolved threads. Store as `unresolvedThreads` on the PR. - -- **Review status:** From `reviewDecision` -- `APPROVED`, `CHANGES_REQUESTED`, `REVIEW_REQUIRED`, or empty. - -**1.4 Present discovery** - -Use **AskUserQuestion**: +**Confirm (AskUserQuestion):** ``` SWEEP DISCOVERY --------------- + # PR Author Title Draft Threads Review + 1 #42 pstaylor Add login flow no 3 CHANGES_REQUESTED + 2 #45 pstaylor Fix auth redirect no 0 REVIEW_REQUIRED - # PR Author Title Draft Threads Review - 1 #42 pstaylor Add login flow no 3 CHANGES_REQUESTED - 2 #45 pstaylor Fix auth redirect no 0 REVIEW_REQUIRED - 3 #48 contributor Update docs yes 1 -- +Filters: {author/label/none} Stages: {resolve -> review -> qa} -Filters: {author: pstaylor | label: bug | none} -Pipeline: {resolve -> review -> qa} -Concurrency: {3} - -Options: 1. Sweep all (Recommended) -2. Select specific items (list numbers) -3. Remove items (list numbers to exclude) +2. Select specific items +3. Remove items 4. Abort ``` -If "Select specific" or "Remove": prompt for numbers, filter list, re-confirm. - -**If no items found after filtering:** Display "Nothing to sweep -- no open PRs match the filters." and exit cleanly. - -**If `--dry-run`:** Display the discovery table and exit. Do not proceed to Phase 2. +If `--dry-run`: display table and exit. If no items after selection: exit cleanly. --- -## Phase 2: Spawn Parallel Pipelines - -**2.1 Batch into waves** - -Split selected PRs into batches of `CONCURRENCY_CAP`. If 8 PRs and cap=3: wave 1 (items 1-3), wave 2 (items 4-6), wave 3 (items 7-8). - -**2.2 Spawn wave** - -For each PR in the current wave, launch a background agent with worktree isolation. - -**Important ordering:** Stages run sequentially within each pipeline in the order: `resolve` -> `review` -> `qa`. This order matters because: - -- `resolve` pushes code fixes and clears threads before review -- `review` posts findings before QA validates behavior -- `qa` runs last against the most up-to-date branch state - -**Agent prompt per PR:** - -```` -Agent: - description: "Sweep PR #N: {title}" - isolation: worktree - run_in_background: true - prompt: | - You are running a sweep quality pipeline for PR #$PR_NUMBER in $OWNER/$REPO. - - The PR branch is: $HEAD_BRANCH (base: $BASE_BRANCH) - PR URL: $PR_URL - - First, check out the PR branch: - ```bash - gh pr checkout $PR_NUMBER - ``` - - Then bootstrap the worktree so pre-commit/pre-push hooks and quality - gates can actually run (env symlinks, deps, husky hooks path): - ```bash - if [ -f pnpm-lock.yaml ]; then PKG="pnpm" - elif [ -f yarn.lock ]; then PKG="yarn" - else PKG="npm"; fi - if grep -q '"worktree:init"' package.json 2>/dev/null; then - $PKG run worktree:init - fi - ``` - If the repo doesn't define `worktree:init`, fall back to - `$PKG install --frozen-lockfile`. Without this step, downstream - stages will misreport (missing env files masquerade as real - failures) and pre-push hooks will fail. - - Run these stages SEQUENTIALLY. Run ALL stages to completion - -- do NOT short-circuit on failure. Record results for each stage. - - [STAGE: resolve-threads -- INCLUDE IF "resolve" IN STAGES] - Stage 1: Resolve Threads - Run: Skill("pst:resolve-threads", "$PR_NUMBER") - Parse the --- RESOLVE RESULT --- block from the output. - Record: total conversations, processed, fixed, threads resolved. - If the skill pushed fixes, note the new HEAD commit. - - [STAGE: code-review -- INCLUDE IF "review" IN STAGES] - Stage 2: Code Review - Run: Skill("pst:code-review", "$PR_NUMBER") - This runs in standard GitHub PR mode -- posts a review to the PR. - Record: verdict (APPROVED / CHANGES_REQUESTED / COMMENT), finding count, critical count. - - [STAGE: qa -- INCLUDE IF "qa" IN STAGES] - Stage 3: QA - Run: Skill("pst:qa", "$PR_NUMBER") - This runs fully autonomously. Record the QA result. - - After ALL stages complete, output this block EXACTLY: - - --- SWEEP PIPELINE RESULT --- - pr: #$PR_NUMBER - title: $PR_TITLE - author: $PR_AUTHOR - type: pre-merge - - resolve-threads: - status: [COMPLETED|SKIPPED|ERROR] - total-conversations: N - processed: N - fixed: N - threads-resolved: N - pushed: [yes|no] - - code-review: - status: [COMPLETED|SKIPPED|ERROR] - verdict: [APPROVED|CHANGES_REQUESTED|COMMENT|--] - findings: N - critical: N - - qa: - status: [COMPLETED|SKIPPED|ERROR] - overall: [PASSED|FAILED|PARTIAL|SKIPPED] - test-cases-total: N - test-cases-passed: N - test-cases-failed: N - - pipeline-overall: [PASSED|FAILED|ERROR] - --- END SWEEP PIPELINE RESULT --- -```` - -Only include stages that are in `STAGES`. For omitted stages, output `status: SKIPPED` in the result block. - -**2.3 Monitor wave progress** - -As each background agent completes (orchestrator is notified automatically), parse the `SWEEP PIPELINE RESULT` block from its result. Display incremental progress: +## Tournament Gate -``` -SWEEP PROGRESS --------------- - -Wave 1 of 2: - - # PR Author Resolve Review QA Overall - 1 #42 pstaylor 3 fixed APPROVED 3/3 PASS PASSED - 2 #45 pstaylor 0 threads CHANGES (1c) RUNNING -- - 3 #48 contrib SKIPPED COMMENT 2/4 FAIL FAILED -``` +**Skip condition (1-2 PRs):** Skip tournament, set `WINNING_STRATEGY=B`, jump to Phase 2. -After all agents in a wave complete, start the next wave. +Otherwise, ask (AskUserQuestion): -**2.4 Handle agent failures** +> Scout with N=3 strategies before executing? (Yes / Execute Standard / Skip sweep) -If an agent crashes or returns without a valid `SWEEP PIPELINE RESULT` block: - -- Record the item as `ERROR` with the agent's error output -- Continue with remaining items -- Include in triage report +- **Yes:** run Phase T. +- **Execute Standard:** set `WINNING_STRATEGY=B`, skip Phase T. +- **Skip sweep:** exit cleanly. --- -## Phase 3: Triage Report +## Phase T: Dry-Run Scout Tournament -**3.1 Generate report** +Spawn **3 foreground Sonnet agents** in a single response turn (NO `run_in_background`). +All three must complete before the judge. Each reads PR state via `gh pr view --json` and +`gh pr checks` only -- NO mutations, NO comment posts, NO pushes. -Create a markdown report at `/tmp/pst-sweep-$(date +%Y%m%d-%H%M%S).md`: +### Strategy A -- Conservative -```markdown -# Sweep Report -- [date] +Simulate: inspect all PRs for resolvable threads via read-only `pst:resolve-threads`. +Skip code-review and qa entirely. Report which threads are fixable, which PRs have merge +conflicts, and which would benefit most from thread cleanup. -## Summary +### Strategy B -- Standard -| Metric | Count | -| --------- | ----- | -| Total PRs | N | -| Passed | N | -| Failed | N | -| Errors | N | +Simulate per PR: resolve -> code-review -> qa sequentially. Skip drafts and conflict-blocked +PRs. Report projected findings for each stage across all PRs. -Filters: {author: X | label: Y | none} -Stages: {resolve, review, qa} +### Strategy C -- Review-gated -## Results +Simulate: run code-review inspection first per PR. Only plan qa for PRs code-review would +mark APPROVED. Skip qa entirely for CHANGES_REQUESTED PRs to save compute. Report which PRs +gate out and why. -### PR #42 -- Add login flow (pstaylor) -- PASSED +### Required result block (each agent) -**Resolve Threads:** 3 conversations processed, 3 fixed, pushed -**Code Review:** APPROVED -- 0 findings -**QA:** 3/3 test cases passed +``` +---sweep-result--- +STRATEGY: +PRs_PROCESSABLE: +PRs_BLOCKED: +ESTIMATED_MUTATIONS: +FINDINGS_SUMMARY: +---end-sweep-result--- +``` --- -### PR #48 -- Update docs (contributor) -- FAILED - -**Resolve Threads:** SKIPPED (--no-drafts excluded, or not in stages) -**Code Review:** COMMENT -- 2 findings (0 critical) -**QA:** 2/4 test cases passed -- 2 FAILED - -- TC-2: Broken link in sidebar -- FAIL -- TC-4: Image alt text missing -- FAIL +## Phase T.2: Opus Judge + +Parse all three `---sweep-result---` blocks. Spawn one **background Opus agent** +(`model: opus`) and **await its result before Phase T.3**. Prompt: + +> Score each strategy (1-5 per axis): +> +> - **Coverage**: how many PRs does this strategy actually improve vs skip? +> - **Precision**: ratio of signal findings to noise; review-gated strategies score higher +> when they block qa for CHANGES_REQUESTED PRs. +> - **Efficiency**: total mutations per PR improved (fewer is better). +> +> Return JSON only: +> +> ```json +> { +> "winner": "A|B|C", +> "scores": { +> "A": { "coverage": 0, "precision": 0, "efficiency": 0 }, +> "B": { "coverage": 0, "precision": 0, "efficiency": 0 }, +> "C": { "coverage": 0, "precision": 0, "efficiency": 0 } +> }, +> "reasoning": "one sentence" +> } +> ``` + +Set `WINNING_STRATEGY` from `winner`. Log scores and reasoning before proceeding. --- -[...per-PR sections...] -``` - -**3.2 Interactive triage (only if failures exist)** +## Phase 2: Execute Winning Strategy -If ALL items passed -> skip triage, display "All clear" summary, jump to Phase 4. +Announce the winning strategy and Opus reasoning. Run the real (non-dry-run) sweep. -For each failed/errored PR, present options via **AskUserQuestion**: +**2.1 Batch into waves:** Split selected PRs into groups of `CONCURRENCY_CAP`. Start +wave 2 only after wave 1 completes. -**For failed PRs:** +**2.2 Pipeline per strategy:** -``` -PR #48 -- Update docs (contributor) -- FAILED +- **Strategy A:** one background agent per PR running `pst:resolve-threads` only. +- **Strategy B:** one background agent per PR running resolve -> code-review -> qa (skip + drafts and conflict-blocked PRs). +- **Strategy C:** one background agent per PR running code-review first; include qa only + if code-review result is APPROVED. -Resolve: -- | Review: 2 findings | QA: 2/4 failed +Each agent uses `isolation: worktree`. Bootstrap before running stages: -Options: -1. Fix now -- spawn an agent to address findings and push -2. Post summary comment on PR -3. View full details -4. Skip -- acknowledge and move on +```bash +gh pr checkout $PR_NUMBER +if [ -f pnpm-lock.yaml ]; then PKG="pnpm" +elif [ -f yarn.lock ]; then PKG="yarn" +else PKG="npm"; fi +if grep -q '"worktree:init"' package.json 2>/dev/null +then $PKG run worktree:init +else $PKG install --frozen-lockfile; fi ``` -Option behaviors: - -- **Fix now** -- launch an Agent with `isolation: worktree`: - - ```` - Agent: - description: "Fix sweep findings for PR #N" - isolation: worktree - prompt: | - Address these findings from the sweep pipeline for PR #$PR_NUMBER: +Required result block per PR agent: - Code Review Findings: - [list findings] +``` +--- SWEEP PIPELINE RESULT --- +pr: #$PR_NUMBER +title: $PR_TITLE +resolve-threads: status: [COMPLETED|SKIPPED|ERROR] | fixed: N | pushed: [yes|no] +code-review: status: [COMPLETED|SKIPPED|ERROR] | verdict: [APPROVED|CHANGES_REQUESTED|COMMENT|--] | findings: N +qa: status: [COMPLETED|SKIPPED|ERROR] | overall: [PASSED|FAILED|PARTIAL|SKIPPED] +pipeline-overall: [PASSED|FAILED|ERROR] +--- END SWEEP PIPELINE RESULT --- +``` - QA Failures: - [list QA failures] +Run ALL stages to completion -- do NOT short-circuit on failure. - Check out the PR branch, fix each issue, commit changes, and push. - ```bash - gh pr checkout $PR_NUMBER - ``` - ```` +**2.3 Progress display** as agents complete: -- **Post comment** -- post a summary of all pipeline findings as a PR comment: +``` +SWEEP PROGRESS (Strategy B -- Standard) + # PR Resolve Review QA Overall + 1 #42 3 fixed APPROVED 3/3 PASS PASSED + 2 #45 RUNNING -- -- -- +``` - ```bash - gh pr comment $PR_NUMBER --body "$(cat <<'EOF' - ## Sweep Results +**2.4 Agent failures:** Record as ERROR, continue remaining agents, include in triage. - **Pipeline: FAILED** +--- - [stage summaries] - EOF - )" - ``` +## Phase 3: Triage Report -- **View details** -- display the full section from the sweep report -- **Skip** -- record decision and move on +Write `/tmp/pst-sweep-$(date +%Y%m%d-%H%M%S).md`. If all passed: "All clear", jump to +Phase 4. -**For errored items:** +For each failed/errored PR (AskUserQuestion per item): ``` -PR #50 -- Feature X -- ERROR - -Agent crashed: [error excerpt] +PR #48 -- FAILED | Review: 2 findings | QA: 2/4 failed -Options: -1. Retry pipeline -2. Skip +1. Fix now (spawn worktree agent to address findings and push) +2. Post summary comment on PR +3. View full details +4. Skip ``` -If "Retry": re-launch the agent for that item. - -**3.3 Clean up report** +- **Fix now:** Agent with `isolation: worktree`, `gh pr checkout`, address findings, push. +- **Post comment:** `gh pr comment $PR_NUMBER --body "..."` with stage summaries. +- **Retry** (error items): re-launch agent for that PR. -```bash -rm -f "$REPORT_PATH" -``` +Clean up: `rm -f "$REPORT_PATH"`. --- @@ -410,72 +246,34 @@ rm -f "$REPORT_PATH" ``` --- SWEEP RESULT --- timestamp: [ISO 8601] -filters: {author: X, label: Y | none} -stages: {resolve, review, qa} -items-total: N -items-passed: N -items-failed: N -items-errored: N +strategy: [A|B|C] ([Conservative|Standard|Review-gated]) +items-total: N | items-passed: N | items-failed: N | items-errored: N results: - - pr: #42 - author: pstaylor - pipeline: PASSED - resolve: 3 fixed - review: APPROVED - qa: 3/3 - - - pr: #45 - author: pstaylor - pipeline: PASSED - resolve: 0 threads - review: APPROVED - qa: 5/5 - - - pr: #48 - author: contributor - pipeline: FAILED - resolve: SKIPPED - review: COMMENT (2 findings) - qa: 2/4 - triage: fix-now - + - pr: #42 | pipeline: PASSED | resolve: 3 fixed | review: APPROVED | qa: 3/3 + - pr: #45 | pipeline: FAILED | resolve: 0 | review: COMMENT (2) | qa: 2/4 --- END SWEEP RESULT --- ``` --- -## Edge Cases - -| Scenario | Action | -| ------------------------------ | ---------------------------------------------------------------------- | -| No open PRs | "Nothing to sweep" message, exit cleanly | -| All PRs pass | Skip triage walk-through, "All clear" summary | -| Agent crash | Record ERROR, continue other agents, include in triage | -| `gh` not authenticated | Abort with: "Run `gh auth login` first" | -| Concurrency cap exceeded | Batch into waves (Phase 2.1) | -| User cancels mid-triage | Partial results, still emit SWEEP RESULT | -| PR has no review threads | `resolve` stage completes instantly with "No unresolved conversations" | -| PR branch is stale | Pipeline runs anyway; code-review may flag staleness | -| Rate limiting (429) | Wait and retry once per affected API call | -| Worktree creation fails | Record ERROR for that item, continue others | -| `--author` matches no PRs | "No open PRs match --author X" and exit cleanly | -| `--label` matches no PRs | "No open PRs match --label X" and exit cleanly | -| Combined filters match nothing | "No open PRs match the given filters" and exit cleanly | - -## Error Handling - -Graceful degradation throughout -- never abort the entire sweep for a single item failure. - -- **Per-item isolation:** Each pipeline runs in its own worktree. A failure in one does not affect others. -- **Stage completion:** All stages in a pipeline run to completion, even if earlier stages fail. This ensures the triage report has maximum information. -- **Agent recovery:** If an agent crashes, the error is captured and included in the triage report. The user can retry individual items. - -## Important Guidelines - -- **User owns all actions:** Sweep discovers and reports. Triage actions require confirmation. -- **No auto-merge:** Sweep never merges PRs. -- **Standard code-review mode:** Sweep uses standard GitHub-mode code-review, NOT `--preflight` or `--sweep`. Those are for pre-push author flow. -- **Resolve-threads runs first:** This ensures code fixes are pushed and threads are cleared before code-review and QA run against the updated branch. -- **Worktree cleanup:** Worktrees created by the Agent tool are automatically cleaned up if no changes are made. Fix-now worktrees persist for the user to review. -- **Concurrency awareness:** Respect the cap to avoid overwhelming CI, API rate limits, and local resources. +## Edge Cases & Guidelines + +| Scenario | Action | +| ----------------------- | ------------------------------------------------ | +| No open PRs | Exit: "Nothing to sweep" | +| All PRs pass | Skip triage walk-through, "All clear" | +| Agent crash | Record ERROR, continue others, include in triage | +| `gh` not authenticated | Abort: "Run `gh auth login` first" | +| Cap exceeded | Batch into waves | +| 1-2 PRs | Skip tournament, execute Standard | +| Rate limiting (429) | Wait and retry once per affected call | +| Worktree creation fails | Record ERROR, continue others | +| User cancels mid-triage | Partial results, still emit SWEEP RESULT | + +- **User owns all actions:** Triage actions require confirmation. No auto-merge. +- **Dry-run scout agents read only:** `gh pr view --json` and `gh pr checks` only -- no mutations. +- **Stage ordering (B/C):** resolve -> review -> qa. All stages run to completion. +- **Standard code-review mode:** GitHub PR mode, NOT `--preflight` or `--sweep`. +- **Worktree cleanup:** Auto-cleaned if no changes. Fix-now worktrees persist for user review. +- **Concurrency awareness:** Respect the cap to avoid overwhelming CI and API rate limits. From 1f4e6084730fe72e8e1a64aeed333d06610d7bfa Mon Sep 17 00:00:00 2001 From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com> Date: Sun, 21 Jun 2026 16:07:26 -0500 Subject: [PATCH 2/2] fix(pst:sweep): resolve 8 adversarial findings in v2 tournament scouting - Opus judge is foreground (background+await was contradictory) - Scouts use raw gh api only; no sub-skill calls (fixes circular Strategy C, resolve-threads branch requirement, mutation risk) - Strategy C uses existing reviewDecision from Phase 1 JSON - Result block extended with PRs_TOTAL/IMPROVED, SIGNAL/NOISE counts so judge has real data for all three scoring axes - Phase 1 PR list injected into each scout prompt (no re-discovery) - Phase 2 Strategy C uses gh pr view reviewDecision after code-review - Non-Node repo guard added to Phase 2 bootstrap - Scout agents restricted to read-only Bash + gh api Co-Authored-By: Claude Sonnet 4.6 --- skills/pst:sweep/SKILL.md | 64 +++++++++++++++++++++++++-------------- 1 file changed, 42 insertions(+), 22 deletions(-) diff --git a/skills/pst:sweep/SKILL.md b/skills/pst:sweep/SKILL.md index 3869775..105879f 100644 --- a/skills/pst:sweep/SKILL.md +++ b/skills/pst:sweep/SKILL.md @@ -101,32 +101,43 @@ Spawn **3 foreground Sonnet agents** in a single response turn (NO `run_in_backg All three must complete before the judge. Each reads PR state via `gh pr view --json` and `gh pr checks` only -- NO mutations, NO comment posts, NO pushes. +**Inject the Phase 1 PR list** (`{number, title, isDraft, unresolvedThreadCount, +reviewDecision, checkStatus}`) into each scout prompt as the single source of truth. +Scouts must not run `gh pr list`. Each scout prompt must open with: "You have access to +Bash (read-only: `gh api`, `gh pr view`, `git log`, `git diff`) and Read only. Do not +use Edit, Write, or invoke any Skill tool." + ### Strategy A -- Conservative -Simulate: inspect all PRs for resolvable threads via read-only `pst:resolve-threads`. -Skip code-review and qa entirely. Report which threads are fixable, which PRs have merge -conflicts, and which would benefit most from thread cleanup. +Query `gh pr view $PR_NUMBER --json reviewThreads` per PR to count unresolved bot threads +that WOULD be resolved. Skip code-review and qa. Report fixable threads, merge conflicts, +and top cleanup candidates. No sub-skill calls. ### Strategy B -- Standard -Simulate per PR: resolve -> code-review -> qa sequentially. Skip drafts and conflict-blocked -PRs. Report projected findings for each stage across all PRs. +Read `gh pr checks $PR_NUMBER` and the injected `unresolvedThreadCount` / `reviewDecision` +per PR. Estimate projected findings per stage across all PRs. Skip drafts and +conflict-blocked PRs. No sub-skill calls. ### Strategy C -- Review-gated -Simulate: run code-review inspection first per PR. Only plan qa for PRs code-review would -mark APPROVED. Skip qa entirely for CHANGES_REQUESTED PRs to save compute. Report which PRs -gate out and why. +Use `reviewDecision` from the injected PR list. Gate qa estimates on +`reviewDecision == "APPROVED"`; skip qa for `CHANGES_REQUESTED` PRs. Report which PRs +gate out and why. No sub-skill calls. ### Required result block (each agent) ``` ---sweep-result--- STRATEGY: +PRs_TOTAL: PRs_PROCESSABLE: PRs_BLOCKED: +PRs_ESTIMATED_IMPROVED: +ESTIMATED_SIGNAL_FINDINGS: +ESTIMATED_NOISE_FINDINGS: ESTIMATED_MUTATIONS: -FINDINGS_SUMMARY: +FINDINGS_SUMMARY: ---end-sweep-result--- ``` @@ -134,15 +145,16 @@ FINDINGS_SUMMARY: ## Phase T.2: Opus Judge -Parse all three `---sweep-result---` blocks. Spawn one **background Opus agent** -(`model: opus`) and **await its result before Phase T.3**. Prompt: +Parse all three `---sweep-result---` blocks. Spawn one **foreground Opus agent** +(`model: opus`; blocks until done). Prompt: > Score each strategy (1-5 per axis): > -> - **Coverage**: how many PRs does this strategy actually improve vs skip? -> - **Precision**: ratio of signal findings to noise; review-gated strategies score higher -> when they block qa for CHANGES_REQUESTED PRs. -> - **Efficiency**: total mutations per PR improved (fewer is better). +> - **Coverage**: `PRs_ESTIMATED_IMPROVED / PRs_TOTAL` -- how many PRs move forward? +> - **Precision**: `ESTIMATED_SIGNAL_FINDINGS / (SIGNAL + NOISE)` -- real issues vs noise; +> review-gated strategies score higher for blocking qa on CHANGES_REQUESTED PRs. +> - **Efficiency**: `ESTIMATED_MUTATIONS / PRs_ESTIMATED_IMPROVED` -- mutations per PR +> improved (fewer is better). > > Return JSON only: > @@ -174,21 +186,29 @@ wave 2 only after wave 1 completes. - **Strategy A:** one background agent per PR running `pst:resolve-threads` only. - **Strategy B:** one background agent per PR running resolve -> code-review -> qa (skip drafts and conflict-blocked PRs). -- **Strategy C:** one background agent per PR running code-review first; include qa only - if code-review result is APPROVED. +- **Strategy C:** one background agent per PR running code-review first. After + code-review, run `gh pr view $PR_NUMBER --json reviewDecision`; skip qa if + `reviewDecision != "APPROVED"`. Each agent uses `isolation: worktree`. Bootstrap before running stages: ```bash gh pr checkout $PR_NUMBER -if [ -f pnpm-lock.yaml ]; then PKG="pnpm" +if [ ! -f package.json ]; then + echo "No package.json -- skipping JS bootstrap and JS quality gates." + PKG="" +elif [ -f pnpm-lock.yaml ]; then PKG="pnpm" elif [ -f yarn.lock ]; then PKG="yarn" else PKG="npm"; fi -if grep -q '"worktree:init"' package.json 2>/dev/null -then $PKG run worktree:init -else $PKG install --frozen-lockfile; fi +if [ -n "$PKG" ]; then + if grep -q '"worktree:init"' package.json 2>/dev/null + then $PKG run worktree:init + else $PKG install --frozen-lockfile; fi +fi ``` +Gate all `$PKG run ...` commands on `[ -n "$PKG" ]`. + Required result block per PR agent: ``` @@ -272,7 +292,7 @@ results: | User cancels mid-triage | Partial results, still emit SWEEP RESULT | - **User owns all actions:** Triage actions require confirmation. No auto-merge. -- **Dry-run scout agents read only:** `gh pr view --json` and `gh pr checks` only -- no mutations. +- **Dry-run scouts read only:** `gh pr view --json` and `gh pr checks` -- no mutations. - **Stage ordering (B/C):** resolve -> review -> qa. All stages run to completion. - **Standard code-review mode:** GitHub PR mode, NOT `--preflight` or `--sweep`. - **Worktree cleanup:** Auto-cleaned if no changes. Fix-now worktrees persist for user review.