From 6e8cd06e2ee94edbfdea6b4e84d8c6112f03bbd8 Mon Sep 17 00:00:00 2001
From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com>
Date: Sun, 21 Jun 2026 15:58:35 -0500
Subject: [PATCH 1/2] feat(pst:sweep): v2 tournament scouting -- 3 dry-run
 pipeline strategies, Opus picks winner

Before executing the sweep, 3 parallel Sonnet agents scout the PR batch
in dry-run mode using divergent pipeline ordering strategies (Conservative,
Standard, Review-gated). Background Opus judge scores on coverage, precision,
and efficiency. Winning strategy is then applied for real.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 skills/pst:sweep/SKILL.md | 536 ++++++++++++--------------------------
 1 file changed, 167 insertions(+), 369 deletions(-)
diff --git a/skills/pst:sweep/SKILL.md b/skills/pst:sweep/SKILL.md
index df79bec..3869775 100644
--- a/skills/pst:sweep/SKILL.md
+++ b/skills/pst:sweep/SKILL.md
@@ -1,13 +1,14 @@
 ---
 name: pst:sweep
-description: Parallel quality sweep across open PRs -- resolve threads, code review, and QA in isolated worktrees, with optional author/label filtering
-argument-hint: "[--author <login>] [--label <name>] [--cap N] [--dry-run] [--stages resolve,review,qa]"
+description: Parallel quality sweep across open PRs -- tournament scouting picks the best pipeline strategy before executing for real
+argument-hint: "[--author <login>] [--label <name>] [--cap N] [--dry-run] [--stages resolve,review,qa] [--no-drafts]"
 allowed-tools: Bash, Read, Edit, Write, Grep, Glob, Agent, AskUserQuestion, Skill
 ---
 
-# Sweep -- Parallel Quality Pipelines Across Open PRs
+# Sweep -- Tournament-Scouted Quality Pipelines Across Open PRs
 
-Orchestrator that discovers open PRs, then fans out parallel quality pipelines (resolve-threads, code-review, QA) across all of them in isolated worktrees. Each PR gets its own agent. Filter by author, label, or any combination.
+Discovers open PRs, runs a best-of-3 dry-run tournament to select the optimal pipeline
+strategy, then executes the winner for real across all PRs in isolated worktrees.
 
 ---
 
@@ -15,393 +16,228 @@ Orchestrator that discovers open PRs, then fans out parallel quality pipelines (
 
 <arguments> #$ARGUMENTS </arguments>
 
-**Parse arguments:**
-
-- `--author <login>` -- only sweep PRs authored by this GitHub user (case-insensitive match)
+- `--author <login>` -- only sweep PRs by this GitHub user (case-insensitive)
 - `--label <name>` -- only sweep PRs with this label
-- `--cap N` -- override concurrency cap (max parallel pipelines, default 3)
+- `--cap N` -- concurrency cap (default 3, max 10)
 - `--dry-run` -- discover and classify PRs but do not run pipelines
-- `--stages <list>` -- comma-separated pipeline stages to run (default: `resolve,review,qa`). Valid stages: `resolve`, `review`, `qa`
-- `--no-drafts` -- exclude draft PRs (default: drafts ARE included)
-- No arguments -- sweep all open PRs with default pipeline
-
-Multiple filters combine as AND (e.g., `--author pat --label bug` = PRs by pat with label bug).
+- `--stages <list>` -- comma-separated subset of `resolve`, `review`, `qa` (default: all three)
+- `--no-drafts` -- exclude draft PRs (default: drafts included)
 
-Examples:
-
-- `/pst:sweep` -- all open PRs, full pipeline
-- `/pst:sweep --author pstaylor-patrick` -- only my PRs
-- `/pst:sweep --label "needs review"` -- only PRs with that label
-- `/pst:sweep --stages review,qa` -- skip resolve-threads
-- `/pst:sweep --cap 2 --dry-run` -- preview what would be swept
-- `/pst:sweep --no-drafts` -- skip draft PRs
+Multiple filters combine as AND. No arguments sweeps all open PRs.
 
 ---
 
 ## Phase 0: Guards & Config
 
 ```bash
-BRANCH=$(git branch --show-current 2>/dev/null)
-DEFAULT_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
-DEFAULT_BRANCH="${DEFAULT_BRANCH:-main}"
 OWNER_REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner)
 OWNER=$(echo "$OWNER_REPO" | cut -d/ -f1)
 REPO=$(echo "$OWNER_REPO" | cut -d/ -f2)
+DEFAULT_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
+DEFAULT_BRANCH="${DEFAULT_BRANCH:-main}"
 ```
 
-| Condition              | Action                               |
-| ---------------------- | ------------------------------------ |
-| `gh` not available     | Stop: "GitHub CLI (gh) is required." |
-| `gh auth status` fails | Stop: "Run `gh auth login` first."   |
-
-**Parse config:**
-
-```
-CONCURRENCY_CAP = args --cap N || 3
-STAGES = args --stages || "resolve,review,qa"
-AUTHOR_FILTER = args --author || ""
-LABEL_FILTER = args --label || ""
-INCLUDE_DRAFTS = !(args --no-drafts)
-DRY_RUN = args --dry-run
-```
-
-Validate:
+Stop if `gh` is unavailable or `gh auth status` fails.
 
-- `CONCURRENCY_CAP` must be a positive integer, max 10
-- `STAGES` must be a comma-separated subset of: `resolve`, `review`, `qa`
+Parse: `CONCURRENCY_CAP` (default 3), `STAGES` (default `resolve,review,qa`), `AUTHOR_FILTER`,
+`LABEL_FILTER`, `INCLUDE_DRAFTS`, `DRY_RUN`. Validate cap is 1-10 and stages are a valid subset.
 
 ---
 
 ## Phase 1: PR Discovery
 
-**1.1 Fetch open PRs**
+**Fetch:**
 
 ```bash
-gh pr list --state open --json number,title,headRefName,baseRefName,url,isDraft,author,labels,reviewDecision --limit 100
+gh pr list --state open \
+  --json number,title,headRefName,baseRefName,url,isDraft,author,labels,reviewDecision \
+  --limit 100
 ```
 
-Each PR has: `{number, title, headRefName, baseRefName, url, isDraft, author.login, labels[].name, reviewDecision}`
-
-**1.2 Apply filters**
-
-Starting from the full list, apply filters sequentially:
+**Filter:** Apply `--no-drafts`, `--author`, `--label` as AND. If nothing remains: print
+"Nothing to sweep -- no open PRs match the filters." and exit cleanly.
 
-1. **Draft filter:** If `--no-drafts`, remove PRs where `isDraft=true`. Otherwise keep all.
-2. **Author filter:** If `--author` provided, keep only PRs where `author.login` matches (case-insensitive).
-3. **Label filter:** If `--label` provided, keep only PRs where any label name matches (case-insensitive).
+**Classify:** For each PR, if `resolve` is in STAGES query unresolved thread count via GraphQL
+`reviewThreads`. Store `unresolvedThreads` and `reviewDecision`.
 
-**1.3 Classify PRs**
-
-For each remaining PR, classify:
-
-- **Has unresolved threads:** Query each PR for unresolved review threads (only if `resolve` is in STAGES):
-
-  ```bash
-  gh api graphql -f query='
-  {
-    repository(owner: "'$OWNER'", name: "'$REPO'") {
-      pullRequest(number: '$PR_NUMBER') {
-        reviewThreads(first: 1) {
-          nodes { isResolved }
-          totalCount
-        }
-      }
-    }
-  }'
-  ```
-
-  Count unresolved threads. Store as `unresolvedThreads` on the PR.
-
-- **Review status:** From `reviewDecision` -- `APPROVED`, `CHANGES_REQUESTED`, `REVIEW_REQUIRED`, or empty.
-
-**1.4 Present discovery**
-
-Use **AskUserQuestion**:
+**Confirm (AskUserQuestion):**
 
 ```
 SWEEP DISCOVERY
 ---------------
+  #   PR    Author     Title                  Draft  Threads  Review
+  1  #42  pstaylor  Add login flow              no      3    CHANGES_REQUESTED
+  2  #45  pstaylor  Fix auth redirect            no      0    REVIEW_REQUIRED
 
-  #   PR     Author          Title                               Draft  Threads  Review
-  1   #42    pstaylor        Add login flow                      no     3        CHANGES_REQUESTED
-  2   #45    pstaylor        Fix auth redirect                   no     0        REVIEW_REQUIRED
-  3   #48    contributor     Update docs                         yes    1        --
+Filters: {author/label/none}  Stages: {resolve -> review -> qa}
 
-Filters: {author: pstaylor | label: bug | none}
-Pipeline: {resolve -> review -> qa}
-Concurrency: {3}
-
-Options:
 1. Sweep all (Recommended)
-2. Select specific items (list numbers)
-3. Remove items (list numbers to exclude)
+2. Select specific items
+3. Remove items
 4. Abort
 ```
 
-If "Select specific" or "Remove": prompt for numbers, filter list, re-confirm.
-
-**If no items found after filtering:** Display "Nothing to sweep -- no open PRs match the filters." and exit cleanly.
-
-**If `--dry-run`:** Display the discovery table and exit. Do not proceed to Phase 2.
+If `--dry-run`: display table and exit. If no items after selection: exit cleanly.
 
 ---
 
-## Phase 2: Spawn Parallel Pipelines
-
-**2.1 Batch into waves**
-
-Split selected PRs into batches of `CONCURRENCY_CAP`. If 8 PRs and cap=3: wave 1 (items 1-3), wave 2 (items 4-6), wave 3 (items 7-8).
-
-**2.2 Spawn wave**
-
-For each PR in the current wave, launch a background agent with worktree isolation.
-
-**Important ordering:** Stages run sequentially within each pipeline in the order: `resolve` -> `review` -> `qa`. This order matters because:
-
-- `resolve` pushes code fixes and clears threads before review
-- `review` posts findings before QA validates behavior
-- `qa` runs last against the most up-to-date branch state
-
-**Agent prompt per PR:**
-
-````
-Agent:
-  description: "Sweep PR #N: {title}"
-  isolation: worktree
-  run_in_background: true
-  prompt: |
-    You are running a sweep quality pipeline for PR #$PR_NUMBER in $OWNER/$REPO.
-
-    The PR branch is: $HEAD_BRANCH (base: $BASE_BRANCH)
-    PR URL: $PR_URL
-
-    First, check out the PR branch:
-    ```bash
-    gh pr checkout $PR_NUMBER
-    ```
-
-    Then bootstrap the worktree so pre-commit/pre-push hooks and quality
-    gates can actually run (env symlinks, deps, husky hooks path):
-    ```bash
-    if [ -f pnpm-lock.yaml ]; then PKG="pnpm"
-    elif [ -f yarn.lock ]; then PKG="yarn"
-    else PKG="npm"; fi
-    if grep -q '"worktree:init"' package.json 2>/dev/null; then
-      $PKG run worktree:init
-    fi
-    ```
-    If the repo doesn't define `worktree:init`, fall back to
-    `$PKG install --frozen-lockfile`. Without this step, downstream
-    stages will misreport (missing env files masquerade as real
-    failures) and pre-push hooks will fail.
-
-    Run these stages SEQUENTIALLY. Run ALL stages to completion
-    -- do NOT short-circuit on failure. Record results for each stage.
-
-    [STAGE: resolve-threads -- INCLUDE IF "resolve" IN STAGES]
-    Stage 1: Resolve Threads
-    Run: Skill("pst:resolve-threads", "$PR_NUMBER")
-    Parse the --- RESOLVE RESULT --- block from the output.
-    Record: total conversations, processed, fixed, threads resolved.
-    If the skill pushed fixes, note the new HEAD commit.
-
-    [STAGE: code-review -- INCLUDE IF "review" IN STAGES]
-    Stage 2: Code Review
-    Run: Skill("pst:code-review", "$PR_NUMBER")
-    This runs in standard GitHub PR mode -- posts a review to the PR.
-    Record: verdict (APPROVED / CHANGES_REQUESTED / COMMENT), finding count, critical count.
-
-    [STAGE: qa -- INCLUDE IF "qa" IN STAGES]
-    Stage 3: QA
-    Run: Skill("pst:qa", "$PR_NUMBER")
-    This runs fully autonomously. Record the QA result.
-
-    After ALL stages complete, output this block EXACTLY:
-
-    --- SWEEP PIPELINE RESULT ---
-    pr: #$PR_NUMBER
-    title: $PR_TITLE
-    author: $PR_AUTHOR
-    type: pre-merge
-
-    resolve-threads:
-      status: [COMPLETED|SKIPPED|ERROR]
-      total-conversations: N
-      processed: N
-      fixed: N
-      threads-resolved: N
-      pushed: [yes|no]
-
-    code-review:
-      status: [COMPLETED|SKIPPED|ERROR]
-      verdict: [APPROVED|CHANGES_REQUESTED|COMMENT|--]
-      findings: N
-      critical: N
-
-    qa:
-      status: [COMPLETED|SKIPPED|ERROR]
-      overall: [PASSED|FAILED|PARTIAL|SKIPPED]
-      test-cases-total: N
-      test-cases-passed: N
-      test-cases-failed: N
-
-    pipeline-overall: [PASSED|FAILED|ERROR]
-    --- END SWEEP PIPELINE RESULT ---
-````
-
-Only include stages that are in `STAGES`. For omitted stages, output `status: SKIPPED` in the result block.
-
-**2.3 Monitor wave progress**
-
-As each background agent completes (orchestrator is notified automatically), parse the `SWEEP PIPELINE RESULT` block from its result. Display incremental progress:
+## Tournament Gate
 
-```
-SWEEP PROGRESS
---------------
-
-Wave 1 of 2:
-
-  #   PR     Author     Resolve    Review        QA         Overall
-  1   #42    pstaylor   3 fixed    APPROVED      3/3 PASS   PASSED
-  2   #45    pstaylor   0 threads  CHANGES (1c)  RUNNING    --
-  3   #48    contrib    SKIPPED    COMMENT       2/4 FAIL   FAILED
-```
+**Skip condition (1-2 PRs):** Skip tournament, set `WINNING_STRATEGY=B`, jump to Phase 2.
 
-After all agents in a wave complete, start the next wave.
+Otherwise, ask (AskUserQuestion):
 
-**2.4 Handle agent failures**
+> Scout with N=3 strategies before executing? (Yes / Execute Standard / Skip sweep)
 
-If an agent crashes or returns without a valid `SWEEP PIPELINE RESULT` block:
-
-- Record the item as `ERROR` with the agent's error output
-- Continue with remaining items
-- Include in triage report
+- **Yes:** run Phase T.
+- **Execute Standard:** set `WINNING_STRATEGY=B`, skip Phase T.
+- **Skip sweep:** exit cleanly.
 
 ---
 
-## Phase 3: Triage Report
+## Phase T: Dry-Run Scout Tournament
 
-**3.1 Generate report**
+Spawn **3 foreground Sonnet agents** in a single response turn (NO `run_in_background`).
+All three must complete before the judge. Each reads PR state via `gh pr view --json` and
+`gh pr checks` only -- NO mutations, NO comment posts, NO pushes.
 
-Create a markdown report at `/tmp/pst-sweep-$(date +%Y%m%d-%H%M%S).md`:
+### Strategy A -- Conservative
 
-```markdown
-# Sweep Report -- [date]
+Simulate: inspect all PRs for resolvable threads via read-only `pst:resolve-threads`.
+Skip code-review and qa entirely. Report which threads are fixable, which PRs have merge
+conflicts, and which would benefit most from thread cleanup.
 
-## Summary
+### Strategy B -- Standard
 
-| Metric    | Count |
-| --------- | ----- |
-| Total PRs | N     |
-| Passed    | N     |
-| Failed    | N     |
-| Errors    | N     |
+Simulate per PR: resolve -> code-review -> qa sequentially. Skip drafts and conflict-blocked
+PRs. Report projected findings for each stage across all PRs.
 
-Filters: {author: X | label: Y | none}
-Stages: {resolve, review, qa}
+### Strategy C -- Review-gated
 
-## Results
+Simulate: run code-review inspection first per PR. Only plan qa for PRs code-review would
+mark APPROVED. Skip qa entirely for CHANGES_REQUESTED PRs to save compute. Report which PRs
+gate out and why.
 
-### PR #42 -- Add login flow (pstaylor) -- PASSED
+### Required result block (each agent)
 
-**Resolve Threads:** 3 conversations processed, 3 fixed, pushed
-**Code Review:** APPROVED -- 0 findings
-**QA:** 3/3 test cases passed
+```
+---sweep-result---
+STRATEGY: <A|B|C>
+PRs_PROCESSABLE: <integer>
+PRs_BLOCKED: <integer>
+ESTIMATED_MUTATIONS: <integer: comment posts + pushes this strategy would make>
+FINDINGS_SUMMARY: <one-line summary of key findings across all PRs>
+---end-sweep-result---
+```
 
 ---
 
-### PR #48 -- Update docs (contributor) -- FAILED
-
-**Resolve Threads:** SKIPPED (--no-drafts excluded, or not in stages)
-**Code Review:** COMMENT -- 2 findings (0 critical)
-**QA:** 2/4 test cases passed -- 2 FAILED
-
-- TC-2: Broken link in sidebar -- FAIL
-- TC-4: Image alt text missing -- FAIL
+## Phase T.2: Opus Judge
+
+Parse all three `---sweep-result---` blocks. Spawn one **background Opus agent**
+(`model: opus`) and **await its result before Phase T.3**. Prompt:
+
+> Score each strategy (1-5 per axis):
+>
+> - **Coverage**: how many PRs does this strategy actually improve vs skip?
+> - **Precision**: ratio of signal findings to noise; review-gated strategies score higher
+>   when they block qa for CHANGES_REQUESTED PRs.
+> - **Efficiency**: total mutations per PR improved (fewer is better).
+>
+> Return JSON only:
+>
+> ```json
+> {
+>   "winner": "A|B|C",
+>   "scores": {
+>     "A": { "coverage": 0, "precision": 0, "efficiency": 0 },
+>     "B": { "coverage": 0, "precision": 0, "efficiency": 0 },
+>     "C": { "coverage": 0, "precision": 0, "efficiency": 0 }
+>   },
+>   "reasoning": "one sentence"
+> }
+> ```
+
+Set `WINNING_STRATEGY` from `winner`. Log scores and reasoning before proceeding.
 
 ---
 
-[...per-PR sections...]
-```
-
-**3.2 Interactive triage (only if failures exist)**
+## Phase 2: Execute Winning Strategy
 
-If ALL items passed -> skip triage, display "All clear" summary, jump to Phase 4.
+Announce the winning strategy and Opus reasoning. Run the real (non-dry-run) sweep.
 
-For each failed/errored PR, present options via **AskUserQuestion**:
+**2.1 Batch into waves:** Split selected PRs into groups of `CONCURRENCY_CAP`. Start
+wave 2 only after wave 1 completes.
 
-**For failed PRs:**
+**2.2 Pipeline per strategy:**
 
-```
-PR #48 -- Update docs (contributor) -- FAILED
+- **Strategy A:** one background agent per PR running `pst:resolve-threads` only.
+- **Strategy B:** one background agent per PR running resolve -> code-review -> qa (skip
+  drafts and conflict-blocked PRs).
+- **Strategy C:** one background agent per PR running code-review first; include qa only
+  if code-review result is APPROVED.
 
-Resolve: -- | Review: 2 findings | QA: 2/4 failed
+Each agent uses `isolation: worktree`. Bootstrap before running stages:
 
-Options:
-1. Fix now -- spawn an agent to address findings and push
-2. Post summary comment on PR
-3. View full details
-4. Skip -- acknowledge and move on
+```bash
+gh pr checkout $PR_NUMBER
+if [ -f pnpm-lock.yaml ]; then PKG="pnpm"
+elif [ -f yarn.lock ]; then PKG="yarn"
+else PKG="npm"; fi
+if grep -q '"worktree:init"' package.json 2>/dev/null
+then $PKG run worktree:init
+else $PKG install --frozen-lockfile; fi
 ```
 
-Option behaviors:
-
-- **Fix now** -- launch an Agent with `isolation: worktree`:
-
-  ````
-  Agent:
-    description: "Fix sweep findings for PR #N"
-    isolation: worktree
-    prompt: |
-      Address these findings from the sweep pipeline for PR #$PR_NUMBER:
+Required result block per PR agent:
 
-      Code Review Findings:
-      [list findings]
+```
+--- SWEEP PIPELINE RESULT ---
+pr: #$PR_NUMBER
+title: $PR_TITLE
+resolve-threads: status: [COMPLETED|SKIPPED|ERROR] | fixed: N | pushed: [yes|no]
+code-review: status: [COMPLETED|SKIPPED|ERROR] | verdict: [APPROVED|CHANGES_REQUESTED|COMMENT|--] | findings: N
+qa: status: [COMPLETED|SKIPPED|ERROR] | overall: [PASSED|FAILED|PARTIAL|SKIPPED]
+pipeline-overall: [PASSED|FAILED|ERROR]
+--- END SWEEP PIPELINE RESULT ---
+```
 
-      QA Failures:
-      [list QA failures]
+Run ALL stages to completion -- do NOT short-circuit on failure.
 
-      Check out the PR branch, fix each issue, commit changes, and push.
-      ```bash
-      gh pr checkout $PR_NUMBER
-      ```
-  ````
+**2.3 Progress display** as agents complete:
 
-- **Post comment** -- post a summary of all pipeline findings as a PR comment:
+```
+SWEEP PROGRESS (Strategy B -- Standard)
+  #  PR    Resolve       Review          QA          Overall
+  1  #42   3 fixed       APPROVED        3/3 PASS    PASSED
+  2  #45   RUNNING       --              --           --
+```
 
-  ```bash
-  gh pr comment $PR_NUMBER --body "$(cat <<'EOF'
-  ## Sweep Results
+**2.4 Agent failures:** Record as ERROR, continue remaining agents, include in triage.
 
-  **Pipeline: FAILED**
+---
 
-  [stage summaries]
-  EOF
-  )"
-  ```
+## Phase 3: Triage Report
 
-- **View details** -- display the full section from the sweep report
-- **Skip** -- record decision and move on
+Write `/tmp/pst-sweep-$(date +%Y%m%d-%H%M%S).md`. If all passed: "All clear", jump to
+Phase 4.
 
-**For errored items:**
+For each failed/errored PR (AskUserQuestion per item):
 
 ```
-PR #50 -- Feature X -- ERROR
-
-Agent crashed: [error excerpt]
+PR #48 -- FAILED | Review: 2 findings | QA: 2/4 failed
 
-Options:
-1. Retry pipeline
-2. Skip
+1. Fix now (spawn worktree agent to address findings and push)
+2. Post summary comment on PR
+3. View full details
+4. Skip
 ```
 
-If "Retry": re-launch the agent for that item.
-
-**3.3 Clean up report**
+- **Fix now:** Agent with `isolation: worktree`, `gh pr checkout`, address findings, push.
+- **Post comment:** `gh pr comment $PR_NUMBER --body "..."` with stage summaries.
+- **Retry** (error items): re-launch agent for that PR.
 
-```bash
-rm -f "$REPORT_PATH"
-```
+Clean up: `rm -f "$REPORT_PATH"`.
 
 ---
 
@@ -410,72 +246,34 @@ rm -f "$REPORT_PATH"
 ```
 --- SWEEP RESULT ---
 timestamp: [ISO 8601]
-filters: {author: X, label: Y | none}
-stages: {resolve, review, qa}
-items-total: N
-items-passed: N
-items-failed: N
-items-errored: N
+strategy: [A|B|C] ([Conservative|Standard|Review-gated])
+items-total: N | items-passed: N | items-failed: N | items-errored: N
 
 results:
-  - pr: #42
-    author: pstaylor
-    pipeline: PASSED
-    resolve: 3 fixed
-    review: APPROVED
-    qa: 3/3
-
-  - pr: #45
-    author: pstaylor
-    pipeline: PASSED
-    resolve: 0 threads
-    review: APPROVED
-    qa: 5/5
-
-  - pr: #48
-    author: contributor
-    pipeline: FAILED
-    resolve: SKIPPED
-    review: COMMENT (2 findings)
-    qa: 2/4
-    triage: fix-now
-
+  - pr: #42 | pipeline: PASSED | resolve: 3 fixed | review: APPROVED | qa: 3/3
+  - pr: #45 | pipeline: FAILED | resolve: 0 | review: COMMENT (2) | qa: 2/4
 --- END SWEEP RESULT ---
 ```
 
 ---
 
-## Edge Cases
-
-| Scenario                       | Action                                                                 |
-| ------------------------------ | ---------------------------------------------------------------------- |
-| No open PRs                    | "Nothing to sweep" message, exit cleanly                               |
-| All PRs pass                   | Skip triage walk-through, "All clear" summary                          |
-| Agent crash                    | Record ERROR, continue other agents, include in triage                 |
-| `gh` not authenticated         | Abort with: "Run `gh auth login` first"                                |
-| Concurrency cap exceeded       | Batch into waves (Phase 2.1)                                           |
-| User cancels mid-triage        | Partial results, still emit SWEEP RESULT                               |
-| PR has no review threads       | `resolve` stage completes instantly with "No unresolved conversations" |
-| PR branch is stale             | Pipeline runs anyway; code-review may flag staleness                   |
-| Rate limiting (429)            | Wait and retry once per affected API call                              |
-| Worktree creation fails        | Record ERROR for that item, continue others                            |
-| `--author` matches no PRs      | "No open PRs match --author X" and exit cleanly                        |
-| `--label` matches no PRs       | "No open PRs match --label X" and exit cleanly                         |
-| Combined filters match nothing | "No open PRs match the given filters" and exit cleanly                 |
-
-## Error Handling
-
-Graceful degradation throughout -- never abort the entire sweep for a single item failure.
-
-- **Per-item isolation:** Each pipeline runs in its own worktree. A failure in one does not affect others.
-- **Stage completion:** All stages in a pipeline run to completion, even if earlier stages fail. This ensures the triage report has maximum information.
-- **Agent recovery:** If an agent crashes, the error is captured and included in the triage report. The user can retry individual items.
-
-## Important Guidelines
-
-- **User owns all actions:** Sweep discovers and reports. Triage actions require confirmation.
-- **No auto-merge:** Sweep never merges PRs.
-- **Standard code-review mode:** Sweep uses standard GitHub-mode code-review, NOT `--preflight` or `--sweep`. Those are for pre-push author flow.
-- **Resolve-threads runs first:** This ensures code fixes are pushed and threads are cleared before code-review and QA run against the updated branch.
-- **Worktree cleanup:** Worktrees created by the Agent tool are automatically cleaned up if no changes are made. Fix-now worktrees persist for the user to review.
-- **Concurrency awareness:** Respect the cap to avoid overwhelming CI, API rate limits, and local resources.
+## Edge Cases & Guidelines
+
+| Scenario                | Action                                           |
+| ----------------------- | ------------------------------------------------ |
+| No open PRs             | Exit: "Nothing to sweep"                         |
+| All PRs pass            | Skip triage walk-through, "All clear"            |
+| Agent crash             | Record ERROR, continue others, include in triage |
+| `gh` not authenticated  | Abort: "Run `gh auth login` first"               |
+| Cap exceeded            | Batch into waves                                 |
+| 1-2 PRs                 | Skip tournament, execute Standard                |
+| Rate limiting (429)     | Wait and retry once per affected call            |
+| Worktree creation fails | Record ERROR, continue others                    |
+| User cancels mid-triage | Partial results, still emit SWEEP RESULT         |
+
+- **User owns all actions:** Triage actions require confirmation. No auto-merge.
+- **Dry-run scout agents read only:** `gh pr view --json` and `gh pr checks` only -- no mutations.
+- **Stage ordering (B/C):** resolve -> review -> qa. All stages run to completion.
+- **Standard code-review mode:** GitHub PR mode, NOT `--preflight` or `--sweep`.
+- **Worktree cleanup:** Auto-cleaned if no changes. Fix-now worktrees persist for user review.
+- **Concurrency awareness:** Respect the cap to avoid overwhelming CI and API rate limits.

From 1f4e6084730fe72e8e1a64aeed333d06610d7bfa Mon Sep 17 00:00:00 2001
From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com>
Date: Sun, 21 Jun 2026 16:07:26 -0500
Subject: [PATCH 2/2] fix(pst:sweep): resolve 8 adversarial findings in v2
 tournament scouting

- Opus judge is foreground (background+await was contradictory)
- Scouts use raw gh api only; no sub-skill calls (fixes circular
  Strategy C, resolve-threads branch requirement, mutation risk)
- Strategy C uses existing reviewDecision from Phase 1 JSON
- Result block extended with PRs_TOTAL/IMPROVED, SIGNAL/NOISE counts
  so judge has real data for all three scoring axes
- Phase 1 PR list injected into each scout prompt (no re-discovery)
- Phase 2 Strategy C uses gh pr view reviewDecision after code-review
- Non-Node repo guard added to Phase 2 bootstrap
- Scout agents restricted to read-only Bash + gh api

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 skills/pst:sweep/SKILL.md | 64 +++++++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/skills/pst:sweep/SKILL.md b/skills/pst:sweep/SKILL.md
index 3869775..105879f 100644
--- a/skills/pst:sweep/SKILL.md
+++ b/skills/pst:sweep/SKILL.md
@@ -101,32 +101,43 @@ Spawn **3 foreground Sonnet agents** in a single response turn (NO `run_in_backg
 All three must complete before the judge. Each reads PR state via `gh pr view --json` and
 `gh pr checks` only -- NO mutations, NO comment posts, NO pushes.
 
+**Inject the Phase 1 PR list** (`{number, title, isDraft, unresolvedThreadCount,
+reviewDecision, checkStatus}`) into each scout prompt as the single source of truth.
+Scouts must not run `gh pr list`. Each scout prompt must open with: "You have access to
+Bash (read-only: `gh api`, `gh pr view`, `git log`, `git diff`) and Read only. Do not
+use Edit, Write, or invoke any Skill tool."
+
 ### Strategy A -- Conservative
 
-Simulate: inspect all PRs for resolvable threads via read-only `pst:resolve-threads`.
-Skip code-review and qa entirely. Report which threads are fixable, which PRs have merge
-conflicts, and which would benefit most from thread cleanup.
+Query `gh pr view $PR_NUMBER --json reviewThreads` per PR to count unresolved bot threads
+that WOULD be resolved. Skip code-review and qa. Report fixable threads, merge conflicts,
+and top cleanup candidates. No sub-skill calls.
 
 ### Strategy B -- Standard
 
-Simulate per PR: resolve -> code-review -> qa sequentially. Skip drafts and conflict-blocked
-PRs. Report projected findings for each stage across all PRs.
+Read `gh pr checks $PR_NUMBER` and the injected `unresolvedThreadCount` / `reviewDecision`
+per PR. Estimate projected findings per stage across all PRs. Skip drafts and
+conflict-blocked PRs. No sub-skill calls.
 
 ### Strategy C -- Review-gated
 
-Simulate: run code-review inspection first per PR. Only plan qa for PRs code-review would
-mark APPROVED. Skip qa entirely for CHANGES_REQUESTED PRs to save compute. Report which PRs
-gate out and why.
+Use `reviewDecision` from the injected PR list. Gate qa estimates on
+`reviewDecision == "APPROVED"`; skip qa for `CHANGES_REQUESTED` PRs. Report which PRs
+gate out and why. No sub-skill calls.
 
 ### Required result block (each agent)
 
 ```
 ---sweep-result---
 STRATEGY: <A|B|C>
+PRs_TOTAL: <integer: total PRs in batch>
 PRs_PROCESSABLE: <integer>
 PRs_BLOCKED: <integer>
+PRs_ESTIMATED_IMPROVED: <integer: PRs this strategy would move forward>
+ESTIMATED_SIGNAL_FINDINGS: <integer: threads/checks that are real issues>
+ESTIMATED_NOISE_FINDINGS: <integer: bot noise, auto-closeable items>
 ESTIMATED_MUTATIONS: <integer: comment posts + pushes this strategy would make>
-FINDINGS_SUMMARY: <one-line summary of key findings across all PRs>
+FINDINGS_SUMMARY: <one-line summary>
 ---end-sweep-result---
 ```
 
@@ -134,15 +145,16 @@ FINDINGS_SUMMARY: <one-line summary of key findings across all PRs>
 
 ## Phase T.2: Opus Judge
 
-Parse all three `---sweep-result---` blocks. Spawn one **background Opus agent**
-(`model: opus`) and **await its result before Phase T.3**. Prompt:
+Parse all three `---sweep-result---` blocks. Spawn one **foreground Opus agent**
+(`model: opus`; blocks until done). Prompt:
 
 > Score each strategy (1-5 per axis):
 >
-> - **Coverage**: how many PRs does this strategy actually improve vs skip?
-> - **Precision**: ratio of signal findings to noise; review-gated strategies score higher
->   when they block qa for CHANGES_REQUESTED PRs.
-> - **Efficiency**: total mutations per PR improved (fewer is better).
+> - **Coverage**: `PRs_ESTIMATED_IMPROVED / PRs_TOTAL` -- how many PRs move forward?
+> - **Precision**: `ESTIMATED_SIGNAL_FINDINGS / (SIGNAL + NOISE)` -- real issues vs noise;
+>   review-gated strategies score higher for blocking qa on CHANGES_REQUESTED PRs.
+> - **Efficiency**: `ESTIMATED_MUTATIONS / PRs_ESTIMATED_IMPROVED` -- mutations per PR
+>   improved (fewer is better).
 >
 > Return JSON only:
 >
@@ -174,21 +186,29 @@ wave 2 only after wave 1 completes.
 - **Strategy A:** one background agent per PR running `pst:resolve-threads` only.
 - **Strategy B:** one background agent per PR running resolve -> code-review -> qa (skip
   drafts and conflict-blocked PRs).
-- **Strategy C:** one background agent per PR running code-review first; include qa only
-  if code-review result is APPROVED.
+- **Strategy C:** one background agent per PR running code-review first. After
+  code-review, run `gh pr view $PR_NUMBER --json reviewDecision`; skip qa if
+  `reviewDecision != "APPROVED"`.
 
 Each agent uses `isolation: worktree`. Bootstrap before running stages:
 
 ```bash
 gh pr checkout $PR_NUMBER
-if [ -f pnpm-lock.yaml ]; then PKG="pnpm"
+if [ ! -f package.json ]; then
+  echo "No package.json -- skipping JS bootstrap and JS quality gates."
+  PKG=""
+elif [ -f pnpm-lock.yaml ]; then PKG="pnpm"
 elif [ -f yarn.lock ]; then PKG="yarn"
 else PKG="npm"; fi
-if grep -q '"worktree:init"' package.json 2>/dev/null
-then $PKG run worktree:init
-else $PKG install --frozen-lockfile; fi
+if [ -n "$PKG" ]; then
+  if grep -q '"worktree:init"' package.json 2>/dev/null
+  then $PKG run worktree:init
+  else $PKG install --frozen-lockfile; fi
+fi
 ```
 
+Gate all `$PKG run ...` commands on `[ -n "$PKG" ]`.
+
 Required result block per PR agent:
 
 ```
@@ -272,7 +292,7 @@ results:
 | User cancels mid-triage | Partial results, still emit SWEEP RESULT         |
 
 - **User owns all actions:** Triage actions require confirmation. No auto-merge.
-- **Dry-run scout agents read only:** `gh pr view --json` and `gh pr checks` only -- no mutations.
+- **Dry-run scouts read only:** `gh pr view --json` and `gh pr checks` -- no mutations.
 - **Stage ordering (B/C):** resolve -> review -> qa. All stages run to completion.
 - **Standard code-review mode:** GitHub PR mode, NOT `--preflight` or `--sweep`.
 - **Worktree cleanup:** Auto-cleaned if no changes. Fix-now worktrees persist for user review.