From 081257130acdd0dfbaaabf89c57c9505d2a228f9 Mon Sep 17 00:00:00 2001 From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com> Date: Sun, 21 Jun 2026 06:10:54 -0500 Subject: [PATCH 1/3] feat(pst): best-of-N parallel implementation tournament for refactors and extensions Rule 24: spawn 2-5 parallel Sonnet agents (Conservative/Structural/Extract-first strategies) in isolated worktrees, Opus judge picks winner by MAINTAINABILITY.md criteria. /pst:refactor implements the pattern with N scaled to complexity. Trivial: N=1 (no tournament). Moderate: N=3. Complex: N=5. Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_019BCr5Qxi8jXnG2Rr7zdkw1 --- skills/pst/SKILL.md | 2 + skills/pst:refactor/SKILL.md | 106 +++++++++++++++++++++++++---------- 2 files changed, 78 insertions(+), 30 deletions(-) diff --git a/skills/pst/SKILL.md b/skills/pst/SKILL.md index aeebb03..171296e 100644 --- a/skills/pst/SKILL.md +++ b/skills/pst/SKILL.md @@ -135,6 +135,8 @@ rules a hook reminds about (non-blocking). Detail and examples are in 23. **Maintainability review on every code change** `[NUDGE]`. After any change that touches at least one code file (non-docs, non-lockfile), run a Fowler-smell pass using `MAINTAINABILITY.md` as the rubric. This is a separate refactoring commit (two hats per rule 15): behavior stays identical, only structure improves. The pass is Haiku-tier since it is lightweight. The only exemption is a changeset that touches zero code files (docs-only, lockfile-only, or pure config values). A small diff does not exempt a change -- smells manifest at any size. Run with `/pst:adversarial-review` or inline. See `MAINTAINABILITY.md` for the 16 canonical smells. +24. **Best-of-N implementation tournament** `[NUDGE]`. For substantive refactors or extensions where multiple valid implementations exist, spawn 2-5 parallel Sonnet agents in isolated worktrees, each with a divergent strategy: Conservative (smallest diff, no new abstractions), Structural (reorganize by responsibility), Extract-first (name every abstraction). For N=5, add Domain-model and Functional strategies. After all agents complete, an Opus judge selects the winner using MAINTAINABILITY.md criteria: cohesion, coupling, intent clarity, and locality of change. Skip for trivial one-way fixes where a single correct path is obvious. Rule 15 two-hats still applies to every implementation agent. `/pst:refactor` implements this pattern. + ## Usage `/pst` activates, `/pst off` disarms. Mechanics, merge modes, and rule detail are diff --git a/skills/pst:refactor/SKILL.md b/skills/pst:refactor/SKILL.md index 8200fc0..69e63e9 100644 --- a/skills/pst:refactor/SKILL.md +++ b/skills/pst:refactor/SKILL.md @@ -85,41 +85,87 @@ If `--report-only` was passed, stop here. Use `AskUserQuestion`: **Apply all fixes in isolated worktrees?** (Yes / Report only). Treat no objection as Yes. -## 4. Parallel Sonnet implementers (isolated worktrees) - -Group findings by file. Spawn one background Sonnet agent per file -(`model: sonnet`, `isolation: worktree`) with: - -- The file content. -- The smell findings for that file. -- Instruction: +## 4. Plan gate (foreground) + +Determine N from the smell count and file count: + +- **Trivial** (1-3 smells, 1-2 files): N=1 -- single Sonnet, skip tournament. +- **Moderate** (4-10 smells or 3-5 files): N=3. +- **Complex** (11+ smells or 6+ files): N=5. + +Present smell count and N to the user via `AskUserQuestion`: "Run tournament +with N=\ implementations?" (Yes / Report only / Adjust N). If +`--report-only` was passed, stop here without asking. + +## 5. Parallel implementation tournament (background) + +Spawn N background Sonnet agents (`model: sonnet`, `isolation: worktree`), +each receiving the same smell findings and target files but a different +strategy directive: + +- **Strategy A -- Conservative**: Fix only the highest-impact smells with the + smallest possible diff. Prefer inlining over new abstractions when the call + site is nearby. Preserve the existing module and file structure entirely. +- **Strategy B -- Structural**: Reorganize by responsibility. Group related + behavior together even if it means creating new files or moving methods + between classes. Optimize for locality of future change. +- **Strategy C -- Extract-first**: Create a named function, class, or module + for every smell instance. Err toward more abstractions with clear names. + Every duplicated concept gets a home. + +If N=5, add: + +- **Strategy D -- Domain-model**: Look for primitive obsession and data clumps; + introduce domain objects or value types to make implicit business concepts + explicit. +- **Strategy E -- Functional**: Prefer pure functions and immutable data where + the language supports it. Move state to the edges. Reduce side-effect surface. + +Each agent must: + +1. Apply its strategy to fix the smells in the target files. +2. Run existing tests if present; skip any fix that breaks a test and note it. +3. Stage and commit: `git add && git commit -m "refactor(): "`. +4. Include co-author trailer: `Co-Authored-By: Claude Sonnet 4.6 `. + +## 6. Opus judge selects winner (background) + +After all N agents complete, spawn one background Opus agent (`model: opus`) +with: + +- All N diffs (collected via `git show HEAD` in each worktree). +- The smell findings from step 2. +- The content of `MAINTAINABILITY.md`. +- Instruction: evaluate each implementation against the six MAINTAINABILITY.md + outcomes (Higher Cohesion, Lower Coupling, Explicit Intent, Locality of + Change, Reduced Cognitive Load, Strong Domain Modeling). Score each 1-5 on + each dimension. Pick the winner. + +Return schema: + +```json +{ + "winner": "Conservative|Structural|Extract-first|Domain-model|Functional", + "scores": { "A": 0, "B": 0, "C": 0 }, + "reasoning": "one sentence" +} +``` - ``` - Apply each refactoring listed below to this file. Rules: - - Behavior-preserving only. Do not change any observable behavior. - - Run existing tests if a test runner is available; skip any fix that - breaks a test and note it. - - After all fixes: stage the changed file and commit: - git add - git commit -m "refactor(): " - Include the co-author trailer: - Co-Authored-By: Claude Sonnet 4.6 - - If no fix is safe to apply, commit nothing and return a skip reason. - ``` +## 7. Apply and report -## 5. Report +Cherry-pick the winning commit to the current branch: -After all worktree agents complete: +```sh +git cherry-pick +``` -- List each file: smells fixed, smells skipped (with reason), test result. -- If any fix broke a test, name the fix and the failing test. -- State total smells fixed vs. skipped. +Report: winning strategy, Opus reasoning, scores for each strategy, smells +fixed vs. skipped, and test results. ## Notes -- Rule 15 two-hats: every commit from this skill is a pure refactor. If a - fix requires a behavior change, skip it and surface it as a follow-up. -- Rule 7 applies if you intend to merge the result via PR: adversarial review - before merge. -- For very large codebases (> 50 files), run with a scoped path first: +- Rule 24 best-of-N: N=1 for trivial, N=3 for moderate, N=5 for complex. +- Rule 15 two-hats applies to every implementation agent: no behavior changes. +- Rule 7 applies if the result goes to a PR: adversarial review before merge. +- For very large codebases (50+ files), scope with a path first: `/pst:refactor src/auth` then widen. From 24405e07f0814429a8b62695715d90007ea63cdd Mon Sep 17 00:00:00 2001 From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com> Date: Sun, 21 Jun 2026 06:19:07 -0500 Subject: [PATCH 2/3] fix(pst:refactor): resolve 5 adversarial review findings in best-of-N tournament - Step 5: switch from background to foreground parallel agents so synchronization is implicit (no explicit gate needed before judge) - Each agent returns a structured ---tournament-result--- block with STRATEGY, STATUS, COMMIT_SHA, and full git diff -- no worktree access required after agent completes - Step 6: judge reads diffs from result blocks directly; returns winning letter (A-E) not strategy name; SHA lookup stays in orchestrator - Step 7: cherry-pick by SHA from result blocks, not from judge output; fall back to git apply on conflict - Failed agents emit STATUS: skipped: so errors surface - Rule 24: trimmed to normative pointer only; impl details live in skill Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_019BCr5Qxi8jXnG2Rr7zdkw1 --- skills/pst/SKILL.md | 2 +- skills/pst:refactor/SKILL.md | 108 ++++++++++++++++++++++++----------- 2 files changed, 76 insertions(+), 34 deletions(-) diff --git a/skills/pst/SKILL.md b/skills/pst/SKILL.md index 171296e..f2a4376 100644 --- a/skills/pst/SKILL.md +++ b/skills/pst/SKILL.md @@ -135,7 +135,7 @@ rules a hook reminds about (non-blocking). Detail and examples are in 23. **Maintainability review on every code change** `[NUDGE]`. After any change that touches at least one code file (non-docs, non-lockfile), run a Fowler-smell pass using `MAINTAINABILITY.md` as the rubric. This is a separate refactoring commit (two hats per rule 15): behavior stays identical, only structure improves. The pass is Haiku-tier since it is lightweight. The only exemption is a changeset that touches zero code files (docs-only, lockfile-only, or pure config values). A small diff does not exempt a change -- smells manifest at any size. Run with `/pst:adversarial-review` or inline. See `MAINTAINABILITY.md` for the 16 canonical smells. -24. **Best-of-N implementation tournament** `[NUDGE]`. For substantive refactors or extensions where multiple valid implementations exist, spawn 2-5 parallel Sonnet agents in isolated worktrees, each with a divergent strategy: Conservative (smallest diff, no new abstractions), Structural (reorganize by responsibility), Extract-first (name every abstraction). For N=5, add Domain-model and Functional strategies. After all agents complete, an Opus judge selects the winner using MAINTAINABILITY.md criteria: cohesion, coupling, intent clarity, and locality of change. Skip for trivial one-way fixes where a single correct path is obvious. Rule 15 two-hats still applies to every implementation agent. `/pst:refactor` implements this pattern. +24. **Best-of-N implementation tournament** `[NUDGE]`. For substantive refactors or extensions where multiple valid implementations exist, spawn 2-5 parallel Sonnet agents in isolated worktrees with divergent strategies, then have an Opus judge pick the winner using MAINTAINABILITY.md criteria. Skip for trivial one-way fixes. Rule 15 two-hats applies. `/pst:refactor` is the full implementation protocol. ## Usage diff --git a/skills/pst:refactor/SKILL.md b/skills/pst:refactor/SKILL.md index 69e63e9..c477593 100644 --- a/skills/pst:refactor/SKILL.md +++ b/skills/pst:refactor/SKILL.md @@ -97,70 +97,112 @@ Present smell count and N to the user via `AskUserQuestion`: "Run tournament with N=\ implementations?" (Yes / Report only / Adjust N). If `--report-only` was passed, stop here without asking. -## 5. Parallel implementation tournament (background) +## 5. Parallel implementation tournament -Spawn N background Sonnet agents (`model: sonnet`, `isolation: worktree`), -each receiving the same smell findings and target files but a different +Spawn N **foreground** Sonnet agents (`model: sonnet`, `isolation: worktree`) in +the **same response turn** so they run concurrently. Do NOT set +`run_in_background: true` -- all N must complete before Step 6 begins, and +synchronization is implicit when they are foreground. + +Each agent receives the same smell findings and target files but a different strategy directive: -- **Strategy A -- Conservative**: Fix only the highest-impact smells with the - smallest possible diff. Prefer inlining over new abstractions when the call - site is nearby. Preserve the existing module and file structure entirely. -- **Strategy B -- Structural**: Reorganize by responsibility. Group related - behavior together even if it means creating new files or moving methods - between classes. Optimize for locality of future change. -- **Strategy C -- Extract-first**: Create a named function, class, or module - for every smell instance. Err toward more abstractions with clear names. - Every duplicated concept gets a home. +- **A -- Conservative**: Fix only the highest-impact smells with the smallest + possible diff. Prefer inlining over new abstractions. Preserve existing module + and file structure entirely. +- **B -- Structural**: Reorganize by responsibility. Group related behavior + together even if it means creating new files or moving methods. Optimize for + locality of future change. +- **C -- Extract-first**: Create a named function, class, or module for every + smell instance. Err toward more abstractions with clear names. If N=5, add: -- **Strategy D -- Domain-model**: Look for primitive obsession and data clumps; - introduce domain objects or value types to make implicit business concepts - explicit. -- **Strategy E -- Functional**: Prefer pure functions and immutable data where - the language supports it. Move state to the edges. Reduce side-effect surface. +- **D -- Domain-model**: Look for primitive obsession and data clumps; introduce + domain objects or value types to make implicit business concepts explicit. +- **E -- Functional**: Prefer pure functions and immutable data where the + language supports it. Move state to the edges. Reduce side-effect surface. + +Each agent must end its response with exactly this block so the orchestrator +can collect results without accessing the worktree afterward: + +``` +---tournament-result--- +STRATEGY: +STATUS: committed +COMMIT_SHA: +DIFF: + +---end-tournament-result--- +``` + +If the agent cannot commit (tests fail, conflict, or other error), emit: -Each agent must: +``` +---tournament-result--- +STRATEGY: +STATUS: skipped: +---end-tournament-result--- +``` + +Steps each agent must follow: 1. Apply its strategy to fix the smells in the target files. 2. Run existing tests if present; skip any fix that breaks a test and note it. -3. Stage and commit: `git add && git commit -m "refactor(): "`. +3. Stage and commit: `git add && git commit -m "refactor(): "`. 4. Include co-author trailer: `Co-Authored-By: Claude Sonnet 4.6 `. +5. Run `git rev-parse HEAD` and `git diff HEAD~1..HEAD`; include both verbatim in the result block. -## 6. Opus judge selects winner (background) +## 6. Opus judge selects winner -After all N agents complete, spawn one background Opus agent (`model: opus`) -with: +After all N agents return (foreground means they are all done before this +step begins), parse each `---tournament-result---` block. Collect the SHA and +diff for every agent with `STATUS: committed`. If zero agents committed, report +all skip reasons and stop. -- All N diffs (collected via `git show HEAD` in each worktree). -- The smell findings from step 2. +Spawn one **foreground** Opus agent (`model: opus`) with: + +- All committed diffs (from the result blocks -- no worktree access needed). +- The smell findings from Step 2. - The content of `MAINTAINABILITY.md`. -- Instruction: evaluate each implementation against the six MAINTAINABILITY.md - outcomes (Higher Cohesion, Lower Coupling, Explicit Intent, Locality of - Change, Reduced Cognitive Load, Strong Domain Modeling). Score each 1-5 on - each dimension. Pick the winner. +- Instruction: evaluate each diff against the six MAINTAINABILITY.md outcomes + (Higher Cohesion, Lower Coupling, Explicit Intent, Locality of Change, + Reduced Cognitive Load, Strong Domain Modeling). Score each 1-5 per + dimension. Return the winning strategy letter. Return schema: ```json { - "winner": "Conservative|Structural|Extract-first|Domain-model|Functional", - "scores": { "A": 0, "B": 0, "C": 0 }, + "winner": "A|B|C|D|E", + "scores": { + "A": { + "cohesion": 0, + "coupling": 0, + "intent": 0, + "locality": 0, + "cognitive_load": 0, + "domain_model": 0 + } + }, "reasoning": "one sentence" } ``` ## 7. Apply and report -Cherry-pick the winning commit to the current branch: +Look up the SHA for the winning strategy from the result blocks collected in +Step 5 (not from the Opus judge). Cherry-pick it onto the current branch: ```sh -git cherry-pick +git cherry-pick ``` +If cherry-pick fails due to conflict, fall back to `git apply` with the +winning diff from the result block. + Report: winning strategy, Opus reasoning, scores for each strategy, smells -fixed vs. skipped, and test results. +fixed vs. skipped, test results, and any agents that skipped with their reason. ## Notes From 021f72116a26bc06762482fc41f4c4c8e5333af5 Mon Sep 17 00:00:00 2001 From: Patrick Taylor <1963845+pstaylor-patrick@users.noreply.github.com> Date: Sun, 21 Jun 2026 06:24:19 -0500 Subject: [PATCH 3/3] fix(pst:refactor): address 6 more adversarial findings in tournament protocol - Step 2 Opus was marked 'background', causing Step 3 to run before findings arrived; changed to foreground so smell analysis is complete before gate - N=1 trivial path was undefined; now routes directly to strategy A, no judge - Co-author trailer commit format specified as HEREDOC to preserve blank line - git apply fallback specifies write-to-temp-file via Write tool first - Opus judge schema shows all N strategies, not just A; added instruction to include every strategy that ran - Absent result block (agent crash/overflow) now treated as skipped Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_019BCr5Qxi8jXnG2Rr7zdkw1 --- skills/pst:refactor/SKILL.md | 64 +++++++++++++++++++++++++++++------- 1 file changed, 52 insertions(+), 12 deletions(-) diff --git a/skills/pst:refactor/SKILL.md b/skills/pst:refactor/SKILL.md index c477593..161ee01 100644 --- a/skills/pst:refactor/SKILL.md +++ b/skills/pst:refactor/SKILL.md @@ -34,9 +34,9 @@ git ls-files | grep -Ev '\.(lock|snap|min\.(js|css)|pb\.go|pb_test\.go)$' \ | grep -E '\.(rb|py|ts|tsx|js|jsx|go|rs|java|kt|swift|ex|exs|cs|cpp|c|h)$' ``` -## 2. Opus smell analysis (background) +## 2. Opus smell analysis -Spawn one background Opus agent (`model: opus`) with: +Spawn one **foreground** Opus agent (`model: opus`) with: - The full content of `MAINTAINABILITY.md` (read via the path resolved above). - The list of code files from step 1. @@ -89,7 +89,10 @@ only). Treat no objection as Yes. Determine N from the smell count and file count: -- **Trivial** (1-3 smells, 1-2 files): N=1 -- single Sonnet, skip tournament. +- **Trivial** (1-3 smells, 1-2 files): N=1 -- single Sonnet agent, skip + tournament. Spawn one foreground Sonnet agent using strategy A (Conservative). + No judge step; cherry-pick its commit directly in Step 7. If it cannot commit, + report the reason and stop. - **Moderate** (4-10 smells or 3-5 files): N=3. - **Complex** (11+ smells or 6+ files): N=5. @@ -149,16 +152,28 @@ Steps each agent must follow: 1. Apply its strategy to fix the smells in the target files. 2. Run existing tests if present; skip any fix that breaks a test and note it. -3. Stage and commit: `git add && git commit -m "refactor(): "`. -4. Include co-author trailer: `Co-Authored-By: Claude Sonnet 4.6 `. -5. Run `git rev-parse HEAD` and `git diff HEAD~1..HEAD`; include both verbatim in the result block. +3. Stage and commit using a HEREDOC so the trailer blank line is preserved: + + ```sh + git add + git commit -m "$(cat <<'EOF' + refactor(): + + Co-Authored-By: Claude Sonnet 4.6 + EOF + )" + ``` + +4. Run `git rev-parse HEAD` and `git diff HEAD~1..HEAD`; include both verbatim + in the result block. ## 6. Opus judge selects winner After all N agents return (foreground means they are all done before this -step begins), parse each `---tournament-result---` block. Collect the SHA and -diff for every agent with `STATUS: committed`. If zero agents committed, report -all skip reasons and stop. +step begins), parse each `---tournament-result---` block. If an agent's output +contains no result block at all, treat it as `STATUS: skipped: result block +missing`. Collect the SHA and diff for every agent with `STATUS: committed`. If +zero agents committed, report all skip reasons and stop. Spawn one **foreground** Opus agent (`model: opus`) with: @@ -170,7 +185,8 @@ Spawn one **foreground** Opus agent (`model: opus`) with: Reduced Cognitive Load, Strong Domain Modeling). Score each 1-5 per dimension. Return the winning strategy letter. -Return schema: +Return schema (include a scores entry for every strategy letter that appears +in the committed diffs you received -- do not omit strategies that ran): ```json { @@ -183,6 +199,22 @@ Return schema: "locality": 0, "cognitive_load": 0, "domain_model": 0 + }, + "B": { + "cohesion": 0, + "coupling": 0, + "intent": 0, + "locality": 0, + "cognitive_load": 0, + "domain_model": 0 + }, + "C": { + "cohesion": 0, + "coupling": 0, + "intent": 0, + "locality": 0, + "cognitive_load": 0, + "domain_model": 0 } }, "reasoning": "one sentence" @@ -198,8 +230,16 @@ Step 5 (not from the Opus judge). Cherry-pick it onto the current branch: git cherry-pick ``` -If cherry-pick fails due to conflict, fall back to `git apply` with the -winning diff from the result block. +If cherry-pick fails due to conflict, write the winning diff to a temp file +and apply it: + +```sh +# Write the diff from the result block to a temp file first +git apply /tmp/pst-refactor-winner.patch +``` + +Use the Write tool to write the diff string to `/tmp/pst-refactor-winner.patch` +before running that command. Report: winning strategy, Opus reasoning, scores for each strategy, smells fixed vs. skipped, test results, and any agents that skipped with their reason.