Summary
Add a ratchet mode to the taskrunner that measures outcomes before and after autonomous code changes, automatically reverting changes that don't measurably improve results.
Why
Current flow: self-improvement files issue → taskrunner creates PR → PR gets merged (or not). No verification that the change actually improved anything. We've shipped "improvements" that introduced regressions (synthesis noise, conversation-facts errors).
The ratchet pattern: improve → re-run → measure → keep only if metrics improve, else revert.
How it works
- Baseline capture — before task runs, snapshot relevant metrics (test pass rate, typecheck, error count from task_runs table)
- Task execution — normal taskrunner flow, creates branch + PR
- Validation run — after task completes, re-run tests/typecheck on the branch
- Comparison — compare post-task metrics against baseline
- Decision — if metrics improved or held steady: keep. If regressed: auto-close PR with explanation
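The comparison and decision steps above can be sketched as follows. This is a minimal illustration, not the taskrunner's actual code: the `Metrics` fields, the `regressions` and `decide` helpers, and the `"keep"`/`"close"` labels are all hypothetical names chosen for the example, and it assumes that a metric holding steady counts as acceptable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Hypothetical snapshot; field names are illustrative, not the real schema."""
    typecheck_passed: bool
    tests_passed: int
    tests_total: int
    error_count: int  # assumed to be counted from the task_runs table

    @property
    def pass_rate(self) -> float:
        # An empty suite counts as fully passing so it cannot look like a regression.
        return self.tests_passed / self.tests_total if self.tests_total else 1.0


def regressions(baseline: Metrics, after: Metrics) -> list[str]:
    """Return one human-readable reason per metric that got worse."""
    reasons = []
    if baseline.typecheck_passed and not after.typecheck_passed:
        reasons.append("typecheck: pass -> fail")
    if after.pass_rate < baseline.pass_rate:
        reasons.append(
            f"test pass rate: {baseline.pass_rate:.1%} -> {after.pass_rate:.1%}"
        )
    if after.error_count > baseline.error_count:
        reasons.append(
            f"error count: {baseline.error_count} -> {after.error_count}"
        )
    return reasons


def decide(baseline: Metrics, after: Metrics) -> tuple[str, list[str]]:
    """Keep the PR if nothing regressed; otherwise close it with the reasons."""
    reasons = regressions(baseline, after)
    return ("close" if reasons else "keep", reasons)
```

The returned reasons could serve as the explanation posted when a PR is auto-closed, so the decision and its justification come from the same comparison.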
Scope
- Opt-in per task via `ratchet: true` in task config or category-level default
- Best fit for `refactor` and `bugfix` categories
- `docs` and `tests` categories skip ratchet (no regression risk)
- Metrics to compare: typecheck pass/fail, test suite results, error counts
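One way the opt-in rules above might resolve is sketched below. Everything here is an assumption made for illustration: the `ratchet_enabled` helper, the `CATEGORY_DEFAULTS` and `SKIPPED_CATEGORIES` names, the choice to default `refactor` and `bugfix` to on, and the precedence order (skipped categories win, then the per-task flag, then the category default).

```python
from typing import Optional

# Hypothetical category-level defaults; the issue only says these are the best fit.
CATEGORY_DEFAULTS = {"refactor": True, "bugfix": True}

# Categories that skip ratchet regardless of the task flag (an assumed precedence).
SKIPPED_CATEGORIES = frozenset({"docs", "tests"})


def ratchet_enabled(category: str, task_flag: Optional[bool] = None) -> bool:
    """Resolve whether ratchet mode applies to a task.

    task_flag mirrors `ratchet: true` in the task config; None means unset.
    """
    if category in SKIPPED_CATEGORIES:
        return False
    if task_flag is not None:
        return task_flag
    return CATEGORY_DEFAULTS.get(category, False)
```

Letting an explicit per-task flag override the category default keeps the feature opt-in for categories with no default while still allowing a single task to opt out.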
References
- recursive-improve `/ratchet` command (Apache 2.0): overnight improvement loop with keep/revert
- Existing: `--loop` mode, adversarial governance, PR utility scoring