| diataxis_type | reference |
|---|---|
| title | Algorithm Reference |
| description | User-facing overview of the modify-evaluate-decide improvement loop algorithm |
This document provides a user-facing overview of the autoresearch improvement loop algorithm. For the complete formal specification, see the internal algorithm spec.
The autoresearch loop applies a modify-evaluate-decide cycle to iteratively improve Claude Code skills. The core guarantee: your original skill is never modified until you explicitly approve.
INPUT: skill_path, max_iterations (default 5)
OUTPUT: workspace with v0/ snapshot, candidate/ working copy, baseline score
1. Create workspace directory: {skill-name}-autoresearch/
2. Snapshot skill → workspace/v0/ (immutable baseline)
3. Snapshot skill → workspace/candidate/ (mutable working copy)
4. Evaluate candidate → score_0
5. Log: iteration=0, score=score_0, action=baseline
6. Set best = {version: 0, score: score_0}
INPUT: candidate skill directory
OUTPUT: composite score (mean pass_rate across all evals)
For each eval case in evals/evals.json:
1. Execute: run the eval prompt with the candidate skill loaded
2. Grade: spawn skill-creator's grader agent
3. Collect: grading.json with per-expectation pass/fail
4. Extract: summary.pass_rate
Score = mean(all pass_rates)
FOR i = 1 to max_iterations:
1. IMPROVE: spawn improver agent with failure data → modifies candidate/
2. EVALUATE: run all evals on modified candidate → score_i
3. DECIDE:
- If score_i > best.score → KEEP (snapshot candidate → v{i}/, update best)
- If score_i ≤ best.score → REVERT (restore candidate from best snapshot)
4. LOG: append to results.tsv
5. CHECK ABORT:
- best.score ≥ 1.0 → stop (perfect)
- last 3 actions all "reverted" → stop (stuck)
1. Generate convergence report (score trajectory, diff, weaknesses)
2. Show report and unified diff to user
3. Ask: apply best version to original skill? [y/n]
4. If yes: restore(best_snapshot → skill_path)
score = mean([grading.summary.pass_rate for grading in iteration_grading_files])
Where pass_rate = passed_expectations / total_expectations for each eval case.
- Original skill immutable during the loop — only
candidate/is modified - Snapshots immutable — once
v{N}/is created, never modified - Evals frozen during improvement — the improver cannot touch
evals/ - SHA-compare before write —
snapshot()skips unchanged files - User confirmation required before applying changes to the original
- Regression abort — 3 consecutive reverts stops the loop
- Internal algorithm spec — Full formal specification
- Convergence and Scoring — Detailed scoring explanation
- Lifecycle — End-to-end lifecycle description