Autoresearch

Self-discovery optimization engine for AI skills. Autonomously finds weaknesses, generates evaluation criteria, runs experiments, and validates improvements — no manual eval definition required.

Inspired by karpathy/autoresearch.

How It Works

Autoresearch implements a Science Loop — a 4-phase research cycle that iterates until the skill is optimized or budget is exhausted:

ANALYZE → HYPOTHESIZE → EXPERIMENT → VALIDATE → (loop or stop)
    ↑                                    │
    └──── RE-ANALYZE (every 3 exps) ─────┘

Analyze — an LLM reads the skill file, discovers weaknesses, and auto-generates binary (pass/fail) eval criteria
Hypothesize — ranks weaknesses by severity and failure frequency, then generates a targeted fix hypothesis
Experiment — applies the mutation, runs the skill against test scenarios, and scores the results with LLM-as-judge and rule-based evals
Validate — keeps the change only if the overall score strictly improves and no individual eval regresses; a drop of more than 20% on any single eval triggers a rollback
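
The four phases above can be sketched as a small control loop. This is a minimal illustration, not the project's actual code: the `Phases` interface, `shouldKeep`, `scienceLoop`, and every parameter name are hypothetical, the phase functions are injected as plain callbacks instead of LLM calls, and the 20% threshold is interpreted here as a relative drop per eval, which is an assumption.

```typescript
// Hypothetical types: all names here are illustrative, not taken from the project source.
type EvalScores = Record<string, number>; // eval name -> pass rate in [0, 1]

interface Phases<W> {
  analyze(skill: string): W[];                         // discover weaknesses
  hypothesize(weaknesses: W[], skill: string): string; // return a mutated skill
  experiment(skill: string): EvalScores;               // score a candidate
}

const total = (scores: EvalScores): number =>
  Object.values(scores).reduce((sum, value) => sum + value, 0);

// Validate: keep a change only if the total strictly improves and no single
// eval drops by more than `maxDrop` relative to its previous value (assumed reading).
function shouldKeep(before: EvalScores, after: EvalScores, maxDrop = 0.2): boolean {
  if (total(after) <= total(before)) return false;
  return Object.keys(before).every((name) => {
    const prev = before[name] ?? 0;
    const next = after[name] ?? 0;
    return prev === 0 || (prev - next) / prev <= maxDrop;
  });
}

// Runs up to `maxExperiments` experiments and re-analyzes every `reanalyzeEvery`.
function scienceLoop<W>(
  skill: string,
  phases: Phases<W>,
  maxExperiments: number,
  reanalyzeEvery = 3,
) {
  let best = skill;
  let bestScores = phases.experiment(best); // baseline score
  let weaknesses = phases.analyze(best);
  let kept = 0;

  for (let i = 1; i <= maxExperiments && weaknesses.length > 0; i++) {
    const candidate = phases.hypothesize(weaknesses, best);
    const scores = phases.experiment(candidate);
    if (shouldKeep(bestScores, scores)) {
      best = candidate;
      bestScores = scores;
      kept++;
    } // otherwise the candidate is discarded, which is the rollback
    if (i % reanalyzeEvery === 0) weaknesses = phases.analyze(best);
  }
  return { skill: best, scores: bestScores, kept };
}
```

Passing the phases in as callbacks keeps the keep-or-rollback rule testable without any LLM in the loop.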

Key features:

Self-discovery — finds what to optimize, not just how
Re-analysis feedback loop — discovers new weaknesses from experiment results
Eval calibration — automatically drops too-easy evals after baseline
Rule-based evals — deterministic checks (regex, contains, word count) alongside LLM-as-judge
Health Scan — batch-analyzes a skill directory and auto-optimizes the weakest skill
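
To make the rule-based evals and eval calibration features more concrete, here is a small sketch. The `Rule` shapes, the `passes` and `calibrate` functions, and the definition of "too easy" (every baseline output already passes) are assumptions for illustration; the project's real eval schema and calibration threshold may differ.

```typescript
// Hypothetical rule shapes; the project's actual eval schema may differ.
type Rule =
  | { kind: "regex"; pattern: string }
  | { kind: "contains"; text: string }
  | { kind: "maxWords"; limit: number };

interface RuleEval {
  name: string;
  rule: Rule;
}

// Deterministic pass/fail check: no LLM call is involved.
function passes(rule: Rule, output: string): boolean {
  switch (rule.kind) {
    case "regex":
      return new RegExp(rule.pattern).test(output);
    case "contains":
      return output.includes(rule.text);
    case "maxWords":
      return output.trim().split(/\s+/).filter(Boolean).length <= rule.limit;
  }
}

// Calibration (assumed rule): drop evals that every baseline output already
// passes, since they cannot separate a good mutation from a bad one.
function calibrate(evals: RuleEval[], baselineOutputs: string[]): RuleEval[] {
  if (baselineOutputs.length === 0) return evals; // nothing to calibrate against
  return evals.filter(
    (e) => !baselineOutputs.every((output) => passes(e.rule, output)),
  );
}
```

Because these checks are deterministic, they give the same verdict on every run, which makes them a useful anchor next to LLM-as-judge scores.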

Quick Start

# Prerequisites: Bun runtime + Claude CLI
bun install

# Optimize a skill file
bun run src/cli.ts optimize path/to/skill.md

# Optimize with custom evals
bun run src/cli.ts optimize skill.md --evals evals.json --max-experiments 10

# Scan all your skills for weaknesses
bun run src/cli.ts scan --scope own

# Scan and auto-optimize the weakest skill
bun run src/cli.ts scan --scope all --auto

Output

Each optimization run generates:

dashboard.html — interactive score chart
diff.html — before/after diff with mutation rationale
CHANGELOG.md — experiment log
results.json — structured results for programmatic use

Tech Stack

TypeScript on Bun runtime
Claude CLI (claude --print) as the LLM backend
Zero external dependencies beyond Bun
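
As a rough illustration of using a CLI as the LLM backend, the sketch below shells out to `claude --print` with `spawnSync` from `node:child_process` (which Bun also provides). The `runLlm` function, its parameters, and passing the prompt as a positional argument are assumptions, not the project's actual wrapper.

```typescript
import { spawnSync } from "node:child_process";

// Sends one prompt to a CLI and returns its trimmed stdout.
// `command` and `baseArgs` default to the `claude --print` invocation named above;
// they are parameters (hypothetical) so the sketch can be exercised with another program.
function runLlm(
  prompt: string,
  command = "claude",
  baseArgs: string[] = ["--print"],
): string {
  const result = spawnSync(command, [...baseArgs, prompt], { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. the CLI is not installed
  if (result.status !== 0) {
    throw new Error(`${command} exited with status ${result.status}: ${result.stderr}`);
  }
  return result.stdout.trim();
}
```

Calling `runLlm("Summarize this skill file in one sentence.")` would block until the CLI exits; a real backend would likely also want timeouts and retries.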

Development

bun install            # Install dependencies
bun run build          # Build the project
bun test               # Run all tests (97 pass)
bun test <file>        # Run a single test file

Architecture

See CLAUDE.md for detailed architecture, file structure, and design decisions.

License

MIT
