SamSpectre/autoresearch-agents

Autoresearch for Agentic Systems

Applying Karpathy's autoresearch pattern to agentic AI systems. Instead of autonomously optimizing LLM training code, we autonomously optimize agent prompts and configurations to improve measurable output quality.

The core insight: the autoresearch pattern is not about model training — it's a general framework for autonomous self-improvement of any system with a measurable quality metric.

Projects

A self-improving financial research pipeline that analyzes SEC 10-K filings. Three agents (extractor, analyst, synthesizer) process filings end-to-end, while an optimizer agent autonomously iterates on their skill prompts to maximize a composite quality score evaluated against XBRL ground truth.

  • Best composite score: 0.7340 (from 0.7235 baseline, +1.5%)
  • Optimizable surface: 3 skill file prompts (1 dimension)
  • Evaluation set: 13 companies across 6 sectors
  • Metric: Weighted blend of extraction accuracy, analysis quality, and cost efficiency
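
A minimal sketch of how a weighted composite of this kind could be computed. The three component names come from the metric description above; the `RunMetrics` type, the `composite_score` and `relative_gain` helpers, and especially the weights are hypothetical and are not taken from the repository's `evaluate.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Per-run component scores, each assumed to be normalized to [0, 1]."""
    extraction_accuracy: float
    analysis_quality: float
    cost_efficiency: float


# Hypothetical weights for illustration only; the real blend is not stated here.
WEIGHTS = {
    "extraction_accuracy": 0.5,
    "analysis_quality": 0.3,
    "cost_efficiency": 0.2,
}


def composite_score(metrics: RunMetrics, weights: dict = WEIGHTS) -> float:
    """Return the weighted sum of the component scores."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * getattr(metrics, name) for name, w in weights.items())


def relative_gain(baseline: float, best: float) -> float:
    """Relative improvement of `best` over `baseline`, as a fraction."""
    return (best - baseline) / baseline


if __name__ == "__main__":
    print(composite_score(RunMetrics(0.8, 0.6, 0.5)))
    # Reproduces the reported improvement from 0.7235 to 0.7340.
    print(f"{relative_gain(0.7235, 0.7340):.1%}")
```

Because the score is a fixed function of fixed inputs, the optimizer can only raise it by changing the skill prompts, not by changing how results are measured.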

A self-improving RAG system tested against Meta's CRAG benchmark across 5 domains (finance, sports, music, movie, open). The optimizer tunes not just prompts but the entire RAG stack: chunking strategy, embedding model, retrieval parameters, model routing, and pipeline topology.

  • Best CRAG score: 0.360 (from 0.208 baseline, +73%)
  • Optimizable surface: 7 dimensions (config.yaml + 4 skill files)
  • Evaluation set: 500 questions, 5 domains, 8 question types
  • Metric: CRAG Score_a (accuracy - hallucination_rate)
  • Key finding: The optimizer learned to prioritize hallucination reduction over accuracy, found optimal confidence thresholds through experimentation, and discovered that more expensive models only help at specific pipeline stages.
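
The metric above (accuracy minus hallucination rate) can be sketched as follows. The per-answer label names `correct`, `missing`, and `hallucinated` and the function name `crag_score_a` are assumptions made for this illustration; how the repository actually grades individual answers is not shown here.

```python
from collections import Counter
from typing import Iterable

# Assumed label vocabulary for graded answers (hypothetical).
VALID_LABELS = {"correct", "missing", "hallucinated"}


def crag_score_a(labels: Iterable[str]) -> float:
    """Return accuracy minus hallucination rate over a set of graded answers."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded answers")
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    accuracy = counts["correct"] / total
    hallucination_rate = counts["hallucinated"] / total
    return accuracy - hallucination_rate


if __name__ == "__main__":
    guessing = ["correct"] * 5 + ["missing"] * 3 + ["hallucinated"] * 2
    abstaining = ["correct"] * 5 + ["missing"] * 5
    print(crag_score_a(guessing), crag_score_a(abstaining))
```

In this toy comparison, turning two hallucinated answers into abstentions raises the score without any gain in accuracy, which is consistent with the optimizer's preference for hallucination reduction.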

Architecture Pattern

Karpathy's autoresearch          This repo
========================         ========================
prepare.py (fixed eval)    →     evaluate.py (fixed eval)
train.py (agent edits)     →     agents/skills/*.md (optimizer edits)
val_bpb (metric)           →     composite_score (metric)
program.md (strategy)      →     optimizer_program.md (strategy)
5-min GPU budget           →     fixed API cost budget
git keep/discard            →     git keep/discard
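
The mapping above implies a simple keep/discard loop under a fixed budget. The sketch below is one way to express it, not the repository's implementation: `keep_or_discard_loop`, `propose`, and `evaluate` are hypothetical names, the candidate is modeled as a plain string standing in for the skill files, and the git commit/revert step is reduced to keeping or dropping that string.

```python
from typing import Callable, Tuple


def keep_or_discard_loop(
    initial: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], Tuple[float, float]],
    budget: float,
) -> Tuple[str, float]:
    """Greedy search: keep a candidate only if it beats the best score so far.

    `evaluate` returns (score, cost); the loop stops once the accumulated
    cost reaches `budget`.
    """
    best = initial
    best_score, spent = evaluate(best)  # the baseline run also costs budget
    while spent < budget:
        candidate = propose(best)       # the optimizer edits the current best
        score, cost = evaluate(candidate)  # fixed evaluation, never edited
        spent += cost
        if score > best_score:
            best, best_score = candidate, score  # "git keep"
        # otherwise the candidate is dropped ("git discard")
    return best, best_score


if __name__ == "__main__":
    # Toy stand-ins: longer prompts score higher up to length 3, then collapse.
    toy_propose = lambda prompt: prompt + "x"
    toy_evaluate = lambda prompt: (float(len(prompt)) if len(prompt) <= 3 else 0.0, 1.0)
    print(keep_or_discard_loop("a", toy_propose, toy_evaluate, budget=5.0))
```

Keeping the evaluator outside the set of things the optimizer may edit is what makes the score comparable across iterations.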

License

MIT
