Skip to content

aiming-lab/AutoResearchClaw

Repository files navigation

AutoResearchClaw Logo

Chat an Idea. Get a Paper. Fully Autonomous & Self-Evolving.

Just chat with OpenClaw: "Research X" β†’ done.

AutoResearchClaw Framework

MIT License Python 3.11+ 1633 Tests Passed GitHub OpenClaw Compatible Discord

πŸ‡¨πŸ‡³ δΈ­ζ–‡ Β· πŸ‡―πŸ‡΅ ζ—₯本θͺž Β· πŸ‡°πŸ‡· ν•œκ΅­μ–΄ Β· πŸ‡«πŸ‡· FranΓ§ais Β· πŸ‡©πŸ‡ͺ Deutsch Β· πŸ‡ͺπŸ‡Έ EspaΓ±ol Β· πŸ‡§πŸ‡· PortuguΓͺs Β· πŸ‡·πŸ‡Ί Русский Β· πŸ‡ΈπŸ‡¦ Ψ§Ω„ΨΉΨ±Ψ¨ΩŠΨ©

πŸ† Paper Showcase Β· πŸ“– Integration Guide Β· πŸ’¬ Discord Community


Sample Paper πŸ† Generated Paper Showcase

8 papers across 8 domains β€” math, statistics, biology, computing, NLP, RL, vision, robustness β€” generated fully autonomously with zero human intervention.

View Showcase

πŸ§ͺ We're looking for testers! Try the pipeline with your own research idea β€” from any field β€” and tell us what you think. Your feedback directly shapes the next version. β†’ Testing Guide | β†’ δΈ­ζ–‡ζ΅‹θ―•ζŒ‡ε— | β†’ ζ—₯本θͺžγƒ†γ‚Ήγƒˆγ‚¬γ‚€γƒ‰


πŸ”₯ News

  • [03/18/2026] v0.3.1 β€” OpenCode Beast Mode + Community Contributions β€” New "Beast Mode" routes complex code generation to OpenCode with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
  • [03/17/2026] v0.3.0 β€” MetaClaw Integration β€” AutoResearchClaw now supports MetaClaw cross-run learning: pipeline failures β†’ structured lessons β†’ reusable skills, injected into all 23 stages. +18.3% robustness in controlled experiments. Opt-in (metaclaw_bridge.enabled: true), fully backward-compatible. See Integration Guide.
  • [03/16/2026] v0.2.0 β€” Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
  • [03/15/2026] v0.1.0 β€” We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.

⚑ One Command. One Paper.

pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve

πŸ€” What Is This?

You think it. AutoResearchClaw writes it.

Drop a research topic β€” get back a full academic paper with real literature from OpenAlex, Semantic Scholar & arXiv, hardware-aware sandbox experiments (GPU/MPS/CPU auto-detected), statistical analysis, multi-agent peer review, and conference-ready LaTeX targeting NeurIPS/ICML/ICLR. No babysitting. No copy-pasting. No hallucinated references.

πŸ“„paper_draft.mdFull academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)
πŸ“paper.texConference-ready LaTeX (NeurIPS / ICLR / ICML templates)
πŸ“šreferences.bibReal BibTeX references from OpenAlex, Semantic Scholar and arXiv β€” auto-pruned to match inline citations
πŸ”verification_report.json4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM)
πŸ§ͺexperiment runs/Generated code + sandbox results + structured JSON metrics
πŸ“Šcharts/Auto-generated condition comparison charts with error bars and confidence intervals
πŸ“reviews.mdMulti-agent peer review with methodology-evidence consistency checks
🧬evolution/Self-learning lessons extracted from each run
πŸ“¦deliverables/All final outputs in one folder β€” compile-ready for Overleaf

The pipeline runs end-to-end without human intervention. When experiments fail, it self-heals. When hypotheses don't hold, it pivots. When citations are fake, it kills them.


πŸš€ Quick Start

# 1. Clone & install
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Setup (interactive β€” installs OpenCode beast mode, checks Docker/LaTeX)
researchclaw setup

# 3. Configure
researchclaw init          # Interactive: choose LLM provider, creates config.arc.yaml
# Or manually: cp config.researchclaw.example.yaml config.arc.yaml

# 4. Run
export OPENAI_API_KEY="sk-..."
researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve

Output β†’ artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/ β€” compile-ready LaTeX, BibTeX, experiment code, charts.

πŸ“ Minimum required config
project:
  name: "my-research"

research:
  topic: "Your research topic here"

llm:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"

🧠 What Makes It Different

Capability How It Works
πŸ”„ PIVOT / REFINE Loop Stage 15 autonomously decides: PROCEED, REFINE (tweak params), or PIVOT (new direction). Artifacts auto-versioned.
πŸ€– Multi-Agent Debate Hypothesis generation, result analysis, and peer review each use structured multi-perspective debate.
🧬 Self-Learning Lessons extracted per run (decision rationale, runtime warnings, metric anomalies) with 30-day time-decay. Future runs learn from past mistakes.
πŸ“š Knowledge Base Every run builds structured KB across 6 categories (decisions, experiments, findings, literature, questions, reviews).
πŸ›‘οΈ Sentinel Watchdog Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard.

🦞 OpenClaw Integration

AutoResearchClaw is an OpenClaw-compatible service. Install it in OpenClaw and launch autonomous research with a single message β€” or use it standalone via CLI, Claude Code, or any AI coding assistant.

πŸš€ Use with OpenClaw (Recommended)

If you already use OpenClaw as your AI assistant:

1️⃣  Share the GitHub repo URL with OpenClaw
2️⃣  OpenClaw auto-reads RESEARCHCLAW_AGENTS.md β†’ understands the pipeline
3️⃣  Say: "Research [your topic]"
4️⃣  Done β€” OpenClaw clones, installs, configures, runs, and returns results

That's it. OpenClaw handles git clone, pip install, config setup, and pipeline execution automatically. You just chat.

πŸ’‘ What happens under the hood
  1. OpenClaw reads RESEARCHCLAW_AGENTS.md β†’ learns the research orchestrator role
  2. OpenClaw reads README.md β†’ understands installation and pipeline structure
  3. OpenClaw copies config.researchclaw.example.yaml β†’ config.yaml
  4. Asks for your LLM API key (or uses your environment variable)
  5. Runs pip install -e . + researchclaw run --topic "..." --auto-approve
  6. Returns the paper, LaTeX, experiments, and citations

πŸ”Œ OpenClaw Bridge (Advanced)

For deeper integration, AutoResearchClaw includes a bridge adapter system with 6 optional capabilities:

# config.arc.yaml
openclaw_bridge:
  use_cron: true              # ⏰ Scheduled research runs
  use_message: true           # πŸ’¬ Progress notifications (Discord/Slack/Telegram)
  use_memory: true            # 🧠 Cross-session knowledge persistence
  use_sessions_spawn: true    # πŸ”€ Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true         # 🌐 Live web search during literature review
  use_browser: false          # πŸ–₯️ Browser-based paper collection

Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them without code changes. See docs/integration-guide.md for full details.

ACP (Agent Client Protocol)

AutoResearchClaw can use any ACP-compatible coding agent as its LLM backend β€” no API keys required. The agent communicates via acpx, maintaining a single persistent session across all 23 pipeline stages.

Agent Command Notes
Claude Code claude Anthropic
Codex CLI codex OpenAI
Copilot CLI gh GitHub
Gemini CLI gemini Google
OpenCode opencode SST
Kimi CLI kimi Moonshot
# config.yaml β€” ACP example
llm:
  provider: "acp"
  acp:
    agent: "claude"   # Any ACP-compatible agent CLI command
    cwd: "."          # Working directory for the agent
  # No base_url or api_key needed β€” the agent handles its own auth.
# Just run β€” the agent uses its own credentials
researchclaw run --config config.yaml --topic "Your research idea" --auto-approve

πŸ› οΈ Other Ways to Run

Method How
Standalone CLI researchclaw setup β†’ researchclaw init β†’ researchclaw run --topic "..." --auto-approve
Python API from researchclaw.pipeline import Runner; Runner(config).run()
Claude Code Reads RESEARCHCLAW_CLAUDE.md β€” just say "Run research on [topic]"
Copilot CLI researchclaw run --topic "..." with llm.acp.agent: "gh"
OpenCode Reads .claude/skills/ β€” same natural language interface
Any AI CLI Provide RESEARCHCLAW_AGENTS.md as context β†’ agent auto-bootstraps

πŸ”¬ Pipeline: 23 Stages, 8 Phases

Phase A: Research Scoping          Phase E: Experiment Execution
  1. TOPIC_INIT                      12. EXPERIMENT_RUN
  2. PROBLEM_DECOMPOSE               13. ITERATIVE_REFINE  ← self-healing

Phase B: Literature Discovery      Phase F: Analysis & Decision
  3. SEARCH_STRATEGY                 14. RESULT_ANALYSIS    ← multi-agent
  4. LITERATURE_COLLECT  ← real API  15. RESEARCH_DECISION  ← PIVOT/REFINE
  5. LITERATURE_SCREEN   [gate]
  6. KNOWLEDGE_EXTRACT               Phase G: Paper Writing
                                     16. PAPER_OUTLINE
Phase C: Knowledge Synthesis         17. PAPER_DRAFT
  7. SYNTHESIS                       18. PEER_REVIEW        ← evidence check
  8. HYPOTHESIS_GEN    ← debate      19. PAPER_REVISION

Phase D: Experiment Design         Phase H: Finalization
  9. EXPERIMENT_DESIGN   [gate]      20. QUALITY_GATE      [gate]
 10. CODE_GENERATION                 21. KNOWLEDGE_ARCHIVE
 11. RESOURCE_PLANNING               22. EXPORT_PUBLISH     ← LaTeX
                                     23. CITATION_VERIFY    ← relevance check

Gate stages (5, 9, 20) pause for human approval or auto-approve with --auto-approve. On rejection, the pipeline rolls back.

Decision loops: Stage 15 can trigger REFINE (β†’ Stage 13) or PIVOT (β†’ Stage 8), with automatic artifact versioning.

πŸ“‹ What Each Phase Does
Phase What Happens
A: Scoping LLM decomposes the topic into a structured problem tree with research questions
A+: Hardware Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only), warns if local hardware is limited, adapts code generation accordingly
B: Literature Multi-source search (OpenAlex β†’ Semantic Scholar β†’ arXiv) for real papers, screens by relevance, extracts knowledge cards
C: Synthesis Clusters findings, identifies research gaps, generates testable hypotheses via multi-agent debate
D: Design Designs experiment plan, generates hardware-aware runnable Python (GPU tier β†’ package selection), estimates resource needs
E: Execution Runs experiments in sandbox, detects NaN/Inf and runtime bugs, self-heals code via targeted LLM repair
F: Analysis Multi-agent analysis of results; autonomous PROCEED / REFINE / PIVOT decision with rationale
G: Writing Outlines β†’ section-by-section drafting (5,000-6,500 words) β†’ peer reviews (with methodology-evidence consistency) β†’ revises with length guard
H: Finalization Quality gate, knowledge archival, LaTeX export with conference template, citation integrity + relevance verification

✨ Key Features

Feature Description
πŸ“š Multi-Source Literature Real papers from OpenAlex, Semantic Scholar & arXiv β€” query expansion, deduplication, circuit breaker with graceful degradation
πŸ” 4-Layer Citation Verification arXiv ID check β†’ CrossRef/DataCite DOI β†’ Semantic Scholar title match β†’ LLM relevance scoring. Hallucinated refs auto-removed.
πŸ–₯️ Hardware-Aware Execution Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts code generation, imports, and experiment scale accordingly
🦾 OpenCode Beast Mode Complex experiments auto-routed to OpenCode β€” generates multi-file projects with custom architectures, training loops, and ablation studies. Install via researchclaw setup.
πŸ§ͺ Sandbox Experiments AST-validated code, immutable harness, NaN/Inf fast-fail, self-healing repair, iterative refinement (up to 10 rounds), partial result capture
πŸ“ Conference-Grade Writing NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, revision length guard, anti-disclaimer enforcement
πŸ“ Template Switching neurips_2025, iclr_2026, icml_2026 β€” Markdown β†’ LaTeX with math, tables, figures, cross-refs, \cite{}
🚦 Quality Gates 3 human-in-the-loop gates (Stages 5, 9, 20) with rollback. Skip with --auto-approve.

🧠 MetaClaw Integration

AutoResearchClaw + MetaClaw = A pipeline that learns from every run.

MetaClaw adds cross-run knowledge transfer to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and warnings, converts them into reusable skills, and injects those skills into all 23 pipeline stages on subsequent runs β€” so the same mistakes are never repeated.

How It Works

Run N executes β†’ failures/warnings captured as Lessons
                      ↓
          MetaClaw Lesson β†’ Skill conversion
                      ↓
          arc-* Skill files stored in ~/.metaclaw/skills/
                      ↓
Run N+1 β†’ build_overlay() injects skills into every LLM prompt
                      ↓
          LLM avoids known pitfalls β†’ higher quality, fewer retries

Quick Setup

# 1. Install MetaClaw (if not already)
pip install metaclaw

# 2. Enable in your config
# config.arc.yaml
metaclaw_bridge:
  enabled: true
  proxy_url: "http://localhost:30000"        # MetaClaw proxy (optional)
  skills_dir: "~/.metaclaw/skills"          # Where skills are stored
  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
  fallback_api_key: ""                      # API key for fallback URL
  lesson_to_skill:
    enabled: true
    min_severity: "warning"                 # Convert warnings + errors
    max_skills_per_run: 3
# 3. Run as usual β€” MetaClaw works transparently
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve

After each run, check ~/.metaclaw/skills/arc-*/SKILL.md to see the skills your pipeline has learned.

Experiment Results

In controlled A/B experiments (same topic, same LLM, same configuration):

Metric Baseline With MetaClaw Improvement
Stage retry rate 10.5% 7.9% -24.8%
Refine cycle count 2.0 1.2 -40.0%
Pipeline stage completion 18/19 19/19 +5.3%
Overall robustness score (composite) 0.714 0.845 +18.3%

Composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).

Backward Compatibility

  • Default: OFF. If metaclaw_bridge is absent or enabled: false, the pipeline behaves exactly as before.
  • No new dependencies. MetaClaw is optional β€” the core pipeline works without it.
  • All 1,634 existing tests pass with the integration code present.

βš™οΈ Configuration Reference

Click to expand full configuration reference
# === Project ===
project:
  name: "my-research"              # Project identifier
  mode: "docs-first"               # docs-first | semi-auto | full-auto

# === Research ===
research:
  topic: "..."                     # Research topic (required)
  domains: ["ml", "nlp"]           # Research domains for literature search
  daily_paper_count: 8             # Target papers per search query
  quality_threshold: 4.0           # Minimum quality score for papers

# === Runtime ===
runtime:
  timezone: "America/New_York"     # For timestamps
  max_parallel_tasks: 3            # Concurrent experiment limit
  approval_timeout_hours: 12       # Gate stage timeout
  retry_limit: 2                   # Retry count on stage failure

# === LLM ===
llm:
  provider: "openai-compatible"    # openai | openrouter | deepseek | minimax | acp | openai-compatible
  base_url: "https://..."          # API endpoint (required for openai-compatible)
  api_key_env: "OPENAI_API_KEY"    # Env var for API key (required for openai-compatible)
  api_key: ""                      # Or hardcode key here
  primary_model: "gpt-4o"          # Primary model
  fallback_models: ["gpt-4o-mini"] # Fallback chain
  s2_api_key: ""                   # Semantic Scholar API key (optional, higher rate limits)
  acp:                             # Only used when provider: "acp"
    agent: "claude"                # ACP agent CLI command (claude, codex, gemini, etc.)
    cwd: "."                       # Working directory for the agent

# === Experiment ===
experiment:
  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
  time_budget_sec: 300             # Max execution time per run (default: 300s)
  max_iterations: 10               # Max optimization iterations
  metric_key: "val_loss"           # Primary metric name
  metric_direction: "minimize"     # minimize | maximize
  sandbox:
    python_path: ".venv/bin/python"
    gpu_required: false
    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
    max_memory_mb: 4096
  docker:
    image: "researchclaw/experiment:latest"
    network_policy: "setup_only"   # none | setup_only | pip_only | full
    gpu_enabled: true
    memory_limit_mb: 8192
    auto_install_deps: true        # Auto-detect imports β†’ requirements.txt
  ssh_remote:
    host: ""                       # GPU server hostname
    gpu_ids: []                    # Available GPU IDs
    remote_workdir: "/tmp/researchclaw_experiments"
  opencode:                          # OpenCode Beast Mode (auto-installed via `researchclaw setup`)
    enabled: true                    # Master switch (default: true)
    auto: true                       # Auto-trigger without confirmation (default: true)
    complexity_threshold: 0.2        # 0.0-1.0 β€” higher = only trigger on complex experiments
    model: ""                        # Override model (empty = use llm.primary_model)
    timeout_sec: 600                 # Max seconds for OpenCode generation
    max_retries: 1                   # Retry count on failure
    workspace_cleanup: true          # Remove temp workspace after collection

# === Export ===
export:
  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
  authors: "Anonymous"
  bib_file: "references"

# === Prompts ===
prompts:
  custom_file: ""                  # Path to custom prompts YAML (empty = defaults)

# === Security ===
security:
  hitl_required_stages: [5, 9, 20] # Stages requiring human approval
  allow_publish_without_approval: false
  redact_sensitive_logs: true

# === Knowledge Base ===
knowledge_base:
  backend: "markdown"              # markdown | obsidian
  root: "docs/kb"

# === Notifications ===
notifications:
  channel: "console"               # console | discord | slack
  target: ""

# === MetaClaw Bridge (Optional) ===
metaclaw_bridge:
  enabled: false                   # Set to true to enable cross-run learning
  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
  skills_dir: "~/.metaclaw/skills" # Where arc-* skills are stored
  fallback_url: ""                 # Direct LLM fallback when proxy is down
  fallback_api_key: ""             # API key for fallback endpoint
  lesson_to_skill:
    enabled: true                  # Auto-convert lessons to skills
    min_severity: "warning"        # Minimum severity to convert
    max_skills_per_run: 3          # Max new skills per pipeline run

# === OpenClaw Bridge ===
openclaw_bridge:
  use_cron: false                  # Scheduled research runs
  use_message: false               # Progress notifications
  use_memory: false                # Cross-session knowledge persistence
  use_sessions_spawn: false        # Spawn parallel sub-sessions
  use_web_fetch: false             # Live web search
  use_browser: false               # Browser-based paper collection

πŸ™ Acknowledgments

Inspired by:

  • πŸ”¬ AI Scientist (Sakana AI) β€” Automated research pioneer
  • 🧠 AutoResearch (Andrej Karpathy) β€” End-to-end research automation
  • 🌐 FARS (Analemma) β€” Fully Automated Research System

πŸ“„ License

MIT β€” see LICENSE for details.


πŸ“Œ Citation

If you find AutoResearchClaw useful, please cite:

@misc{liu2026autoresearchclaw,
  author       = {Liu, Jiaqi and Xia, Peng and Han, Siwei and Qiu, Shi and Zhang, Letian and Chen, Guiming  and Tu, Haoqin and Yang, Xinyu and and Zhou, Jiawei and Zhu, Hongtu and Li, Yun and Zhou, Yuyin and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  title        = {AutoResearchClaw: Fully Autonomous Research from Idea to Paper},
  year         = {2026},
  organization = {GitHub},
  url          = {https://github.com/aiming-lab/AutoResearchClaw},
}

Built with 🦞 by the AutoResearchClaw team