AutoScientists is a decentralized team of AI agents for long-running computational scientific experimentation. Unlike prior agent systems that follow a single research trajectory or coordinate through a central planner, AutoScientists agents self-organize into teams around promising hypotheses, critique each other's proposals before spending experimental compute, and share successes and failures so the system avoids redundant exploration and sustains parallel search as evidence accumulates over hours or days.
This repository packages the system as Claude Code subagents coordinating through a local ClawInstitute server (workshops, workspaces, message-board posts). The orchestrator is a pure coordinator — it launches agents and harvests their results, never trains anything itself.
- BioML-Bench (24 biomedical ML tasks across biomedical imaging, protein engineering, single-cell omics, drug discovery): 74.4% mean leaderboard percentile, +8.33% over the strongest prior AI agent.
- nanoGPT training optimization: 1.9× faster to a target validation metric; 7 accepted improvements vs. 0 for a single-agent baseline.
- ProteinGym fitness prediction: +12.5% on the ACE2-Spike binding assay; +6.5% averaged across all 217 assays.
Three bundled task families (per-task data prep and details live in each task-<name>/README.md):
task-autoresearch/— open-ended nanoGPTval_bpboptimization, wrapping karpathy/autoresearch.task-biomlbench/— 24 biomedical ML benchmarks across drug discovery, protein engineering, single-cell omics, and biomedical imaging.task-protein-gym/— ProteinGym Spike (SARS-CoV-2) fitness prediction, evolving a Kermut GP baseline.
Prerequisites: Node.js 22+ (ships with npx), Python 3.9+, and the Claude Code CLI (claude).
# Start the local ClawInstitute server (agents will all coordinate through this)
npx clawinstitute start
# Install Python deps (requests, pyyaml)
pip install -r requirements.txtnpx clawinstitute start downloads the clawinstitute package from npm on first run and starts the server in the foreground; subsequent runs reuse the cache. Prefer a permanent install? npm install -g clawinstitute, then clawinstitute start.
From the repo root, in a separate shell:
claude -p "Read runbook.md and execute. Task: task-autoresearch. Run name: ar_v1."
claude -p "Read runbook.md and execute. Task: task-biomlbench/drug_discovery/tdcommons-lipophilicity-astrazeneca. Run name: lipo_v1."
claude -p "Read runbook.md and execute. Task: task-protein-gym. Run name: spike_v1."Each launch materializes a new sibling directory ../<run-name>/ with its own copy of the system, agents, workspace, and logs; the template itself stays clean across runs. Hardware requirements vary per task — see each task-<name>/README.md.
Drop a task-<name>/ directory at the repo root with two files:
TASK.md— task spec. YAML frontmatter should settask_type(one ofoptimization,biomlbench,proteingym) andname; see the three bundledtask-*/TASK.mdfiles for the conventional shape. The markdown body describes the problem, data, and constraints for the agents to read.LAUNCH.md— task profile filling in the 13 hooksrunbook.mdreferences (launch_command,discussion_policy,gpu_dispatch,champion_promotion,stagnation_response,exit_condition, etc.). Easiest path: copy the bundledtask-<name>/LAUNCH.mdclosest to your task and edit the hooks that need to differ.
Optionally add a setup script to fetch baseline code or data — see task-autoresearch/download_repo.sh or task-protein-gym/download_data.sh for examples.
Then launch with --task task-<name>. launch.py walks up from the --task path to find the nearest LAUNCH.md, so a family-level LAUNCH.md can cover many subtasks (as task-biomlbench/ does for its 24 subtasks) while any specific subtask can override by shipping its own LAUNCH.md.
@misc{gao2026autoscientistsselforganizingagentteams,
title={AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation},
author={Shanghua Gao and Ada Fang and Marinka Zitnik},
year={2026},
eprint={2605.28655},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.28655},
}