Autonomous ML research agent skill. Interviews the user, scopes an ML project into a tractable plan, then designs experiments, deploys to Modal GPUs, tracks results, and iterates within a compute budget.
Also includes treadmill — a portable recurring-command skill for agent harnesses that don't have a built-in loop.
Built on the Agent Skills open standard. Practically, this skill only works well with Claude Code — it relies heavily on interactive tool use (subagent spawning, user interviews, recurring loops) that other harnesses like Codex, OpenCode, and Cursor don't yet support. Install instructions for other tools are included for forward-compatibility, but expect a degraded experience.
Copy the skill directories to your personal skills folder:

```bash
cp -r labrat/ ~/.claude/skills/labrat
cp -r treadmill/ ~/.claude/skills/treadmill
```

Or install as a plugin from GitHub:

```bash
# In Claude Code
/install-plugin gh:tnguyen21/labrat
```

Then invoke with `/labrat` or just describe a research goal.
Copy the skill into your project:

```bash
cp -r labrat/ .codex/skills/labrat
```

Or place the `AGENTS.md` in your project root for basic instructions without the full skill system.
Copy the skill to any supported location:

```bash
# Project-local
cp -r labrat/ .opencode/skills/labrat

# Or global
cp -r labrat/ ~/.config/opencode/skills/labrat
```

OpenCode also reads from `.claude/skills/` and `.agents/skills/`.
Copy the labrat/ directory to wherever your tool discovers skills. The format follows the Agent Skills spec.
- Python 3.12+
- Modal CLI (`uv tool install modal && modal setup`)
- Modal account with GPU access
Start a new research session:

```
Run a labrat session: test whether dropout rate affects convergence on CIFAR-10. Budget: $10.
```
For a new session, the agent first asks a few clarifying questions and writes a scoped project brief to `.research/scope.md` before writing any code.
Continue an existing session (if `.research/state.json` exists):

```
Continue the research session.
```
Check status without an agent:

```bash
python labrat/scripts/research-status
```

Advance state without an agent:

```bash
python labrat/scripts/research-advance
```

For Codex-style unattended progress, run the supervisor instead:

```bash
python labrat/scripts/research-supervise
```

- Interview — Clarifies goals, metrics, and constraints, then writes a scoped project brief
- Initialize — Creates `.research/` with `state.json`, `scope.md`, `plan.md`, and `log.md`
- Baseline — Always runs a baseline experiment first
- Iterate — Each experiment changes one variable from baseline
- Track — Logs results, tracks spend against the budget, and reconciles finished volume-backed artifacts back into state
- Conclude — Writes `summary.md` with a results table and findings
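The track step above can be sketched roughly as follows — a minimal, hypothetical illustration of summing per-experiment costs against a budget. The actual logic lives in `scripts/research-advance`; the directory layout (`exp-*/results.json`) and the `cost_usd` field are assumptions made for this sketch:

```python
import json
from pathlib import Path


def reconcile_spend(research_dir: str, budget_usd: float) -> dict:
    """Sum reported cost across finished experiments and compare to budget.

    Assumes each experiment directory holds a results.json containing a
    'cost_usd' field -- an illustrative schema, not the skill's actual one.
    """
    spent = 0.0
    finished = []
    for results_file in Path(research_dir).glob("exp-*/results.json"):
        results = json.loads(results_file.read_text())
        spent += results.get("cost_usd", 0.0)
        finished.append(results_file.parent.name)
    return {
        "finished": sorted(finished),
        "spent_usd": round(spent, 2),
        "remaining_usd": round(budget_usd - spent, 2),
        "over_budget": spent > budget_usd,
    }
```

A supervisor can call something like this after each wake-up and stop scheduling new experiments once `over_budget` flips to true.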
`research-advance` now prefers remote `results.json` artifacts on a Modal Volume and only falls back to app-state inspection for recovery. `research-supervise` adds the next handoff for Codex by invoking `codex exec` when the session is actionable again, which is the closest equivalent to Claude Code's recurring `/loop`.
Experiments run on Modal GPUs (defaulting to T4, the cheapest option). Each experiment produces:

- `config.json` — hyperparameters
- `train.py` — training script
- `modal_app.py` — Modal deployment wrapper
- `results.json` — collected after the run
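For instance, a `config.json` for one iteration might look like this (field names are illustrative, not a fixed schema):

```json
{
  "experiment": "exp-002-dropout-0.3",
  "baseline": false,
  "changed_variable": "dropout",
  "hyperparameters": {
    "dropout": 0.3,
    "lr": 0.001,
    "epochs": 10
  },
  "gpu": "T4",
  "budget_usd": 10.0
}
```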
```
labrat/                          # ML research skill
  SKILL.md                       # Agent instructions
  scripts/research-advance       # State reconciliation worker
  scripts/research-supervise     # Reconcile state, then wake Codex if needed
  scripts/research-status        # CLI status checker
  references/modal-patterns.md   # Modal deployment patterns
treadmill/                       # Recurring command skill
  SKILL.md                       # Agent instructions
  scripts/treadmill              # Background loop manager
AGENTS.md                        # Codex/fallback instructions
README.md
LICENSE
```
Apache-2.0