    # Create a swarmfile and launch.
    SWARM_CONFIG=swarm.json ./launch.sh start --dashboard

All configuration lives in the swarmfile (JSON). Place a
swarm.json in your repo root or point to it with SWARM_CONFIG.

    ./launch.sh start [--dashboard]   # Launch agents.
    ./launch.sh stop                  # Stop all agents.
    ./launch.sh status                # Show containers.
    ./launch.sh logs N                # Tail agent N logs.
    ./launch.sh wait                  # Block, harvest, post-process.
    ./launch.sh post-process          # Run post-process agent.

Credentials stay as env vars (not in shell history).
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | | API key (or use CLAUDE_CODE_OAUTH_TOKEN). |
| CLAUDE_CODE_OAUTH_TOKEN | | OAuth token via claude setup-token. |
| OPENAI_API_KEY | | OpenAI API key (for Codex CLI driver). |
| CODEX_AUTH_JSON | ~/.codex/auth.json | Path to Codex auth file (ChatGPT subscription). |
| GEMINI_API_KEY | | Google API key (for Gemini CLI driver). |
| SWARM_CONFIG | | Path to swarmfile (or place swarm.json in repo root). |
| SWARM_TITLE | | Dashboard title override. |
| SWARM_SKIP_DEP_CHECK | | Set to 1 to silence dependency version warnings. |

Per-group credentials (api_key, auth_token, base_url)
are set in the swarmfile. Use $VAR references to pull
values from the host environment without hardcoding secrets.
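
As a hedged illustration of the $VAR convention, the sketch below writes a swarmfile whose auth_token refers to a host variable instead of containing the secret. The variable name `DEMO_ROUTER_TOKEN`, the model name, the endpoint URL, and the temporary path are placeholders invented for this example; only the field names come from this document.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder secret; a real run would export the actual token on the host.
export DEMO_ROUTER_TOKEN="not-a-real-token"

workdir=$(mktemp -d)
swarmfile="$workdir/swarm.json"

# The quoted delimiter ('EOF') stops the shell from expanding
# $DEMO_ROUTER_TOKEN, so the file stores the reference, not the secret.
cat > "$swarmfile" <<'EOF'
{
  "prompt": "tasks/hunt.md",
  "agents": [
    {
      "count": 1,
      "model": "example-model",
      "base_url": "https://api.example.invalid",
      "auth_token": "$DEMO_ROUTER_TOKEN"
    }
  ]
}
EOF

cat "$swarmfile"
```

With an unquoted heredoc delimiter the shell would substitute the token value before the file is written, which is the hardcoding the $VAR form is meant to avoid.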
Per-group fields in swarm.json agents array:
| Field | Values | Notes |
|---|---|---|
| model | model name | Required. |
| count | integer | Number of agents in this group. |
| effort | string | Reasoning depth (see below). |
| context | full, slim, none | How much of .claude/ to keep (default: full). |
| prompt | file path | Per-group prompt override (default: top-level). |
| auth | apikey, oauth, chatgpt, omit | Which host credential to inject (see Auth modes). |
| api_key | key or $VAR | Per-group API key for third-party endpoints. |
| auth_token | key or $VAR | Per-group Bearer token (OpenRouter-style). |
| base_url | URL | Per-group API endpoint. |
| tag | string or $VAR | Label for grouping runs (default: top-level). |
| driver | driver name | Agent driver override (default: top-level or claude-code). |

Effort values are driver-dependent:
- Claude Code: low, medium, high, max (Opus only).
- Codex CLI: none, minimal, low, medium, high, xhigh.
- Gemini CLI: ignored.

Top-level fields: prompt, setup, max_idle (default: 3),
max_retry_wait, driver, inject_git_rules,
git_user (name, email, signing_key),
claude_code_version, title, tag, pricing,
docker_args, post_process.
Set max_retry_wait (seconds) to have agents retry with
exponential backoff when rate-limited instead of exiting:

    { "max_retry_wait": 25200 }

Default is 0 (no retry -- exit immediately on fatal errors).
The backoff starts at 30 s, doubles each attempt, and caps at
30 min per sleep. When the cumulative wait exceeds
max_retry_wait, the agent exits. This also covers transient
network failures.
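
The schedule described above can be sketched as a small shell function. This is an illustration, not the harness's own code: the function name `backoff_schedule` is hypothetical, and the exact stopping boundary (here, stop once the cumulative wait reaches the limit) is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print one sleep duration (seconds) per retry until the cumulative wait
# reaches max_retry_wait. Illustrative only; not taken from the harness.
backoff_schedule() {
  local max_wait="$1"
  local delay=30     # first sleep: 30 s
  local cap=1800     # per-sleep ceiling: 30 min
  local total=0

  while [ "$total" -lt "$max_wait" ]; do
    echo "$delay"
    total=$((total + delay))
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
  done
}

backoff_schedule 4000 | tr '\n' ' '   # 30 60 120 240 480 960 1800 1800
echo
```

With max_retry_wait set to 0 the loop body never runs, which matches the default of exiting immediately.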
Pass arbitrary flags to every docker run invocation via the
top-level docker_args array. Each element is one shell token:

    {
      "docker_args": [
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "--privileged"
      ]
    }

This is useful for mounting the host Docker socket, adding devices or capabilities, setting network modes, or passing any other flags that the harness does not manage natively.
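
To make the "one element, one token" rule concrete, here is a minimal sketch of how such an array could be spliced into a docker run command line. The image name `example-agent` is a placeholder, and the command is only printed, never executed; how the harness actually assembles its command is not shown in this document.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Tokens as they would appear in docker_args, one array element each.
docker_args=(-v "/var/run/docker.sock:/var/run/docker.sock" --privileged)

# Hypothetical image name, used only for this illustration.
image="example-agent"

# Build the argv as an array so each token stays a separate argument.
cmd=(docker run --rm "${docker_args[@]}" "$image")

# Print one argument per line instead of running the command.
printf '%s\n' "${cmd[@]}"
```

Under this model, writing "-v /var/run/docker.sock:/var/run/docker.sock" as a single element would reach docker as one argument rather than a flag followed by its value.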
Set git_user.signing_key to an SSH private-key path on the
host to sign every commit agents and post-processors make.
Accepts a literal path, a bare $VAR reference (expanded from
the host environment), or a path starting with ~/ (expanded
to $HOME before mounting):

    {
      "git_user": {
        "name": "swarm-agent",
        "email": "agent@swarm.local",
        "signing_key": "~/.ssh/swarm-agent-signing"
      }
    }

The key is bind-mounted read-only into each container at
/etc/swarm/signing_key, and git inside the container is
configured with:

    gpg.format = ssh
    user.signingkey = /etc/swarm/signing_key
    commit.gpgsign = true

When signing_key is absent -- or resolves to empty via an
unset $VAR -- signing is explicitly disabled inside the
container (commit.gpgsign = false), overriding anything that
might otherwise leak in from the image or a mounted config.
The host key file must exist at launch.sh start time;
otherwise launch fails with ERROR: signing key not found.
The container image ships openssh-client for the
ssh-keygen -Y sign that git invokes.
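
The three accepted forms of signing_key and the two outcomes (disabled, or fail at launch) can be sketched as follows. The function name `resolve_signing_key` and the demo variable names are hypothetical; the real logic in launch.sh may differ in detail.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Resolve a signing_key value to a host path.
# Prints the path, or nothing when signing should be disabled.
resolve_signing_key() {
  local raw="${1:-}" path name
  case "$raw" in
    '$'*)                            # bare $VAR reference
      name="${raw:1}"
      if [[ "$name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        path="${!name:-}"            # unset variable resolves to empty
      else
        path=""
      fi
      ;;
    '~/'*) path="$HOME/${raw:2}" ;;  # ~/ expands to $HOME
    *)     path="$raw" ;;            # literal path (or empty)
  esac

  if [ -z "$path" ]; then
    return 0                         # absent or empty: signing disabled
  fi
  if [ ! -f "$path" ]; then
    echo "ERROR: signing key not found" >&2
    return 1
  fi
  printf '%s\n' "$path"
}

key_file=$(mktemp)
export DEMO_SIGNING_KEY="$key_file"
resolve_signing_key '$DEMO_SIGNING_KEY'   # prints the temp file path
resolve_signing_key '$DEMO_UNSET_VAR'     # prints nothing: signing disabled
```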

    ./dashboard.sh

Per-agent model, auth source, status, cost, tokens, cache, turns, throughput, and duration. Updates every 3s. The header shows a compact model summary on a single line.

| Key | Action |
|---|---|
| q | Quit. |
| 1-9 | Logs for agent N. |
| h | Harvest results. |
| s | Stop numbered agents (not post-process). |
| p | Post-process. |

Agent activity streams to Docker logs in real time. Press
[1-9] in the dashboard (or ./launch.sh logs N) to see
what an agent is doing:

    12:34:56 harness[1] session start at=abc123
    12:35:01 agent[1]   Read src/main.ts
    12:35:03 agent[1]   Edit src/main.ts
    12:35:08 agent[1]   Shell: npm test
    12:35:12 agent[1]   Shell: git add -A && git commit -m "fix tests"
    12:35:15 agent[1]   Shell: git push origin agent-work
    12:35:18 harness[1] session end cost=$0.12 in=800 out=644 turns=6 time=19s

The filter (lib/activity-filter.sh) parses stream-json
events from the agent CLI and prints one line per tool call
or thinking block. The timestamp and agent ID are colored
in ANSI yellow (matching git's commit-hash color) for
readability.
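
Because the session-end line is a run of key=value fields, it is easy to post-process. The sketch below pulls one field out of such a line; the helper name `session_field` is hypothetical, and it assumes the line layout shown in the sample above with ANSI colors already stripped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract the value of one key=value field from a "session end" line.
session_field() {
  local line="$1" key="$2" word
  local -a words
  # read -a splits on whitespace without glob-expanding "harness[1]".
  read -r -a words <<< "$line"
  for word in "${words[@]}"; do
    case "$word" in
      "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
    esac
  done
  return 1   # field not present
}

line='12:35:18 harness[1] session end cost=$0.12 in=800 out=644 turns=6 time=19s'
session_field "$line" cost    # $0.12
session_field "$line" turns   # 6
```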
Thinking/reasoning content appears as Think: <summary>
when the model produces it. Whether thinking is emitted
depends on the model and configuration: Claude Code requires
extended thinking to be enabled, and Gemini CLI emits thought
events only for models that support them.
On Opus 4.7 and later the Anthropic API default for
thinking.display is "omitted": the thinking field is
empty and the full reasoning is returned encrypted in the
signature field. To restore summaries the client has to
explicitly send thinking: {"display": "summarized"} on
each Messages API request.
The claude-code driver writes "showThinkingSummaries": true
into the workspace's .claude/settings.local.json as a
forward-compatible opt-in. As of Claude Code 2.1.111 the
CLI does not yet plumb that setting through to headless
(-p --output-format stream-json) requests for Opus 4.7, so
on today's releases this opt-in is effectively a no-op for
our pipeline. The setting is retained so that a future
Claude Code release which wires it to the Messages API will
restore summaries automatically with no further swarm change.
While the client-side opt-in is missing, the activity filter classifies the otherwise-blank blocks to keep the dashboard informative:
- Think: [encrypted] -- thinking empty, signature present. This is the expected Opus 4.7 display: "omitted" payload; the full reasoning exists server-side but is unavailable to the client.
- Think: [empty] -- thinking empty and signature empty. Anomalous: neither summary nor encrypted reasoning; a useful diagnostic that something upstream is off.

Blank Think: lines no longer reach the dashboard. On
Opus 4.6 and earlier, summaries were the default and continue
to render as Think: <summary> unchanged.
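
The classification rule can be summarized in a few lines of shell. This is a sketch of the decision only, with a hypothetical function name `classify_think`; the actual filter works on stream-json events and is not reproduced here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map the two fields of a thinking block to the line shown in the log.
classify_think() {
  local thinking="$1" signature="$2"
  if [ -n "$thinking" ]; then
    printf 'Think: %s\n' "$thinking"   # summary available
  elif [ -n "$signature" ]; then
    echo "Think: [encrypted]"          # reasoning withheld, signature present
  else
    echo "Think: [empty]"              # neither field set: anomalous
  fi
}

classify_think "Checking the failing test" "sig"   # Think: Checking the failing test
classify_think "" "sig"                            # Think: [encrypted]
classify_think "" ""                               # Think: [empty]
```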

    ./tests/test.sh --help                # All options.
    ./tests/test.sh --unit                # Unit tests only.
    ./tests/test.sh                       # Single smoke test.
    ./tests/test.sh --all                 # Full matrix.
    ./tests/test.sh --config swarm.json   # Custom config.
    ./tests/test.sh --no-inject           # Explicit git prompt.
    ./tests/test.sh --oauth               # OAuth-only smoke test.

Flags combine: ./tests/test.sh --config f.json --no-inject.
The test harness uses its own built-in prompt (counting + reasoning) regardless of config. The reasoning step exercises adaptive thinking at different effort levels.
Unit tests (no Docker or API key):

    ./tests/test.sh --unit             # All unit tests.
    ./tests/test_activity_filter.sh    # Activity stream parsing.
    ./tests/test_config.sh             # Config parsing.
    ./tests/test_costs.sh              # Cost aggregation.
    ./tests/test_dashboard.sh          # Dashboard rendering.
    ./tests/test_drivers.sh            # Agent driver interface.
    ./tests/test_format.sh             # Formatting helpers.
    ./tests/test_harness.sh            # Stat extraction.
    ./tests/test_harvest.sh            # Harvest git ops.
    ./tests/test_launch.sh             # Launch logic.

Add to swarm.json:

    {
      "post_process": {
        "prompt": "prompts/review.md",
        "model": "claude-opus-4-6",
        "effort": "low",
        "max_idle": 2
      }
    }

Trigger via [p] in the dashboard, ./launch.sh post-process,
or automatically via ./launch.sh wait.
The post-process agent clones the same bare repo, sees all
commits on agent-work, runs its prompt, and pushes.
post_process also accepts base_url, api_key,
auth_token, auth, tag, driver, and max_idle -- same
fields as per-group agents -- to route post-processing through
a different provider or credential. max_idle controls how
many consecutive sessions with no commits before the
post-processor exits. When omitted it inherits the top-level
max_idle (default: 3).
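
The fallback chain for max_idle can be written as a single nested default. The function name `effective_max_idle` is hypothetical and the sketch only mirrors the rule stated above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pick post_process.max_idle, else the top-level max_idle, else 3.
effective_max_idle() {
  local post_process_value="${1:-}" top_level_value="${2:-}"
  echo "${post_process_value:-${top_level_value:-3}}"
}

effective_max_idle 2 5     # 2 (post_process value wins)
effective_max_idle "" 5    # 5 (inherits top-level)
effective_max_idle "" ""   # 3 (built-in default)
```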
Motivated by Evaluating AGENTS.md (Gloaguen et al.), which found that repository-level context files can reduce agent success rates while increasing inference cost by over 20%. This feature enables A/B comparisons within a single swarm.
Control how much of .claude/ each agent group sees:
| Mode | Behavior |
|---|---|
| full | Keep .claude/ as-is (default). |
| slim | Keep only .claude/CLAUDE.md, strip agents/skills. |
| none | Remove entire .claude/ directory (bare agent). |

Set per group in swarm.json:

    {
      "agents": [
        { "count": 2, "model": "claude-opus-4-6" },
        { "count": 1, "model": "claude-opus-4-6", "context": "none" }
      ]
    }

Bare agents do exploratory work unconstrained by repo context
while other agents use skills and rules for structured output.
Non-default modes appear in the dashboard Ctx column and in
commit trailers (> Ctx: bare, > Ctx: slim).
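
A minimal sketch of what the three modes do to a workspace, assuming the behavior in the table above. The function name `apply_context_mode` is hypothetical and the harness may implement the stripping differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Apply a context mode to the .claude/ directory of a workspace.
apply_context_mode() {
  local mode="$1" workspace="$2"
  case "$mode" in
    full) : ;;                               # keep .claude/ as-is
    slim)                                    # keep only .claude/CLAUDE.md
      if [ -d "$workspace/.claude" ]; then
        find "$workspace/.claude" -mindepth 1 -maxdepth 1 \
          ! -name CLAUDE.md -exec rm -rf {} +
      fi
      ;;
    none) rm -rf "$workspace/.claude" ;;     # bare agent
    *) echo "unknown context mode: $mode" >&2; return 1 ;;
  esac
}

# Demo on a throwaway workspace.
demo=$(mktemp -d)
mkdir -p "$demo/.claude/agents" "$demo/.claude/skills"
touch "$demo/.claude/CLAUDE.md" "$demo/.claude/agents/reviewer.md"
apply_context_mode slim "$demo"
ls -A "$demo/.claude"   # CLAUDE.md
```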
Each agent group can run a different prompt file:

    {
      "prompt": "tasks/hunt.md",
      "agents": [
        { "count": 2, "model": "claude-opus-4-6" },
        { "count": 1, "model": "claude-sonnet-4-6",
          "prompt": "tasks/review.md" }
      ]
    }

Groups without prompt inherit the top-level value. When every
group specifies its own prompt, the top-level prompt can be
omitted entirely:

    {
      "agents": [
        { "count": 2, "model": "claude-opus-4-6",
          "prompt": "tasks/hunt.md" },
        { "count": 1, "model": "claude-sonnet-4-6",
          "prompt": "tasks/review.md" }
      ]
    }

Combined with context modes, this enables divergent exploration: hunting agents run one prompt with full skills, while a reconciliation agent runs a different prompt to validate and normalize findings.
Three credential mechanisms serve different purposes:
- auth -- Controls which host credential is forwarded to the container. Values: apikey, oauth, chatgpt, or omit (auto-detect).
- api_key -- Per-group API key for third-party endpoints (MiniMax, etc.). Passed as ANTHROPIC_API_KEY inside the container. Supports $VAR references to host env vars.
- auth_token -- Per-group Bearer token for endpoints that use ANTHROPIC_AUTH_TOKEN (OpenRouter-style). Clears ANTHROPIC_API_KEY so Claude Code enters third-party mode. Supports $VAR references.

| auth value | Credential injected |
|---|---|
| apikey | ANTHROPIC_API_KEY only |
| oauth | CLAUDE_CODE_OAUTH_TOKEN only |
| omit | Both (CLI decides) |

For subscription auth (Pro/Max/Teams/Enterprise), generate
an OAuth token with claude setup-token and export
CLAUDE_CODE_OAUTH_TOKEN.
| auth value | Credential injected |
|---|---|
| apikey | OPENAI_API_KEY only |
| chatgpt | Mounts ~/.codex/auth.json (ChatGPT subscription) |
| omit | API key if set + auth.json if found |

For ChatGPT subscription auth (Plus/Pro/Team/Enterprise),
run codex login on the host to create ~/.codex/auth.json,
then set "auth": "chatgpt" in your swarm config:

    {
      "driver": "codex-cli",
      "agents": [{ "model": "gpt-5.4", "auth": "chatgpt" }]
    }

The auth file is bind-mounted read-only into containers.
Override the path with CODEX_AUTH_JSON=/path/to/auth.json.
Groups with api_key or auth_token ignore the auth
field; their custom credential is always used. When neither
is set, auth determines which host credential to inject.
The dashboard Auth column reflects the actual credential
source: key, oauth, chatgpt, token, or auto (see
Dashboard columns).
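
The precedence rule (a per-group credential beats the auth field) can be sketched as a label function. Everything beyond that rule is an assumption made for the illustration: the function name `auth_label`, the choice that auth_token is checked before api_key when both are set, the mapping of a per-group api_key to the key label, and the simplification that an omitted auth always yields auto.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return the Auth column label for one agent group.
# Arguments: auth field, per-group api_key, per-group auth_token.
auth_label() {
  local auth="${1:-}" api_key="${2:-}" auth_token="${3:-}"
  if [ -n "$auth_token" ]; then
    echo token                 # per-group Bearer token overrides auth
  elif [ -n "$api_key" ]; then
    echo key                   # per-group API key overrides auth
  else
    case "$auth" in
      apikey)  echo key ;;
      oauth)   echo oauth ;;
      chatgpt) echo chatgpt ;;
      *)       echo auto ;;    # omitted: the CLI decides
    esac
  fi
}

auth_label oauth "" ""                  # oauth
auth_label oauth "" '$ROUTER_TOKEN'     # token (auth field ignored)
auth_label "" "" ""                     # auto
```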
Agents receive git rules (commit/push/rebase) via a system prompt appendix. Your task prompt only needs to describe the work.
Disable with "inject_git_rules": false in the swarmfile.

    ./costs.sh          # Table.
    ./costs.sh --json   # JSON.

Stats collected per session inside each container
(agent_logs/stats_agent_*.tsv), read on demand.
Dashboard columns:
- Auth -- credential source: key (API key), oauth (Claude subscription token), chatgpt (ChatGPT subscription), token (Bearer / OpenRouter-style), auto (multiple credentials present, CLI decides).
- Ctx -- context mode: bare (no .claude/), slim (only CLAUDE.md), or blank for full context.
- Cost -- cumulative API cost in USD.
- In/Out -- input and output tokens.
- Cache -- prompt cache read tokens. Higher means the API is reusing cached context instead of reprocessing it, reducing cost and latency. Cache creation tokens (the one-time cost of populating the cache) are recorded in the TSV but not shown separately.
- Turns -- number of assistant turns across all sessions.
- Tok/s -- output tokens per second of API time.
- Time -- cumulative wall-clock duration.

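
As a small worked example of the Tok/s column, the sketch below divides output tokens by API time. The function name `tokens_per_second` is hypothetical, and the inputs reuse the figures from the sample log line (644 output tokens, 19 s), where the 19 s is wall-clock time standing in for API time.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Output tokens per second, one decimal place; "-" when no time was recorded.
tokens_per_second() {
  local out_tokens="$1" api_seconds="$2"
  awk -v t="$out_tokens" -v s="$api_seconds" \
    'BEGIN { if (s > 0) printf "%.1f\n", t / s; else print "-" }'
}

tokens_per_second 644 19   # 33.9
```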
Agent drivers decouple the harness from any specific CLI tool.
Each driver (lib/drivers/<name>.sh) implements a fixed role
interface so the harness can run, monitor, and parse stats from
any supported agent.
Built-in drivers:
| Driver | CLI | Default |
|---|---|---|
| claude-code | claude | Yes |
| gemini-cli | gemini | |
| codex-cli | codex | |
| fake | (none) | Test double for unit testing |

Set the driver globally in swarm.json:

    { "driver": "claude-code" }

Or per agent group:

    {
      "agents": [
        { "count": 2, "model": "claude-opus-4-6" },
        { "count": 1, "model": "other-model", "driver": "other-driver" }
      ]
    }

Groups without a driver field inherit the top-level driver, which
defaults to claude-code.
By default the Docker image installs the latest Claude Code CLI.
To pin a specific version, set claude_code_version in the
swarmfile:

    { "claude_code_version": "1.0.30" }

Create lib/drivers/<name>.sh implementing these functions:

    agent_default_model()   # Fallback model when none configured
    agent_name()            # Human-readable name for commit trailers
    agent_cmd()             # CLI command name
    agent_version()         # Print version string to stdout
    agent_run()             # Run one session (model, prompt, logfile, append_file)
    agent_settings()        # Write agent config files into workspace
    agent_extract_stats()   # Parse stats from log file (TSV output)
    agent_detect_fatal()    # Detect fatal errors from log + exit code
    agent_is_retriable()    # Detect retriable errors (rate limits, overload)
    agent_activity_jq()     # Return jq filter for activity streaming
    agent_docker_env()      # Print -e flags for agent-specific env vars
    agent_docker_auth()     # Resolve credentials, emit Docker -e flags
    agent_install_cmd()     # Print install commands (documentation only)

The Dockerfile hardcodes install steps for built-in drivers. New drivers require corresponding Dockerfile changes.
See lib/drivers/claude-code.sh for the reference implementation
and lib/drivers/fake.sh for a minimal test double.
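
For orientation, a driver file might start from a skeleton like the one below. This is a hedged sketch, not a working driver: the driver name `echo-agent`, its model name, and its version string are invented, and the return-value conventions and stub outputs are assumptions (the reference implementation defines the real contracts). Only the function names and the agent_run argument order come from the list above.

```shell
#!/usr/bin/env bash
# lib/drivers/echo-agent.sh -- hypothetical skeleton, not part of the project.

agent_default_model() { echo "echo-small"; }   # placeholder model name
agent_name()          { echo "Echo Agent"; }
agent_cmd()           { echo "echo-agent"; }   # placeholder CLI name
agent_version()       { echo "0.0.0"; }

# Argument order as listed above: model, prompt, logfile, append_file.
agent_run() {
  local model="$1" prompt="$2" logfile="$3" append_file="${4:-}"
  # A real driver would invoke its CLI here; this stub only records the call.
  printf 'model=%s prompt=%s append=%s\n' \
    "$model" "$prompt" "$append_file" >> "$logfile"
}

agent_settings()      { :; }           # no config files to write
agent_extract_stats() { :; }           # TSV layout is harness-defined; left empty
agent_detect_fatal()  { return 1; }    # assumption: non-zero means "not fatal"
agent_is_retriable()  { return 1; }    # assumption: non-zero means "not retriable"
agent_activity_jq()   { echo '.'; }    # identity jq filter as a placeholder
agent_docker_env()    { :; }           # no extra -e flags
agent_docker_auth()   { :; }           # no credentials to forward
agent_install_cmd()   { echo "# install steps for echo-agent go here"; }
```

Comparing a skeleton like this against lib/drivers/fake.sh is a quick way to see which stubs the harness actually exercises.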
Use the fake driver to validate setup scripts, prompt paths, and
config without spending tokens or requiring API keys. Create a
swarmfile (saved here as dry-run.json) that sets "driver": "fake":

    {
      "prompt": "your-prompt.md",
      "setup": "your-setup.sh",
      "driver": "fake",
      "agents": [
        { "count": 1, "model": "fake" }
      ]
    }

Then run it:

    SWARM_CONFIG=dry-run.json ./launch.sh start --dashboard

The fake driver runs the full harness loop (cloning, setup script execution, git hooks) but replaces the agent session with a synthetic JSONL stream that completes instantly. This catches path errors, missing dependencies, and config issues before any real agent run.
Clean up afterwards:

    PROJECT=$(basename "$(pwd)")
    docker rm -f "${PROJECT}-agent-1" 2>/dev/null
    rm -rf "/tmp/${PROJECT}-upstream.git"

After a swarm run, the following artifacts remain on disk:
| Artifact | Path |
|---|---|
| Bare repo | /tmp/<project>-upstream.git |
| Submodule mirrors | /tmp/<project>-mirror-*.git |
| Agent containers | <project>-agent-N |
| State file | /tmp/<project>-swarm.env |

Remove everything for a fresh start:

    PROJECT=$(basename "$(pwd)")
    docker rm -f $(docker ps -aq --filter "name=${PROJECT}-agent-") 2>/dev/null
    rm -rf "/tmp/${PROJECT}-upstream.git" /tmp/"${PROJECT}"-mirror-*.git
    rm -f "/tmp/${PROJECT}-swarm.env"

To run a one-off prompt inside the agent image:

    docker run --rm --entrypoint bash \
      -e "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" \
      "$(basename "$(pwd)")-agent" \
      -c 'claude --dangerously-skip-permissions \
        -p "What model are you? Reply with model id only." \
        --model claude-opus-4-6 2>&1'