Make LLMs from different vendors collaborate as one team — by capability, not by name.
Phase 5a: read-only web dashboard, 4-vendor task graph. The screenshot above is the lead-mode dashboard at the end of a 7-task fan-out + fan-in plan running across four different LLM vendors simultaneously: claude-sonnet-4-6 (Anthropic Max subscription via `claude-cli`), gpt-oss:120b (Ollama Cloud), gemini-2.5-flash (Google AI Studio API), and gpt-5.5 (ChatGPT subscription via Codex CLI). Each box shows the worker, the model, and the wall-clock duration; the same prompt was sent to all four vendor branches so the lead could compare their outputs side-by-side.

Top panel: 6 online agents (5 workers + lead). Bottom panel: the `depends_on` task graph, with the sentinel "go" task at the top, four vendor plans in parallel below it (gpt-oss / sonnet / gemini / codex, all green), and the comparison task at the bottom synthesising them. The dashboard polls broker state every 1 s; the live status lifecycle is gray (pending/blocked) → amber (in_progress) → green (completed) / red (failed). Per-batch grouping via `depends_on` connected components means each `/lead-mode` dispatch is its own clean graph; previous runs auto-clear.
🎬 Screencast of the same lifecycle in motion (sentinel release → vendors light up amber → duration ticking → green): issue #1 or directly:
Demo_v1.mov
Code: `src/mab/broker/routes/dashboard.py` (backend snapshot endpoint) + `src/mab/broker/static/index.html` (single-file vanilla HTML/JS/CSS frontend, no build step). Cross-vendor walkthrough: `docs/cookbook.md` Recipe 8.
A single-vendor agent stack (Claude Code subagents, OpenAI Assistants, Gemini agents) can already coordinate N copies of its model. mab-broker is for the harder problem: the right LLM for this sub-task lives in another vendor's stack, on your own hardware, or split across both. Examples this codebase exists to enable:
- Claude Opus plans a workflow, dispatches the heavy reasoning step to gpt-oss:120b on Ollama Cloud (1/10 the cost), then routes the polishing pass to Claude Sonnet via subscription — all in one autonomous task chain.
- Local Llama-70b on your GPU box handles PII-bearing input (no cloud) while the same lead dispatches non-sensitive sub-tasks to GPT-4o, both in the same plan.
- Anthropic gets rate-limited mid-workflow → broker reroutes the next task to DeepSeek-R1 automatically, because capability is the contract and vendor is interchangeable.
Tasks declare what they need (required_all=["tier:reasoning", "host:cloud"]), agents declare what they offer (["model:gpt-oss:120b-cloud", "family:gpt-oss", "tier:reasoning", "host:cloud", "provider:ollama"]), and the broker does the matching. You can swap a worker's underlying LLM tomorrow without changing a single lead-side prompt.
- Single-vendor setups — you already have great native options. Use those.
- Public-internet multi-tenant — TLS / wss / JWT / IP allowlist still pending (Phase 4). Today this is LAN / VPN / Tailscale.
- Synchronous chat — broker is task-shaped, not turn-based dialogue.
┌─────────────────────────────────────────────────────────────┐
│ Lead: Claude Opus (Anthropic API) │ ← your laptop
│ "Plan, dispatch, monitor, synthesize" │
└──────────────────────┬──────────────────────────────────────┘
create_task(required_all=[...])
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Broker (FastAPI + SQLite + WS) │ ← your network
│ Capability-filtered push · atomic claim · data sovereign │
└──┬────────────────────────────────────────┬─────────────────┘
│ tier:reasoning + host:cloud │ tier:sonnet + family:claude
▼ ▼
┌──────────────────────┐ ┌──────────────────────┐
│ Worker A │ │ Worker B │
│ adapter: ollama │ │ adapter: claude-cli │
│ model: gpt-oss:120b │ │ model: claude-sonnet│
│ via Ollama Cloud │ │ via Max subscription│
│ (OpenAI OSS family) │ │ (Anthropic family) │
└──────────────────────┘ └──────────────────────┘
+ add a worker for:
– OpenAI direct (API key)
– Gemini direct (API key)
– Local Llama / Qwen / DeepSeek (GPU box, no internet)
– any HTTP-speaking LLM (one ~50-line adapter)
One Python daemon (mab-worker) per worker host, five built-in adapters (anthropic / ollama / claude-cli / gemini / mock), swap with one CLI flag — --adapter X --model Y. No client-side change for the lead when you swap vendors.
Status: Phase 1 + 1.5 + 2.1 + 2.2 + 3a + Phase 3 + Phase 5a fully complete. 212 tests green.
3 vendor families live in production: Anthropic Claude (Opus lead + Sonnet worker via Max subscription), OpenAI OSS (gpt-oss:120b via Ollama Cloud), Google (gemini-2.5-flash via AI Studio on a Raspberry Pi), all routing via capability tags; the lead doesn't see vendor names. Cross-vendor `/lead-mode` demos are verified end-to-end, including a 3-vendor Tokyo itinerary comparison (cookbook Recipe 8) and a playable single-file Chrome-dino game built across two vendors (Recipe 6, `examples/dino.html`).

Phase 5a: the read-only web dashboard ships at `/dashboard` on the broker: per-batch task graph (`depends_on` connected components), live agent state, per-task agent + model + duration metadata, dark "tech" aesthetic with status-lifecycle colours (gray → amber → green/red). It includes the in-place retry primitive: failed tasks reset on the same id with cascade reset of downstream failures, and the dashboard shows a `↻N` badge and pulse animation for visible retry recovery. See issue #1 for the screencast.

Reliability hardening: supersede-race fix (zombie WS connections no longer flap status), broker-side stuck-task reaper (worker dies mid-task → auto-fail after heartbeat goes stale).
Cookbook: 8 production-verified recipes in `docs/cookbook.md`: two-stage thinking, fan-in synthesis, failure cascade, shared context, channels, `/lead-mode` end-to-end, adding a third vendor, cross-vendor comparison + retry.
Three runtime roles, one repo, one broker URL:
- Central broker (`mab-broker`): FastAPI + SQLite (WAL) + WebSocket Hub; one binary, zero external dependencies
- Interactive MCP agent (`mab-agent`): plugs into Claude Code via `.mcp.json` or `claude mcp add`; 14 tools cover roster discovery (`list_agents` / `match_agents`), self-declaration (`update_my_model` / `update_my_capabilities`), messaging, full task lifecycle (including `delete_task`), and push-driven `wait_for_task`. Best for the lead orchestrator and any human-in-the-loop dev work
- Autonomous worker daemon (`mab-worker`): headless Python daemon supervised by `systemd --user`; pulls tasks via push-driven `wait_for_task`, dispatches each through a pluggable LLM adapter, reports back. Five adapters ship in-box: `anthropic` (api.anthropic.com direct via httpx), `ollama` (works for local Ollama and Ollama Cloud, also via httpx), `claude-cli` (spawns `claude -p` per task, which gives the worker access to Claude Code's full tool ecosystem), `gemini` (Google AI Studio direct via httpx), and `mock` (in-process echo for tests). Best for production workers: daemon mode avoids the Claude Code TUI's user-input-breaks-the-loop race and is supervised, immortal, and observable via `journalctl`
Plus the cross-cutting machinery:
- Capability-based routing: tasks carry `required_all` / `required_any` tag sets; broker filters WS broadcast to matching online agents and rejects mismatched claims (403) or directed assignments (400)
- Model-aware agents: `--model claude-opus-4-7` auto-derives `model:` / `family:` / `tier:` / `provider:` tags so capabilities track the runtime model identity (works on both `mab-agent` and `mab-worker`)
- Push-aware tool responses: every MCP tool reply embeds `_pending_messages` / `_pending_task_events` counts so Claude Code (which cannot be push-interrupted) is nudged to drain its queue on the next tool call
- Atomic task claim: `UPDATE ... WHERE status='pending' RETURNING` guarantees single-winner semantics under concurrent claims
- Task delete with `task_event:deleted`: creator or current assignee can drop a task in any state; broker auto-clears the assignee's `current_task` and broadcasts the deletion
- Offline message backfill: messages addressed to an offline agent persist for 7 days; the agent receives them on next reconnect
- Auto-reconnecting WS client: exponential backoff (1→2→4…60s), application-layer text heartbeat keeps `last_heartbeat` fresh on the broker
- Observability tool: `tools/watch.py` connects to the broker as a non-MCP agent and prints every received WS event to stdout; the easiest way to verify filter broadcast or chase routing bugs from a third machine
- One-shot deployment: `deploy/install.sh` (broker), `deploy/setup-agent.sh` (Claude Code MCP wiring), `deploy/setup-worker.sh` (daemon); each is idempotent, and `deploy/update.sh` does git pull + uv sync + service restart in one go
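
The reconnect schedule mentioned in the list above (1→2→4…60s) can be sketched as a small generator. This is only an illustration of the doubling-with-cap pattern, not the client's actual code; the name `backoff_delays` and the `base` / `cap` parameters are assumptions made for the example.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield reconnect delays that double on each attempt and saturate at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


# First eight delays of the schedule, in seconds.
first_eight = list(islice(backoff_delays(), 8))
print(first_eight)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

A real client would typically reset the schedule after a successful connect, so a brief network blip does not leave the next retry waiting a full minute.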
git clone https://github.com/marsyanggo/multi-agent-broker
cd multi-agent-broker
uv sync
uv run mab-broker serve # listens on 0.0.0.0:8420

For a production-style install with systemd user services (Linux), use the one-shot installers:
# Broker host
./deploy/install.sh # one sudo for enable-linger; rest non-sudo
# Interactive Claude Code agent (any host — lead, dev, debug)
./deploy/setup-agent.sh \
--broker-url http://broker-host:8420 \
--api-key mab-ak-XXXX \
--model claude-opus-4-7
# Headless worker daemon (any host — production task processor)
./deploy/setup-worker.sh \
--broker-url http://broker-host:8420 \
--api-key mab-ak-YYYY \
--adapter ollama \
--model gpt-oss:120b-cloud \
--ollama-base-url http://localhost:11434 # via local Ollama proxy
# Update + uninstall
./deploy/update.sh # any host: pull + sync + restart + verify
./deploy/uninstall.sh # broker host: remove unit, keep DB

`update.sh` refuses to run on a dirty tree and uses `git pull --ff-only`, so it never overwrites local commits and is safe to run on production hosts unattended. `setup-agent.sh` validates broker reachability + API key before writing MCP config, and creates a backup of `~/.claude.json` on the fallback path. `setup-worker.sh` writes a `chmod 600` systemd unit with all config (including secrets) as `Environment=` lines; `--name <suffix>` lets one host run several workers in parallel (`mab-worker-opus.service` + `mab-worker-llama.service`, etc.). See `deploy/README.md` for the full flow.
uv run mab-broker gen-key --name claude-laptop
# Registered agent: claude-laptop (id=abc12345)
# API key (save now — it cannot be recovered):
# mab-ak-XXXXXXXXXXXXXXXX

You can pre-seed capabilities at registration:
uv run mab-broker gen-key --name worker-linux \
--capabilities tier:sonnet,family:claude,vision

Name conflict? Add `--auto-suffix` to auto-increment (`claude-laptop` → `claude-laptop-2`).

When the broker is installed via `deploy/install.sh`, its data dir is moved off `$HOME`. Re-export the same `MAB_DB_PATH` when running `gen-key` from the same machine, or you'll write keys to a different SQLite file than the broker reads:

MAB_DB_PATH=$XDG_DATA_HOME/multi-agent-broker/db.sqlite \
uv run mab-broker gen-key --name claude-laptop
claude mcp add -s user mab \
/path/to/mab-agent \
--broker-url http://192.168.1.100:8420 \
--api-key mab-ak-XXXXXXXXXXXXXXXX \
--model claude-opus-4-7

`--model claude-opus-4-7` auto-derives capability tags. To override or extend:
mab-agent --broker-url ... --api-key ... \
--model claude-opus-4-7 \
--capabilities vision,code-review # extra tags merged with model derivation

Restart Claude Code. You'll see 14 tools:

- Roster: `list_agents`, `match_agents`, `get_agent_info`
- Self: `report_status`, `update_my_model`, `update_my_capabilities`
- Messaging: `send_message`, `get_messages`
- Tasks: `create_task`, `claim_task`, `update_task`, `delete_task`, `list_tasks`
- Worker: `wait_for_task` (push-driven block on broker WS, used by `/worker-mode`)
For headless task execution without a Claude Code session in the loop, install mab-worker as a systemd user service:
# On the worker host (broker generated the key, then handed it to you)
./deploy/setup-worker.sh \
--broker-url http://broker-host:8420 \
--api-key mab-ak-YYYYYYYYYYYYYY \
--adapter ollama \
--model gpt-oss:120b-cloud \
--ollama-base-url http://localhost:11434 # local Ollama proxies to Ollama Cloud

Adapter choices (`--adapter`):
| Adapter | Backend | Best for |
|---|---|---|
| `anthropic` | POST to `api.anthropic.com/v1/messages` direct via httpx | Pure-LLM tasks against the official Claude API |
| `ollama` | POST to `/api/chat` direct via httpx; auto-handles local + Ollama Cloud auth | Local GPU models or Ollama Cloud-routed models like `gpt-oss:120b-cloud` |
| `claude-cli` | Spawns `claude -p <prompt>` per task | Tasks needing Claude Code's tool ecosystem (Bash / Edit / Read / web). Heaviest cold start but most capable |
| `mock` | In-process echo | Smoke tests, dry runs |
After install, the daemon is supervised:
systemctl --user status mab-worker.service # is it up?
journalctl --user -u mab-worker.service -f # tail logs
systemctl --user restart mab-worker.service # apply config / rotate key
./deploy/update.sh && systemctl --user restart mab-worker.service # pull repo + restart

Run multiple workers on one host with `--name <suffix>`, e.g. `--name opus-cloud` creates `mab-worker-opus-cloud.service` alongside the default `mab-worker.service`, each with its own unit file, secrets, and journal.
You: Use the mab tools to create a task that needs tier:opus, then claim it.
Claude: [calls create_task(title="...", required_all=["tier:opus"])]
[calls claim_task(task_id="...")] # succeeds because this agent has tier:opus
[calls update_task(status="completed", result="...")]
If a non-opus agent tries to claim the same task, the broker returns 403 and the task stays pending.
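
The same create → claim → complete sequence can also be expressed as plain REST requests. The sketch below only builds the requests (it does not send them), using the endpoint paths from the request flow shown later in this README. The `Authorization: Bearer` header format, the placeholder host and key, and the helper name `broker_request` are assumptions for illustration.

```python
import json
from typing import Optional
from urllib.request import Request

BROKER_URL = "http://broker-host:8420"  # placeholder host, as in the install examples
API_KEY = "mab-ak-XXXX"                 # placeholder key


def broker_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) an authenticated JSON request to the broker."""
    data = json.dumps(body).encode() if body is not None else None
    return Request(
        f"{BROKER_URL}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed header format
            "Content-Type": "application/json",
        },
    )


task_id = "TASK_ID"  # in practice, taken from the create response

create = broker_request("POST", "/api/v1/tasks",
                        {"title": "demo", "required_all": ["tier:opus"]})
claim = broker_request("POST", f"/api/v1/tasks/{task_id}/claim")
complete = broker_request("PATCH", f"/api/v1/tasks/{task_id}",
                          {"status": "completed", "result": "done"})

for req in (create, claim, complete):
    print(req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would send it; a claim from an agent without the required tags would then surface as an `HTTPError` carrying status 403.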
The repo ships two Claude Code skills (under .claude/skills/) that turn any Claude session into either an orchestrator or a worker with one slash command:
| Command | Role | What it does |
|---|---|---|
| `/lead-mode` | Planner / dispatcher | Scouts the roster, decomposes user goals into sub-tasks, picks best-fit agents by capability, monitors progress, synthesizes results |
| `/worker-mode` | Interactive worker | Blocks on `wait_for_task` (push-driven, sub-second latency); claims new work the instant the broker pushes it, executes per task description, reports completed or failed, loops. For production workers, prefer the `mab-worker` daemon: `/worker-mode` is fine for dev / debug, but Claude Code's interactive TUI means a stray user keystroke breaks the loop. The daemon has no TUI and is supervised by systemd. |
Typical multi-host setup:
- Lead host (laptop): `/lead-mode` once in Claude Code, then talk to it like a project manager
- Production worker hosts (GPU / cloud-LLM boxes): `setup-worker.sh` once, daemon runs forever in background
The skills are pure prompt + existing MCP tools — no daemon process, no new Python. Worker mode runs push-driven via the wait_for_task MCP tool (blocks on mab-agent's WS event queue, returns within ~100ms of broker push); no polling, sub-second routing latency. The standalone mab-worker daemon uses the same primitive but bypasses Claude Code entirely — see deploy/README.md.
See the skill files themselves for the full behaviour spec.
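
The push-driven wait described above amounts to blocking on an event queue with a timeout instead of polling. The following is a minimal sketch of that idea using `asyncio.Queue`; it is not the actual `wait_for_task` implementation, and the event payload shape is assumed for the example.

```python
import asyncio
from typing import Optional


async def wait_for_task(events: asyncio.Queue, timeout: float) -> Optional[dict]:
    """Block until a task event arrives or `timeout` elapses (no polling)."""
    try:
        return await asyncio.wait_for(events.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def demo() -> tuple:
    events: asyncio.Queue = asyncio.Queue()
    # Simulate the WS reader pushing an event shortly after we start waiting.
    loop = asyncio.get_running_loop()
    loop.call_later(
        0.05, events.put_nowait, {"event": "task_event:created", "task_id": "t1"}
    )
    got = await wait_for_task(events, timeout=1.0)
    missed = await wait_for_task(events, timeout=0.05)  # queue is empty again
    return got, missed


got, missed = asyncio.run(demo())
print(got, missed)
```

The waiter wakes as soon as the event is enqueued, which is why this pattern gives low latency without a polling interval.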
If you want to see which events reach a given agent without bolting it into Claude Code, run tools/watch.py on the same host as that agent's API key:
python tools/watch.py \
--broker-url http://192.168.1.100:8420 \
--api-key mab-ak-XXXX
# stderr: [watch] connected as worker-linux (id=...) caps=[...]
# stdout: one JSON line per received message / task_event

Then create tasks from elsewhere with different `required_all` sets: non-matching events simply never appear in the watcher's output, which is the cleanest live proof that filter broadcast works. (Used to validate this codebase across a real Linux + macOS setup.)
┌────────────────────────────────────────────┐
│ Central Broker (FastAPI + SQLite + WS Hub) │
│ REST /api/v1/{agents|messages|tasks} │
│ WS /api/v1/ws (push channel) │
│ Capability matcher filters task broadcast │
└──┬─────────────────────┬──────────────────┘
│ │
WS push + REST WS push + REST
│ │
┌────────┴────────┐ ┌───────┴───────────┐
│ mab-agent │ │ mab-worker │
│ (MCP stdio) │ │ (systemd daemon) │
│ │ │ │
│ Claude Code │ │ Adapter: │
│ /lead-mode │ │ anthropic │
│ /worker-mode │ │ ollama │
│ (interactive) │ │ claude-cli │
└─────────────────┘ └───────────────────┘
lead / dev box production worker
Roles compose. A small setup has the broker, one lead mab-agent on the user's laptop, and one mab-worker daemon on each GPU / model host. The broker is the only piece that needs to be reachable from every other; agents and workers are clients.
Claude Code → mab-agent → REST POST /api/v1/tasks (required_all=[tier:opus])
│
broker stores task, computes online agents matching capability
│
WS push task_event:created → ONLY agents with tier:opus
│
One of them: POST /api/v1/tasks/{id}/claim
│
Broker validates claimer.capabilities, then atomic UPDATE
│
WS push task_event:claimed → creator + assignee
│
Assignee: PATCH /api/v1/tasks/{id} { status: completed, result, note }
│
WS push task_event:completed → creator + assignee
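
The "atomic UPDATE" step in the flow above is what makes a claim single-winner. Here is a minimal sketch against an in-memory SQLite database. The table and column names are assumptions for illustration, and the sketch checks `rowcount` rather than using `RETURNING` (which the broker's query uses) so that it also runs on SQLite builds older than 3.35.

```python
import sqlite3


def claim_task(conn: sqlite3.Connection, task_id: str, agent_id: str) -> bool:
    """Try to claim a pending task; returns True only for the single winner."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE tasks SET status = 'assigned', assigned_to = ? "
            "WHERE id = ? AND status = 'pending'",
            (agent_id, task_id),
        )
        # The WHERE clause matches at most once: after the first claim the
        # row is no longer 'pending', so later claimers update zero rows.
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, assigned_to TEXT)")
conn.execute("INSERT INTO tasks VALUES ('t1', 'pending', NULL)")
conn.commit()

first = claim_task(conn, "t1", "worker-a")   # wins the claim
second = claim_task(conn, "t1", "worker-b")  # loses: status is already 'assigned'
owner = conn.execute("SELECT assigned_to FROM tasks WHERE id = 't1'").fetchone()[0]
print(first, second, owner)
```

Because the status check and the write happen in one statement, there is no window between "read status" and "set assignee" in which two claimers could both succeed.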
Full Phase 1-5 long-form design lives in architecture.md.
Tags follow a prefix:value convention (with bare tags also allowed). The matcher does plain string equality on tags; no glob / regex.
| Prefix | Meaning | Examples |
|---|---|---|
| `model:` | Exact model identifier | `model:claude-opus-4-7`, `model:gpt-4o`, `model:llama-3.3-70b` |
| `family:` | Vendor / brand family | `family:claude`, `family:openai`, `family:google`, `family:meta` |
| `tier:` | Capability tier within a family | `tier:opus`, `tier:sonnet`, `tier:haiku`, `tier:flash`, `tier:pro` |
| `provider:` | API provider | `provider:anthropic`, `provider:openai`, `provider:google` |
| (bare) | Free-form capability flag | `vision`, `audio`, `code-review`, `cn-locale` |
mab-agent --model X auto-derives model:, family:, tier:, provider: for known model families. Coverage:
- Closed-API: Claude (opus / sonnet / haiku), GPT (4 / 4o / 5), Gemini (pro / flash / flash-lite): full `family:` + `tier:` + `provider:`
- Open-source (Meta / Alibaba / DeepSeek / Microsoft / Mistral / OpenAI OSS / Google Gemma): `gpt-oss`, `llama` / `llama-3` / `llama-3.3` / `llama-4`, `qwen` / `qwen2.5` / `qwen2.5-coder` / `qwen3`, `deepseek` / `deepseek-r1` / `deepseek-v3` / `deepseek-coder`, `mistral` / `mistral-large` / `mistral-small`, `gemma` / `gemma-3`, `phi` / `phi-4`: `family:` + `tier:` (provider intentionally not auto-set since these can be hosted via Ollama, vLLM, Bedrock, etc.)
- Ollama tag form `name:tag` (e.g. `gpt-oss:20b`, `llama3.3:70b`, `qwen2.5-coder:32b`): auto-adds `size:<tag>`, `host:local`, and `provider:ollama`. The `-cloud` suffix (e.g. `gpt-oss:120b-cloud`) is recognised as Ollama Cloud: `size:120b` + `host:cloud` instead of polluting the size tag. Pass `--capabilities` to override provider if you're running through vLLM / something else.
Unknown model strings only get model:X (+ size: / provider:ollama if :tag form). Anything richer should be passed via --capabilities.
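
The Ollama `name:tag` heuristic described above can be sketched in a few lines. This is a simplified, hypothetical helper (`derive_ollama_tags`), not the project's `derive_capabilities_from_model`: it covers only the `model:` / `size:` / `host:` / `provider:` tags and omits the `family:` / `tier:` lookup for known model families.

```python
from typing import List


def derive_ollama_tags(model: str) -> List[str]:
    """Derive size/host/provider tags from an Ollama-style `name:tag` string."""
    tags = [f"model:{model}"]
    _name, sep, tag = model.partition(":")
    if not sep:
        return tags  # not `name:tag` form, so nothing more to infer here
    if tag.endswith("-cloud"):
        # Ollama Cloud: strip the suffix so the size tag stays clean.
        size, host = tag[: -len("-cloud")], "cloud"
    else:
        size, host = tag, "local"
    return tags + [f"size:{size}", f"host:{host}", "provider:ollama"]


print(derive_ollama_tags("gpt-oss:20b"))
print(derive_ollama_tags("gpt-oss:120b-cloud"))
print(derive_ollama_tags("llama-3.3-70b"))
```

Splitting on the first colon only keeps the rest of the tag intact, so the `-cloud` check operates on the full suffix.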
Examples:
mab-agent --broker-url ... --api-key ... --model gpt-oss:20b
# → [model:gpt-oss:20b, size:20b, host:local, provider:ollama,
# family:gpt-oss, tier:reasoning]
mab-agent --broker-url ... --api-key ... --model gpt-oss:120b-cloud
# → [model:gpt-oss:120b-cloud, size:120b, host:cloud, provider:ollama,
# family:gpt-oss, tier:reasoning]

Four ways an agent can claim which model it runs (in order of recommendation):

1. `mab-agent --model X` at startup: derives the standard tags + `PATCH /agents/me` before opening the WebSocket, so the broker has correct caps when the agent comes online. Best for fixed-per-process deployments (Claude Code via MCP).
2. `mab-agent --capabilities a,b` merge: combines with `--model` for runtime extras (e.g. add `vision` for a session). Full replacement of stored tags; not append.
3. `update_my_model` / `update_my_capabilities` MCP tools: the LLM driving the agent can self-update mid-session. Use when the model switches at runtime (e.g. Claude Code `/fast` toggle) or to add a per-task skill flag.
4. `mab-broker gen-key --capabilities ...`: persist a default in the broker DB at registration time. Survives across restarts but won't reflect runtime model changes.
In practice: gen-key for the immutable identity, --model flag for the per-process runtime claim, MCP tools for in-session corrections.
Any process speaking REST + WS can join the broker. Reuse BrokerClient directly:
from mab.mcp_server.broker_client import BrokerClient
from mab.shared.capabilities import derive_capabilities_from_model

client = BrokerClient(
    broker_url="http://192.168.1.100:8420",
    api_key="mab-ak-XXXX",
)
# Declare before going online so task routing sees fresh caps.
await client.update_capabilities(
    derive_capabilities_from_model("llama-3.3-70b") + ["gpu-local"]
)
await client.start()  # WS connect + auto-reconnect + heartbeat
# Now drain events as they arrive:
while ...:
    for msg in client.drain_messages():
        ...
    for task in client.drain_task_events():
        ...

`tools/watch.py` is a complete example of this pattern. A Phase 3 SDK will package it into a proper public API; the helpers above already work today.
A task carries two optional tag lists. An agent matches when it has every tag in required_all AND, if required_any is non-empty, at least one tag in required_any:
create_task(
    title="...",
    required_all=["tier:opus"],        # claimer must have ALL of these
    required_any=["vision", "audio"],  # claimer must have AT LEAST ONE of these
)

| Agent tags | `required_all=[tier:opus]`, `required_any=[vision, audio]` |
|---|---|
| `[tier:opus, vision]` | ✓ |
| `[tier:opus, audio]` | ✓ |
| `[tier:opus]` | ✗ (fails `required_any`) |
| `[tier:sonnet, vision]` | ✗ (fails `required_all`) |
| `[]` (no caps) | ✗ unless both fields are empty |
Empty required_all + empty required_any ⇒ any agent matches (Phase 1 backward-compat behaviour).
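
The matching rule and the table above can be reproduced with plain set operations. This is a minimal sketch of the rule as described in this section (string equality on tags, no glob or regex); the function name `matches` is an assumption and not necessarily the broker's own.

```python
from typing import Iterable


def matches(agent_tags: Iterable[str],
            required_all: Iterable[str] = (),
            required_any: Iterable[str] = ()) -> bool:
    """True if the agent has every `required_all` tag and, when
    `required_any` is non-empty, at least one of its tags."""
    have = set(agent_tags)
    need_all = set(required_all)
    need_any = set(required_any)
    return need_all <= have and (not need_any or bool(need_any & have))


req = {"required_all": ["tier:opus"], "required_any": ["vision", "audio"]}
for tags in (["tier:opus", "vision"], ["tier:opus", "audio"],
             ["tier:opus"], ["tier:sonnet", "vision"], []):
    print(tags, matches(tags, **req))
```

With both lists empty, `need_all <= have` is trivially true and the `required_any` check is skipped, which gives the "any agent matches" behaviour.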
- Filter broadcast: broker only sends `task_event:created` over WS to matching online agents. Non-matchers never know the task exists via push.
- Authoritative claim: even if an agent learns of a task via `list_tasks`, claiming it requires capability match; otherwise `403`.
- Directed assignment: `create_task(assigned_to=...)` with capability requirements validates the assignee at creation; `400` if mismatched.
- Zero matches: task is still created (status `pending`); later-connecting agents that match can pick it up via `list_tasks`.
Env vars (prefix MAB_):
| Var | Default | Effect |
|---|---|---|
| `MAB_HOST` | `0.0.0.0` | Broker bind host |
| `MAB_PORT` | `8420` | Broker port |
| `MAB_DB_PATH` | `~/.multi-agent-broker/db.sqlite` | SQLite location |
| `MAB_MESSAGE_TTL_DAYS` | `7` | Undelivered message retention |
| `MAB_HEARTBEAT_INTERVAL_SECONDS` | `30` | WS keepalive + app heartbeat |
| `MAB_BROKER_URL` | `http://localhost:8420` | (agent) broker to dial |
| `MAB_API_KEY` | (required) | (agent) auth token |
| `MAB_MODEL` | (none) | (agent) model identifier for capability derivation |
| `MAB_CAPABILITIES` | (none) | (agent) extra capability tags, comma-separated |
CLI flags on mab-agent mirror the env vars; CLI takes precedence.
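
The precedence rule (CLI flag over env var over default) can be sketched as a small resolver. This is an illustration of the lookup order only, not the project's configuration code; the helper name `resolve` is hypothetical, and `DEFAULTS` copies just two rows from the table above.

```python
import os
from typing import Mapping, Optional

# Two defaults copied from the table above, kept as strings like env values.
DEFAULTS = {
    "MAB_BROKER_URL": "http://localhost:8420",
    "MAB_HEARTBEAT_INTERVAL_SECONDS": "30",
}


def resolve(name: str, cli_value: Optional[str] = None,
            env: Mapping[str, str] = os.environ) -> Optional[str]:
    """CLI flag wins, then the MAB_* env var, then the documented default."""
    if cli_value is not None:
        return cli_value
    return env.get(name, DEFAULTS.get(name))


fake_env = {"MAB_BROKER_URL": "http://192.168.1.100:8420"}
print(resolve("MAB_BROKER_URL", env={}))                       # default
print(resolve("MAB_BROKER_URL", env=fake_env))                 # env var
print(resolve("MAB_BROKER_URL", "http://cli:8420", fake_env))  # CLI flag
```

A variable with no default, such as `MAB_API_KEY`, resolves to `None` here, which a caller would treat as a missing required setting.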
uv run pytest
# 212 tests, ~35s; includes real-subprocess end-to-end demo + live-broker daemon integration

Test layout:
| File | Coverage |
|---|---|
| `tests/test_db.py` | SQLite CRUD, atomic claim, TTL cleanup, capability migration |
| `tests/test_auth.py` | API key hashing, Bearer validation, name conflict / auto-suffix |
| `tests/test_capabilities.py` | Capability matcher (AND-of-all + AND-of-any), model→tag derivation (Claude / GPT / Gemini / Llama / Qwen / DeepSeek / gpt-oss / Phi + Ollama tag heuristic) |
| `tests/test_routes.py` | REST routes against in-process FastAPI, including capability validation + task delete (creator / assignee / non-owner / 404) |
| `tests/test_websocket.py` | Hub routing, backfill, agent/task events, capability filter broadcast |
| `tests/test_broker_client.py` | BrokerClient against a live uvicorn broker (including app heartbeat) |
| `tests/test_mcp_server.py` | MCP tool wiring (14 tools) + `_pending_messages` interceptor + `update_my_model` derivation + `wait_for_task` push/timeout/filter |
| `tests/test_e2e_demo.py` | `mab-broker serve` + 2× `mab-agent` via MCP stdio (4 demo scenarios) |
| `tests/worker/test_mock_adapter.py` | MockAdapter contract: fixed / callable / async / error / delay / lifecycle |
| `tests/worker/test_daemon.py` | WorkerDaemon against live broker: directly-assigned tasks, open-pool claim, adapter error path, per-task timeout, multi-task survival, capability declaration, setup/teardown, stats |
| `tests/worker/test_anthropic_adapter.py` | AnthropicAdapter against `httpx.MockTransport`: happy path, system prompt + temperature, multi-block concat, non-text-block skip, HTTP error mapping, empty response, missing key |
| `tests/worker/test_ollama_adapter.py` | OllamaAdapter against `httpx.MockTransport`: local no-auth, cloud bearer, system prompt, response trim, HTTP error, malformed response |
| `tests/worker/test_claude_cli_adapter.py` | ClaudeCLIAdapter with a temp Python shim mimicking the `claude` CLI: version probe, model + flag wiring, prompt template, exit code mapping, empty stdout, subprocess kill on cancel, extra args |
| `tests/worker/test_cli.py` | `mab-worker` CLI: `parse_capabilities`, `build_adapter` for each of 5 adapters, env-var defaults, required-flag validation |
| `tests/worker/test_gemini_adapter.py` | GeminiAdapter against `httpx.MockTransport`: happy path, system prompt, temperature, multi-part text concat, missing key, env key, HTTP error, SAFETY finishReason surfacing, no-candidates |
| `tests/worker/test_e2e_daemon.py` | End-to-end daemon scenarios: capability-rejected claim, push-after-catchup (true push path), graceful stop mid-task, 10-task burst |
| `tests/test_dashboard.py` | `/api/v1/dashboard/snapshot` shape: unauth 401, empty shape, populated state after creating task/channel/context |
| `tests/test_reaper.py` | Stuck-task reaper: stuck + stale agent → failed + `current_task` cleared; fresh agent skipped; cascade failure through downstream; ignore pending / completed |
| `tests/test_retry.py` | In-place retry: reset failed → pending + cleared state + retry note; 400 on non-failed; 404 on missing; counter increment across attempts; reset to blocked if upstream not completed; cascade reset downstream failures |
- Phase 1 ✅: broker + MCP agent (agent / message / task)
- Phase 1.5 ✅: one-shot deployment (uv + systemd user service)
- Phase 2.1 ✅: capability-based task routing
- Phase 2.2 ✅: task delete + `tools/watch.py` observability + live multi-agent cross-machine verification
- Phase 3a ✅ (Lead Agent enablers): roster (`match_agents` + `is_stale` + `current_task` freshness), capability self-update MCP tools (`update_my_model` / `update_my_capabilities`), pre-WS capability declaration, push-driven `wait_for_task` MCP tool
- Phase 3 ✅: `mab-worker` daemon SDK + 4 adapters (Anthropic / Ollama / Claude CLI / Mock), `setup-worker.sh` one-shot install, push-driven event-name-filtered task queue. Production-verified: claude-mac (Opus) → broker → daemon (gpt-oss:120b via Ollama Cloud) end-to-end in ~1s
- Phase 3 (D, depends_on) ✅: task dependencies with blocked status, auto-unblock on upstream completion, and failure cascade through downstream chains. Lets a lead fire a whole multi-step plan in one go instead of polling between steps.
- Phase 3 (S, shared context) ✅: pinned named documents any agent can read. Solves "every task description duplicates the same style guide" and supports auto-promoting upstream task results as named handoff docs.
- Phase 3 (CH, channels) ✅: named group-broadcast topics. Members get WS push for every new message; non-members can still read history via REST but don't receive pushes and can't post. Distinct from direct messages (1:1) and tasks (claim-lifecycle); intended for coordination noise, status updates, open queries. 12 new tests (9 REST + 2 WS + 1 MCP).
- Phase 3 (cookbook) ✅: 8 production-verified cross-vendor recipes in `docs/cookbook.md`: two-stage thinking, fan-in synthesis, failure cascade, pinned project spec, group broadcast, `/lead-mode` end-to-end (`examples/dino.html`), adding a third vendor (Google Gemini on RPi), cross-vendor comparison + in-place retry.
- Phase 3 (reliability hardening) ✅: supersede-race fix (two mab-agents sharing an api-key no longer flap status via WS supersede cycle).
- Phase 5 (broker reliability) ✅: stuck-task reaper. Tasks pinned to `assigned` / `in_progress` whose owner agent goes stale auto-fail with a self-describing note and cascade-fail through downstream `depends_on` chains. Heartbeat-only trigger so long-running LLM calls on healthy workers aren't false-positived.
- Phase 5a (read-only web dashboard) ✅: vanilla HTML+JS+CSS bundled into the broker at `/dashboard` (single 28 KB file, no build step). Per-batch task graph via `depends_on` connected-component grouping; live agent state with per-task agent + model + duration metadata; dark / cyan-accent aesthetic; status lifecycle gray → amber → green/red.
- Phase 5 (adapter resilience) ✅: `GeminiAdapter` (Google AI Studio, OpenAI-compatible-ish via httpx-direct, 4th built-in adapter). `max_output_tokens` default raised to 8192 after demo11 silently truncated long-form output. In-place retry primitive: `POST /api/v1/tasks/{id}/retry` resets a failed task on the same id, cascade-resets downstream failures, dashboard shows `↻N` history badge + 1s pulse animation while retrying.
- Phase 5 (worker ops symmetry) ✅: `deploy/update-worker.sh` mirrors the broker's `update.sh`: ff-only pull, `uv sync`, restart every `mab-worker*.service` (or filter via `--name`), per-unit online-status verification via `/agents/me`.
- Test total: 212.
- Phase 4: TLS + JWT + IP allowlist for public-internet deployment (deferred; LAN deployment works for current use)
- Phase 5b: dashboard write actions (dispatch / claim / delete / send message / retry button on red nodes; retry route already lands)
- Phase 5c: full-text search across messages + task descriptions/results + contexts + channel messages (SQLite FTS5)
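
The `depends_on` behaviour listed above (blocked status, auto-unblock on upstream completion, failure cascade) can be illustrated with an in-memory task graph. This is a sketch of the state transitions only, under the assumption that tasks are dicts with `status` and `depends_on` fields; the function name `settle` is hypothetical and the broker's own logic lives in its database layer.

```python
from typing import Dict


def settle(tasks: Dict[str, dict], task_id: str, status: str) -> None:
    """Record a terminal status, then unblock or cascade-fail dependents."""
    tasks[task_id]["status"] = status
    for tid, task in tasks.items():
        if task_id not in task["depends_on"] or task["status"] != "blocked":
            continue
        if status == "failed":
            settle(tasks, tid, "failed")  # failure cascades downstream
        elif all(tasks[d]["status"] == "completed" for d in task["depends_on"]):
            task["status"] = "pending"    # every upstream is done: claimable


def plan() -> Dict[str, dict]:
    """Sentinel -> two parallel branches -> fan-in comparison."""
    return {
        "go": {"status": "pending", "depends_on": []},
        "a": {"status": "blocked", "depends_on": ["go"]},
        "b": {"status": "blocked", "depends_on": ["go"]},
        "compare": {"status": "blocked", "depends_on": ["a", "b"]},
    }


ok = plan()
settle(ok, "go", "completed")   # a and b become pending
settle(ok, "a", "completed")    # compare still waits on b
settle(ok, "b", "completed")    # compare becomes pending

bad = plan()
settle(bad, "go", "completed")
settle(bad, "a", "failed")      # compare fails without ever running

print(ok["compare"]["status"], bad["compare"]["status"], bad["b"]["status"])
```

The fan-in task only becomes claimable once all of its upstreams have completed, while a single upstream failure is enough to fail it.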
