
marsyanggo/multi-agent-broker


multi-agent-broker

Make LLMs from different vendors collaborate as one team — by capability, not by name.

Demo

[Screenshot: mab-broker dashboard — 4-vendor task graph live]

Phase 5a — Read-only web dashboard, 4-vendor task graph. Screenshot above is the lead-mode dashboard at the end of a 7-task fan-out + fan-in plan running across four different LLM vendors simultaneously: claude-sonnet-4-6 (Anthropic Max subscription via claude-cli), gpt-oss:120b (Ollama Cloud), gemini-2.5-flash (Google AI Studio API), and gpt-5.5 (ChatGPT subscription via Codex CLI). Each box shows the worker, the model, and the wall-clock duration; the same prompt was sent to all four vendor branches so the lead could compare their outputs side-by-side.

Top panel: 6 online agents (5 workers + lead). Bottom panel: the depends_on task graph — sentinel "go" task at the top, four vendor plans in parallel below it, gpt-oss/sonnet/gemini/codex all green, comparison task at the bottom synthesising them. The dashboard polls broker state every 1 s; live status lifecycle is gray (pending/blocked) → amber (in_progress) → green (completed) / red (failed). Per-batch grouping via depends_on connected components means each /lead-mode dispatch is its own clean graph — previous runs auto-clear.

🎬 Screencast of the same lifecycle in motion (sentinel release → vendors light up amber → duration ticking → green): issue #1 or directly:

Demo_v1.mov

Code: src/mab/broker/routes/dashboard.py (backend snapshot endpoint) + src/mab/broker/static/index.html (single-file vanilla HTML/JS/CSS frontend, no build step). Cross-vendor walkthrough: docs/cookbook.md Recipe 8.

A single-vendor agent stack (Claude Code subagents, OpenAI Assistants, Gemini agents) can already coordinate N copies of its model. mab-broker is for the harder problem: the right LLM for this sub-task lives in another vendor's stack, on your own hardware, or split across both. Examples this codebase exists to enable:

  • Claude Opus plans a workflow, dispatches the heavy reasoning step to gpt-oss:120b on Ollama Cloud (1/10 the cost), then routes the polishing pass to Claude Sonnet via subscription — all in one autonomous task chain.
  • Local Llama-70b on your GPU box handles PII-bearing input (no cloud) while the same lead dispatches non-sensitive sub-tasks to GPT-4o, all within the same plan.
  • Anthropic gets rate-limited mid-workflow → broker reroutes the next task to DeepSeek-R1 automatically, because capability is the contract and vendor is interchangeable.

Tasks declare what they need (required_all=["tier:reasoning", "host:cloud"]), agents declare what they offer (["model:gpt-oss:120b-cloud", "family:gpt-oss", "tier:reasoning", "host:cloud", "provider:ollama"]), and the broker does the matching. You can swap a worker's underlying LLM tomorrow without changing a single lead-side prompt.

Where mab-broker isn't a fit

  • Single-vendor setups — you already have great native options. Use those.
  • Public-internet multi-tenant — TLS / wss / JWT / IP allowlist still pending (Phase 4). Today this is LAN / VPN / Tailscale.
  • Synchronous chat — broker is task-shaped, not turn-based dialogue.

What it looks like in practice

┌─────────────────────────────────────────────────────────────┐
│  Lead: Claude Opus (Anthropic API)                          │  ← your laptop
│    "Plan, dispatch, monitor, synthesize"                    │
└──────────────────────┬──────────────────────────────────────┘
              create_task(required_all=[...])
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  Broker (FastAPI + SQLite + WS)                             │  ← your network
│    Capability-filtered push · atomic claim · data sovereign │
└──┬────────────────────────────────────────┬─────────────────┘
   │ tier:reasoning + host:cloud            │ tier:sonnet + family:claude
   ▼                                         ▼
┌──────────────────────┐                ┌──────────────────────┐
│  Worker A            │                │  Worker B            │
│  adapter: ollama     │                │  adapter: claude-cli │
│  model: gpt-oss:120b │                │  model: claude-sonnet│
│  via Ollama Cloud    │                │  via Max subscription│
│  (OpenAI OSS family) │                │  (Anthropic family)  │
└──────────────────────┘                └──────────────────────┘

                          + add a worker for:
                            – OpenAI direct (API key)
                            – Gemini direct (API key)
                            – Local Llama / Qwen / DeepSeek (GPU box, no internet)
                            – any HTTP-speaking LLM (one ~50-line adapter)

One Python daemon (mab-worker) per worker host, five built-in adapters (anthropic / ollama / claude-cli / gemini / mock), swap with one CLI flag — --adapter X --model Y. No client-side change for the lead when you swap vendors.

Status: Phase 1 + 1.5 + 2.1 + 2.2 + 3a + Phase 3 + Phase 5a fully complete. 212 tests green.

3 vendor families live in production: Anthropic Claude (Opus lead + Sonnet worker via Max subscription), OpenAI OSS (gpt-oss:120b via Ollama Cloud), Google (gemini-2.5-flash via AI Studio on a Raspberry Pi) — all routing via capability tags, lead doesn't see vendor names. Cross-vendor /lead-mode demos verified end-to-end including a 3-vendor Tokyo itinerary comparison (cookbook Recipe 8) and a playable single-file Chrome-dino game built across two vendors (Recipe 6, examples/dino.html).

Phase 5a — Read-only web dashboard ships at /dashboard on the broker: per-batch task graph (depends_on connected components), live agent state, per-task agent + model + duration metadata, dark "tech" aesthetic with status-lifecycle colours (gray → amber → green/red). Includes the in-place retry primitive — failed tasks reset on the same id with cascade reset of downstream failures, dashboard shows a ↻N badge and pulse animation for visible retry recovery. See issue #1 for screencast.
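The per-batch grouping described above can be illustrated with a small sketch. It assumes tasks are available as a plain mapping from task id to its depends_on list; the function name `group_batches` and the sample ids are hypothetical and are not taken from the broker's code.

```python
from collections import defaultdict


def group_batches(depends_on: dict[str, list[str]]) -> list[set[str]]:
    """Group task ids into batches: connected components of the depends_on graph."""
    # Treat each dependency edge as undirected for grouping purposes.
    neighbours: dict[str, set[str]] = defaultdict(set)
    for task_id, upstream_ids in depends_on.items():
        neighbours[task_id]  # make sure isolated tasks get an entry
        for upstream in upstream_ids:
            neighbours[task_id].add(upstream)
            neighbours[upstream].add(task_id)

    seen: set[str] = set()
    batches: list[set[str]] = []
    for start in list(neighbours):
        if start in seen:
            continue
        # Iterative depth-first walk collects one component.
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        batches.append(component)
    return batches


# Hypothetical plan: one fan-out + fan-in batch plus an unrelated older task.
plan = {
    "go": [],
    "gpt-oss": ["go"],
    "sonnet": ["go"],
    "compare": ["gpt-oss", "sonnet"],
    "older-run": [],
}
batches = group_batches(plan)
```

Here the four linked tasks end up in one batch and `older-run` in its own, which is the property that lets each dispatch render as a separate graph.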

Reliability hardening: supersede-race fix (zombie WS connections no longer flap status), broker-side stuck-task reaper (worker dies mid-task → auto-fail after heartbeat goes stale).
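As a rough illustration of the stuck-task reaper idea, the sketch below works on in-memory dicts rather than the broker's SQLite tables. The staleness threshold, the field names, and the function `reap_stuck_tasks` are assumptions made for the example; only the behaviour (stale owner leads to failed, failure cascades downstream, pending and completed tasks are ignored) comes from the description in this README.

```python
import time

STALE_AFTER_SECONDS = 90  # hypothetical threshold, e.g. three missed 30 s heartbeats


def reap_stuck_tasks(tasks, agents, now=None):
    """Fail tasks whose owner's heartbeat is stale, then cascade downstream.

    tasks:  {task_id: {"status": str, "assignee": str | None, "depends_on": [ids]}}
    agents: {agent_id: {"last_heartbeat": epoch_seconds}}
    Returns the ids that were moved to "failed", in the order they failed.
    """
    now = time.time() if now is None else now
    failed = []
    for task_id, task in tasks.items():
        if task["status"] not in ("assigned", "in_progress"):
            continue  # pending / blocked / completed tasks are not reaped
        owner = agents.get(task["assignee"])
        if owner and now - owner["last_heartbeat"] <= STALE_AFTER_SECONDS:
            continue  # healthy worker, possibly inside a long LLM call
        task["status"] = "failed"
        failed.append(task_id)

    # Cascade: anything that depends on a failed task fails as well.
    changed = True
    while changed:
        changed = False
        for task_id, task in tasks.items():
            if task["status"] in ("failed", "completed"):
                continue
            if any(tasks[up]["status"] == "failed" for up in task["depends_on"]):
                task["status"] = "failed"
                failed.append(task_id)
                changed = True
    return failed
```

Keying the check on heartbeat age alone, rather than on task duration, is what keeps a slow but healthy worker from being reaped.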

Cookbook: 8 production-verified recipes in docs/cookbook.md — two-stage thinking, fan-in synthesis, failure cascade, shared context, channels, /lead-mode end-to-end, adding a third vendor, cross-vendor comparison + retry.


What's in the box

Three runtime roles, one repo, one broker URL:

  • Central broker (mab-broker) — FastAPI + SQLite (WAL) + WebSocket Hub; one binary, zero external dependencies
  • Interactive MCP agent (mab-agent) — plugs into Claude Code via .mcp.json or claude mcp add; 14 tools cover roster discovery (list_agents / match_agents), self-declaration (update_my_model / update_my_capabilities), messaging, full task lifecycle (including delete_task), and push-driven wait_for_task. Best for the lead orchestrator and any human-in-the-loop dev work
  • Autonomous worker daemon (mab-worker) — headless Python daemon supervised by systemd --user; pulls tasks via push-driven wait_for_task, dispatches each through a pluggable LLM adapter, reports back. Five adapters ship in-box: anthropic (api.anthropic.com direct via httpx), ollama (works for local Ollama and Ollama Cloud, also via httpx), claude-cli (spawns claude -p per task — gives the worker access to Claude Code's full tool ecosystem), gemini (Google AI Studio direct via httpx), and mock (in-process echo for tests). Best for production workers — daemon mode avoids the Claude Code TUI's user-input-breaks-the-loop race and is supervised, immortal, and observable via journalctl

Plus the cross-cutting machinery:

  • Capability-based routing — tasks carry required_all / required_any tag sets; broker filters WS broadcast to matching online agents and rejects mismatched claims (403) or directed assignments (400)
  • Model-aware agents — --model claude-opus-4-7 auto-derives model: / family: / tier: / provider: tags so capabilities track the runtime model identity (works on both mab-agent and mab-worker)
  • Push-aware tool responses — every MCP tool reply embeds _pending_messages / _pending_task_events counts so Claude Code (which cannot be push-interrupted) is nudged to drain its queue on the next tool call
  • Atomic task claim — UPDATE ... WHERE status='pending' RETURNING guarantees single-winner semantics under concurrent claims
  • Task delete with task_event:deleted — creator or current assignee can drop a task in any state; broker auto-clears the assignee's current_task and broadcasts the deletion
  • Offline message backfill — messages addressed to an offline agent persist for 7 days; the agent receives them on next reconnect
  • Auto-reconnecting WS client — exponential backoff (1→2→4…60s), application-layer text heartbeat keeps last_heartbeat fresh on the broker
  • Observability tool — tools/watch.py connects to the broker as a non-MCP agent and prints every received WS event to stdout; the easiest way to verify filter broadcast or chase routing bugs from a third machine
  • One-shot deployment — deploy/install.sh (broker), deploy/setup-agent.sh (Claude Code MCP wiring), deploy/setup-worker.sh (daemon) — each is idempotent; deploy/update.sh does git pull + uv sync + service restart in one go
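The reconnect schedule mentioned in the list above (1→2→4…60s) can be sketched as a small generator. This is an illustration of capped exponential backoff only; the name `backoff_delays` is hypothetical and the real client may add jitter or reset logic that is not shown here.

```python
import itertools


def backoff_delays(initial: float = 1.0, cap: float = 60.0):
    """Yield reconnect delays in seconds: 1, 2, 4, ... capped at `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


first_eight = list(itertools.islice(backoff_delays(), 8))
# first_eight == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

A client would typically restart the generator after a successful connection so that the next outage begins again at one second.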

Quickstart

Install (broker host)

git clone https://github.com/marsyanggo/multi-agent-broker
cd multi-agent-broker
uv sync
uv run mab-broker serve         # listens on 0.0.0.0:8420

For a production-style install with systemd user services (Linux), use the one-shot installers:

# Broker host
./deploy/install.sh                                          # one sudo for enable-linger; rest non-sudo

# Interactive Claude Code agent (any host — lead, dev, debug)
./deploy/setup-agent.sh \
    --broker-url http://broker-host:8420 \
    --api-key   mab-ak-XXXX \
    --model     claude-opus-4-7

# Headless worker daemon (any host — production task processor)
./deploy/setup-worker.sh \
    --broker-url http://broker-host:8420 \
    --api-key   mab-ak-YYYY \
    --adapter   ollama \
    --model     gpt-oss:120b-cloud \
    --ollama-base-url http://localhost:11434           # via local Ollama proxy

# Update + uninstall
./deploy/update.sh                                           # any host: pull + sync + restart + verify
./deploy/uninstall.sh                                        # broker host: remove unit, keep DB

update.sh refuses to run on a dirty tree and uses git pull --ff-only, so it never overwrites local commits — safe to run on production hosts unattended. setup-agent.sh validates broker reachability + API key before writing MCP config, and creates a backup of ~/.claude.json on the fallback path. setup-worker.sh writes a chmod 600 systemd unit with all config (including secrets) as Environment= lines — --name <suffix> lets one host run several workers in parallel (mab-worker-opus.service + mab-worker-llama.service, etc.). See deploy/README.md for the full flow.

Generate an API key per agent

uv run mab-broker gen-key --name claude-laptop
# Registered agent: claude-laptop (id=abc12345)
# API key (save now — it cannot be recovered):
#   mab-ak-XXXXXXXXXXXXXXXX

You can pre-seed capabilities at registration:

uv run mab-broker gen-key --name worker-linux \
  --capabilities tier:sonnet,family:claude,vision

Name conflict? Add --auto-suffix to auto-increment (claude-laptop → claude-laptop-2).

When the broker is installed via deploy/install.sh, its data dir is moved off $HOME. Re-export the same MAB_DB_PATH when running gen-key from the same machine, or you'll write keys to a different SQLite file than the broker reads:

MAB_DB_PATH=$XDG_DATA_HOME/multi-agent-broker/db.sqlite \
  uv run mab-broker gen-key --name claude-laptop

Wire up Claude Code

claude mcp add -s user mab \
  /path/to/mab-agent \
  --broker-url http://192.168.1.100:8420 \
  --api-key mab-ak-XXXXXXXXXXXXXXXX \
  --model claude-opus-4-7

--model claude-opus-4-7 auto-derives capability tags. To override or extend:

mab-agent --broker-url ... --api-key ... \
  --model claude-opus-4-7 \
  --capabilities vision,code-review     # extra tags merged with model derivation

Restart Claude Code. You'll see 14 tools:

  • Roster: list_agents, match_agents, get_agent_info
  • Self: report_status, update_my_model, update_my_capabilities
  • Messaging: send_message, get_messages
  • Tasks: create_task, claim_task, update_task, delete_task, list_tasks
  • Worker: wait_for_task (push-driven block on broker WS, used by /worker-mode)

Install a worker daemon (production)

For headless task execution without a Claude Code session in the loop, install mab-worker as a systemd user service:

# On the worker host (broker generated the key, then handed it to you)
./deploy/setup-worker.sh \
  --broker-url http://broker-host:8420 \
  --api-key   mab-ak-YYYYYYYYYYYYYY \
  --adapter   ollama \
  --model     gpt-oss:120b-cloud \
  --ollama-base-url http://localhost:11434       # local Ollama proxies to Ollama Cloud

Adapter choices (--adapter):

| Adapter | Backend | Best for |
| --- | --- | --- |
| anthropic | POST to api.anthropic.com/v1/messages direct via httpx | Pure-LLM tasks against the official Claude API |
| ollama | POST to /api/chat direct via httpx; auto-handles local + Ollama Cloud auth | Local GPU models or Ollama Cloud-routed models like gpt-oss:120b-cloud |
| claude-cli | Spawns claude -p <prompt> per task | Tasks needing Claude Code's tool ecosystem (Bash / Edit / Read / web). Heaviest cold start but most capable |
| gemini | Google AI Studio API direct via httpx | Gemini models such as gemini-2.5-flash |
| mock | In-process echo | Smoke tests, dry runs |

After install, the daemon is supervised:

systemctl --user status mab-worker.service        # is it up?
journalctl --user -u mab-worker.service -f        # tail logs
systemctl --user restart mab-worker.service       # apply config / rotate key
./deploy/update.sh && systemctl --user restart mab-worker.service   # pull repo + restart

Run multiple workers on one host with --name <suffix> — e.g. --name opus-cloud creates mab-worker-opus-cloud.service alongside the default mab-worker.service, each with its own unit file, secrets, and journal.

Try it out

You:    Use the mab tools to create a task that needs tier:opus, then claim it.
Claude: [calls create_task(title="...", required_all=["tier:opus"])]
        [calls claim_task(task_id="...")]   # succeeds because this agent has tier:opus
        [calls update_task(status="completed", result="...")]

If a non-opus agent tries to claim the same task, the broker returns 403 and the task stays pending.

/lead-mode + /worker-mode slash commands

The repo ships two Claude Code skills (under .claude/skills/) that turn any Claude session into either an orchestrator or a worker with one slash command:

| Command | Role | What it does |
| --- | --- | --- |
| /lead-mode | Planner / dispatcher | Scouts the roster, decomposes user goals into sub-tasks, picks best-fit agents by capability, monitors progress, synthesizes results |
| /worker-mode | Interactive worker | Blocks on wait_for_task (push-driven, sub-second latency); claims new work the instant the broker pushes it, executes per task description, reports completed or failed, loops. For production workers, prefer the mab-worker daemon: /worker-mode is fine for dev / debug, but Claude Code's interactive TUI means a stray user keystroke breaks the loop. The daemon has no TUI and is supervised by systemd. |

Typical multi-host setup:

  • Lead host (laptop) — /lead-mode once in Claude Code, then talk to it like a project manager
  • Production worker hosts (GPU / cloud-LLM boxes) — setup-worker.sh once, daemon runs forever in background

The skills are pure prompt + existing MCP tools — no daemon process, no new Python. Worker mode runs push-driven via the wait_for_task MCP tool (blocks on mab-agent's WS event queue, returns within ~100ms of broker push); no polling, sub-second routing latency. The standalone mab-worker daemon uses the same primitive but bypasses Claude Code entirely — see deploy/README.md.

See the skill files themselves for the full behaviour spec.

Observing routing live

If you want to see which events reach a given agent without bolting it into Claude Code, run tools/watch.py on the same host as that agent's API key:

python tools/watch.py \
  --broker-url http://192.168.1.100:8420 \
  --api-key mab-ak-XXXX
# stderr: [watch] connected as worker-linux (id=...) caps=[...]
# stdout: one JSON line per received message / task_event

Then create tasks from elsewhere with different required_all sets — non-matching events simply never appear in the watcher's output, which is the cleanest live proof that filter broadcast works. (Used to validate this codebase across a real Linux + macOS setup.)


Architecture

                  ┌────────────────────────────────────────────┐
                  │ Central Broker (FastAPI + SQLite + WS Hub) │
                  │  REST  /api/v1/{agents|messages|tasks}     │
                  │  WS    /api/v1/ws  (push channel)          │
                  │  Capability matcher filters task broadcast │
                  └──┬─────────────────────┬──────────────────┘
                     │                     │
              WS push + REST        WS push + REST
                     │                     │
            ┌────────┴────────┐    ┌───────┴───────────┐
            │   mab-agent     │    │  mab-worker       │
            │   (MCP stdio)   │    │  (systemd daemon) │
            │                 │    │                   │
            │ Claude Code     │    │  Adapter:         │
            │ /lead-mode      │    │   anthropic       │
            │ /worker-mode    │    │   ollama          │
            │ (interactive)   │    │   claude-cli      │
            └─────────────────┘    └───────────────────┘
              lead / dev box           production worker

Roles compose. A small setup has the broker, one lead mab-agent on the user's laptop, and one mab-worker daemon on each GPU / model host. The broker is the only piece that needs to be reachable from every other; agents and workers are clients.

Message + task flow

Claude Code → mab-agent → REST POST /api/v1/tasks (required_all=[tier:opus])
                          │
            broker stores task, computes online agents matching capability
                          │
            WS push task_event:created → ONLY agents with tier:opus
                          │
            One of them: POST /api/v1/tasks/{id}/claim
                          │
            Broker validates claimer.capabilities, then atomic UPDATE
                          │
            WS push task_event:claimed → creator + assignee
                          │
            Assignee: PATCH /api/v1/tasks/{id} { status: completed, result, note }
                          │
            WS push task_event:completed → creator + assignee
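The "atomic UPDATE" step in the flow above can be tried out with the standard-library sqlite3 module. The table layout, the column names, and the `claim_task` helper are assumptions for this sketch, not the broker's actual schema, and RETURNING needs SQLite 3.35 or newer.

```python
import sqlite3


def claim_task(conn: sqlite3.Connection, task_id: str, agent_id: str) -> bool:
    """Return True only for the single caller that flips pending -> assigned."""
    rows = conn.execute(
        "UPDATE tasks SET status = 'assigned', assignee = ? "
        "WHERE id = ? AND status = 'pending' RETURNING id",
        (agent_id, task_id),
    ).fetchall()  # fetch everything so the statement finishes before commit
    conn.commit()
    return len(rows) == 1


# Hypothetical schema, kept to the three columns the sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, assignee TEXT)")
conn.execute("INSERT INTO tasks VALUES ('t1', 'pending', NULL)")
conn.commit()

first = claim_task(conn, "t1", "worker-a")   # wins the claim
second = claim_task(conn, "t1", "worker-b")  # loses: the row is no longer pending
```

Because the status check and the write happen in one statement, a second claimer sees zero returned rows instead of racing the first.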

Full Phase 1-5 long-form design lives in architecture.md.


Capability tags

Tags follow a prefix:value convention (with bare tags also allowed). The matcher does plain string equality on tags; no glob / regex.

| Prefix | Meaning | Examples |
| --- | --- | --- |
| model: | Exact model identifier | model:claude-opus-4-7, model:gpt-4o, model:llama-3.3-70b |
| family: | Vendor / brand family | family:claude, family:openai, family:google, family:meta |
| tier: | Capability tier within a family | tier:opus, tier:sonnet, tier:haiku, tier:flash, tier:pro |
| provider: | API provider | provider:anthropic, provider:openai, provider:google |
| (bare) | Free-form capability flag | vision, audio, code-review, cn-locale |

mab-agent --model X auto-derives model:, family:, tier:, provider: for known model families. Coverage:

  • Closed-API: Claude (opus / sonnet / haiku), GPT (4 / 4o / 5), Gemini (pro / flash / flash-lite) — full family: + tier: + provider:
  • Open-source (Meta / Alibaba / DeepSeek / Microsoft / Mistral / OpenAI OSS / Google Gemma): gpt-oss, llama / llama-3 / llama-3.3 / llama-4, qwen / qwen2.5 / qwen2.5-coder / qwen3, deepseek / deepseek-r1 / deepseek-v3 / deepseek-coder, mistral / mistral-large / mistral-small, gemma / gemma-3, phi / phi-4 — family: + tier: (provider intentionally not auto-set since these can be hosted via Ollama, vLLM, Bedrock, etc.)
  • Ollama tag form name:tag (e.g. gpt-oss:20b, llama3.3:70b, qwen2.5-coder:32b): auto-adds size:<tag>, host:local, and provider:ollama. The -cloud suffix (e.g. gpt-oss:120b-cloud) is recognised as Ollama Cloud: size:120b + host:cloud instead of polluting the size tag. Pass --capabilities to override provider if you're running through vLLM / something else.

Unknown model strings only get model:X (+ size: / provider:ollama if :tag form). Anything richer should be passed via --capabilities.

Examples:

mab-agent --broker-url ... --api-key ... --model gpt-oss:20b
# → [model:gpt-oss:20b, size:20b, host:local, provider:ollama,
#    family:gpt-oss, tier:reasoning]

mab-agent --broker-url ... --api-key ... --model gpt-oss:120b-cloud
# → [model:gpt-oss:120b-cloud, size:120b, host:cloud, provider:ollama,
#    family:gpt-oss, tier:reasoning]
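The Ollama name:tag heuristic behind these examples can be approximated in a few lines. This is a sketch of the rule as described in this section, not the repository's derive_capabilities_from_model; the helper name `derive_ollama_tags` is hypothetical, and the family: / tier: lookup is deliberately left out.

```python
def derive_ollama_tags(model: str) -> list[str]:
    """Hypothetical helper: derive size/host/provider tags from `name:tag`."""
    tags = [f"model:{model}"]
    _, sep, tag = model.partition(":")
    if not sep:
        return tags  # not in name:tag form, nothing more to infer
    if tag.endswith("-cloud"):
        # "-cloud" marks Ollama Cloud, so keep it out of the size tag.
        tags += [f"size:{tag[: -len('-cloud')]}", "host:cloud"]
    else:
        tags += [f"size:{tag}", "host:local"]
    tags.append("provider:ollama")
    return tags


local_tags = derive_ollama_tags("gpt-oss:20b")
cloud_tags = derive_ollama_tags("gpt-oss:120b-cloud")
```

With these inputs, `local_tags` carries size:20b and host:local while `cloud_tags` carries size:120b and host:cloud, which lines up with the first four tags in each example above.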

How an agent declares its model

Four ways an agent can claim which model it runs (in order of recommendation):

  1. mab-agent --model X at startup — derives the standard tags + PATCH /agents/me before opening the WebSocket, so the broker has correct caps when the agent comes online. Best for fixed-per-process deployments (Claude Code via MCP).
  2. mab-agent --capabilities a,b merge — combines with --model for runtime extras (e.g. add vision for a session). The merged set fully replaces the tags stored in the broker; it is not appended to them.
  3. update_my_model / update_my_capabilities MCP tools — the LLM driving the agent can self-update mid-session. Use when the model switches at runtime (e.g. Claude Code /fast toggle) or to add a per-task skill flag.
  4. mab-broker gen-key --capabilities ... — persist a default in the broker DB at registration time. Survives across restarts but won't reflect runtime model changes.

In practice: gen-key for the immutable identity, --model flag for the per-process runtime claim, MCP tools for in-session corrections.

Writing a non-MCP client

Any process speaking REST + WS can join the broker. Reuse BrokerClient directly:

import asyncio

from mab.mcp_server.broker_client import BrokerClient
from mab.shared.capabilities import derive_capabilities_from_model


async def main() -> None:
    client = BrokerClient(
        broker_url="http://192.168.1.100:8420",
        api_key="mab-ak-XXXX",
    )
    # Declare before going online so task routing sees fresh caps.
    await client.update_capabilities(
        derive_capabilities_from_model("llama-3.3-70b") + ["gpu-local"]
    )
    await client.start()           # WS connect + auto-reconnect + heartbeat
    # Now drain events as they arrive:
    while True:
        for msg in client.drain_messages():
            print("message:", msg)       # replace with your own handling
        for task in client.drain_task_events():
            print("task event:", task)   # replace with your own handling
        await asyncio.sleep(0.5)         # assumed pause; yields to the event loop


asyncio.run(main())

tools/watch.py is a complete example of this pattern. A Phase 3 SDK will package it into a proper public API; the helpers above already work today.

Task match semantics

A task carries two optional tag lists. They combine as a logical AND of two checks: the claimer must hold every tag in required_all and at least one tag in required_any:

create_task(
    title="...",
    required_all=["tier:opus"],            # claimer must have ALL of these
    required_any=["vision", "audio"],      # claimer must have AT LEAST ONE of these
)
| Agent tags | required_all=[tier:opus], required_any=[vision, audio] |
| --- | --- |
| [tier:opus, vision] | ✓ |
| [tier:opus, audio] | ✓ |
| [tier:opus] | ✗ (fails required_any) |
| [tier:sonnet, vision] | ✗ (fails required_all) |
| [] (no caps) | ✗ unless both fields are empty |

Empty required_all + empty required_any ⇒ any agent matches (Phase 1 backward-compat behaviour).
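The match rule above fits in one small function. This is a minimal sketch under the semantics stated in this section (plain string equality, empty lists impose no constraint); the name `matches` is hypothetical and the broker's own matcher may be organised differently.

```python
def matches(agent_tags, required_all=(), required_any=()):
    """True if the agent satisfies both tag lists (plain string equality)."""
    have = set(agent_tags)
    has_all = all(tag in have for tag in required_all)
    # An empty required_any imposes no constraint.
    has_any = not required_any or any(tag in have for tag in required_any)
    return has_all and has_any


# Same requirements as the table above.
need_all = ["tier:opus"]
need_any = ["vision", "audio"]
opus_with_vision = matches(["tier:opus", "vision"], need_all, need_any)
opus_only = matches(["tier:opus"], need_all, need_any)
```

Here `opus_with_vision` is True and `opus_only` is False, and calling `matches([])` with no requirements returns True, which corresponds to the backward-compatible "any agent matches" case.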

Routing behaviour

  • Filter broadcast — broker only sends task_event:created over WS to matching online agents. Non-matchers never know the task exists via push.
  • Authoritative claim — even if an agent learns of a task via list_tasks, claiming it requires capability match; otherwise 403.
  • Directed assignment — create_task(assigned_to=...) with capability requirements validates the assignee at creation; 400 if mismatched.
  • Zero matches — task is still created (status pending); later-connecting agents that match can pick it up via list_tasks.
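To make the first two routing rules concrete, here is a self-contained sketch of filter broadcast and the authoritative claim check. The `Task` dataclass, the function names, and the 409 returned for an already-claimed task are assumptions for illustration; only the 403 on capability mismatch is stated in this README.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    required_all: list[str] = field(default_factory=list)
    required_any: list[str] = field(default_factory=list)
    status: str = "pending"


def eligible(tags: set[str], task: Task) -> bool:
    """Capability check shared by broadcast filtering and claim validation."""
    return set(task.required_all) <= tags and (
        not task.required_any or bool(tags & set(task.required_any))
    )


def broadcast_targets(online: dict[str, set[str]], task: Task) -> list[str]:
    """Filter broadcast: only matching online agents get task_event:created."""
    return [name for name, tags in online.items() if eligible(tags, task)]


def claim_status(tags: set[str], task: Task) -> int:
    """Authoritative claim: 403 on capability mismatch, 200 on success."""
    if not eligible(tags, task):
        return 403
    if task.status != "pending":
        return 409  # assumed code for "already claimed"; not from the README
    task.status = "assigned"
    return 200


online = {"opus-lead": {"tier:opus", "vision"}, "sonnet-worker": {"tier:sonnet"}}
task = Task(required_all=["tier:opus"])
targets = broadcast_targets(online, task)
```

In this example `targets` contains only `opus-lead`; a claim attempt with the sonnet worker's tags yields 403 and leaves the task pending.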

Configuration

Env vars (prefix MAB_):

| Var | Default | Effect |
| --- | --- | --- |
| MAB_HOST | 0.0.0.0 | Broker bind host |
| MAB_PORT | 8420 | Broker port |
| MAB_DB_PATH | ~/.multi-agent-broker/db.sqlite | SQLite location |
| MAB_MESSAGE_TTL_DAYS | 7 | Undelivered message retention |
| MAB_HEARTBEAT_INTERVAL_SECONDS | 30 | WS keepalive + app heartbeat |
| MAB_BROKER_URL | http://localhost:8420 | (agent) broker to dial |
| MAB_API_KEY | — required — | (agent) auth token |
| MAB_MODEL | | (agent) model identifier for capability derivation |
| MAB_CAPABILITIES | | (agent) extra capability tags, comma-separated |

CLI flags on mab-agent mirror the env vars; CLI takes precedence.
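The precedence rule (CLI flag over env var over default) is commonly implemented by feeding env values in as argparse defaults. The sketch below shows that pattern for the agent-side settings; the function `load_agent_config` is hypothetical and is not the actual mab-agent entry point.

```python
import argparse
import os


def load_agent_config(argv=None, env=None):
    """Resolve settings with CLI flag > MAB_* env var > built-in default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="mab-agent-sketch")
    parser.add_argument(
        "--broker-url",
        default=env.get("MAB_BROKER_URL", "http://localhost:8420"),
    )
    parser.add_argument("--api-key", default=env.get("MAB_API_KEY"))
    parser.add_argument("--model", default=env.get("MAB_MODEL"))
    parser.add_argument("--capabilities", default=env.get("MAB_CAPABILITIES", ""))
    args = parser.parse_args(argv)
    if not args.api_key:
        parser.error("an API key is required (--api-key or MAB_API_KEY)")
    # Comma-separated string -> list of tags, ignoring blanks.
    args.capabilities = [t.strip() for t in args.capabilities.split(",") if t.strip()]
    return args


# Example: the env supplies the key and a model, the CLI overrides the model.
cfg = load_agent_config(
    ["--model", "claude-opus-4-7"],
    env={
        "MAB_API_KEY": "mab-ak-test",
        "MAB_MODEL": "gpt-oss:20b",
        "MAB_CAPABILITIES": "vision, code-review",
    },
)
```

In this run `cfg.model` comes from the flag, while the API key and capability list come from the env mapping and the broker URL falls back to the default.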


Testing

uv run pytest
# 212 tests, ~35s — includes real-subprocess end-to-end demo + live-broker daemon integration

Test layout:

| File | Coverage |
| --- | --- |
| tests/test_db.py | SQLite CRUD, atomic claim, TTL cleanup, capability migration |
| tests/test_auth.py | API key hashing, Bearer validation, name conflict / auto-suffix |
| tests/test_capabilities.py | Capability matcher (AND-of-all + AND-of-any), model→tag derivation (Claude / GPT / Gemini / Llama / Qwen / DeepSeek / gpt-oss / Phi + Ollama tag heuristic) |
| tests/test_routes.py | REST routes against in-process FastAPI, including capability validation + task delete (creator / assignee / non-owner / 404) |
| tests/test_websocket.py | Hub routing, backfill, agent/task events, capability filter broadcast |
| tests/test_broker_client.py | BrokerClient against a live uvicorn broker (including app heartbeat) |
| tests/test_mcp_server.py | MCP tool wiring (14 tools) + _pending_messages interceptor + update_my_model derivation + wait_for_task push/timeout/filter |
| tests/test_e2e_demo.py | mab-broker serve + 2× mab-agent via MCP stdio (4 demo scenarios) |
| tests/worker/test_mock_adapter.py | MockAdapter contract: fixed / callable / async / error / delay / lifecycle |
| tests/worker/test_daemon.py | WorkerDaemon against live broker: directly-assigned tasks, open-pool claim, adapter error path, per-task timeout, multi-task survival, capability declaration, setup/teardown, stats |
| tests/worker/test_anthropic_adapter.py | AnthropicAdapter against httpx.MockTransport: happy path, system prompt + temperature, multi-block concat, non-text-block skip, HTTP error mapping, empty response, missing key |
| tests/worker/test_ollama_adapter.py | OllamaAdapter against httpx.MockTransport: local no-auth, cloud bearer, system prompt, response trim, HTTP error, malformed response |
| tests/worker/test_claude_cli_adapter.py | ClaudeCLIAdapter with a temp Python shim mimicking claude CLI: version probe, model + flag wiring, prompt template, exit code mapping, empty stdout, subprocess kill on cancel, extra args |
| tests/worker/test_cli.py | mab-worker CLI: parse_capabilities, build_adapter for each of 5 adapters, env-var defaults, required-flag validation |
| tests/worker/test_gemini_adapter.py | GeminiAdapter against httpx.MockTransport: happy path, system prompt, temperature, multi-part text concat, missing key, env key, HTTP error, SAFETY finishReason surfacing, no-candidates |
| tests/worker/test_e2e_daemon.py | End-to-end daemon scenarios: capability-rejected claim, push-after-catchup (true push path), graceful stop mid-task, 10-task burst |
| tests/test_dashboard.py | /api/v1/dashboard/snapshot shape: unauth 401, empty shape, populated state after creating task/channel/context |
| tests/test_reaper.py | Stuck-task reaper: stuck + stale agent → failed + current_task cleared; fresh agent skipped; cascade failure through downstream; ignore pending / completed |
| tests/test_retry.py | In-place retry: reset failed → pending + cleared state + retry note; 400 on non-failed; 404 on missing; counter increment across attempts; reset to blocked if upstream not completed; cascade reset downstream failures |

Roadmap

  • Phase 1 ✅ — broker + MCP agent (agent / message / task)
  • Phase 1.5 ✅ — one-shot deployment (uv + systemd user service)
  • Phase 2.1 ✅ — capability-based task routing
  • Phase 2.2 ✅ — task delete + tools/watch.py observability + live multi-agent cross-machine verification
  • Phase 3a ✅ — Lead Agent enablers: roster (match_agents + is_stale + current_task freshness), capability self-update MCP tools (update_my_model / update_my_capabilities), pre-WS capability declaration, push-driven wait_for_task MCP tool
  • Phase 3 ✅ — mab-worker daemon SDK + 4 adapters (Anthropic / Ollama / Claude CLI / Mock), setup-worker.sh one-shot install, push-driven event-name-filtered task queue. Production-verified: claude-mac (Opus) → broker → daemon (gpt-oss:120b via Ollama Cloud) end-to-end in ~1s
  • Phase 3 (D — depends_on) ✅ — task dependencies: blocked status + auto-unblock on upstream completion + failure cascade through downstream chains. Lets a lead fire a whole multi-step plan in one go instead of polling between steps.
  • Phase 3 (S — shared context) ✅ — pinned named documents any agent can read. Solves "every task description duplicates the same style guide" and supports auto-promoting upstream task results as named handoff docs.
  • Phase 3 (CH — channels) ✅ — named group-broadcast topics. Members get WS push for every new message; non-members can still read history via REST but don't receive pushes and can't post. Distinct from direct messages (1:1) and tasks (claim-lifecycle) — for coordination noise, status updates, open queries. 12 new tests (9 REST + 2 WS + 1 MCP).
  • Phase 3 (cookbook) ✅ — 8 production-verified cross-vendor recipes in docs/cookbook.md: two-stage thinking, fan-in synthesis, failure cascade, pinned project spec, group broadcast, /lead-mode end-to-end (examples/dino.html), adding a third vendor (Google Gemini on RPi), cross-vendor comparison + in-place retry.
  • Phase 3 (reliability hardening) ✅ — supersede-race fix (two mab-agents sharing an api-key no longer flap status via WS supersede cycle).
  • Phase 5 (broker reliability) ✅ — stuck-task reaper: tasks pinned to assigned / in_progress whose owner agent goes stale auto-fail with a self-describing note, cascade-fails through downstream depends_on chains. Heartbeat-only trigger so long-running LLM calls on healthy workers aren't false-positived.
  • Phase 5a (read-only web dashboard) ✅ — vanilla HTML+JS+CSS bundled into the broker at /dashboard (single 28 KB file, no build step). Per-batch task graph via depends_on connected-component grouping; live agent state with per-task agent + model + duration metadata; dark / cyan-accent aesthetic; status lifecycle gray → amber → green/red.
  • Phase 5 (adapter resilience) ✅ — GeminiAdapter (Google AI Studio, OpenAI-compatible-ish via httpx-direct, 4th built-in adapter). max_output_tokens defaults raised to 8192 after demo11 silently truncated long-form output. In-place retry primitive: POST /api/v1/tasks/{id}/retry resets a failed task on the same id, cascade-resets downstream failures, dashboard shows ↻N history badge + 1s pulse animation while retrying.
  • Phase 5 (worker ops symmetry) ✅ — deploy/update-worker.sh mirrors the broker's update.sh: ff-only pull, uv sync, restart every mab-worker*.service (or filter via --name), per-unit online-status verification via /agents/me.
  • Test total: 212.
  • Phase 4 — TLS + JWT + IP allowlist for public-internet deployment (deferred — LAN deployment works for current use)
  • Phase 5b — dashboard write actions (dispatch / claim / delete / send message / retry button on red nodes; the retry route has already landed)
  • Phase 5c — full-text search across messages + task descriptions/results + contexts + channel messages (SQLite FTS5)

License

Apache-2.0 — see LICENSE and NOTICE.

