Swarm
Distributed agent coordination across multiple CortexPrism instances. The swarm layer allows Cortex instances to form a fleet, discover each other, dispatch work directives, and aggregate resource usage — all over the A2A protocol.
┌──────────────────────────────────────────────┐
│ Swarm Coordinator │
│ registerSelf · discoverPeers · dispatch │
│ broadcast · getResourceReport · heartbeat │
├──────────────────────────────────────────────┤
│ A2A Transport Layer │
│ connect · disconnect · sendDirective │
│ ping · fetchRemoteAgentCard │
├──────────────────────────────────────────────┤
│ A2A Protocol (JSON-RPC 2.0) │
└──────────────────────────────────────────────┘
The swarm (packages/infra/src/swarm/) sits above the A2A transport layer and provides the
primary API for fleet operations. Its contract is defined in packages/infra/contracts/swarm.ts,
and it is implemented across five modules:
| File | Purpose |
|---|---|
| `coordinator.ts` | swarm singleton: self-registration, peer discovery, dispatch, heartbeat, resource reports |
| `node-registry.ts` | CRUD over the nodes table, heartbeat updates, stale-node eviction, peer discovery via A2A agent cards |
| `transport.ts` | Maps swarm directives to A2A messages over JSON-RPC 2.0 |
| `directive-handler.ts` | Receiving side: processes incoming directives (spawn sub-agents, execute tasks, query resources) |
| `remote-kernel.ts` | Proxies remote processes into the local OsKernel process tree; aggregates cross-node resources |
Work is dispatched across the swarm via directives — typed task messages sent from one node
to another. Each directive has a unique ID, priority level (low / normal / high / critical),
a TTL, and is tracked in the swarm_directives table.
| Kind | Purpose |
|---|---|
| `spawn_agent` | Spawn a sub-agent on a remote node to perform delegated work |
| `execute_task` | Execute a shell command or tool invocation on a remote node |
| `query_resources` | Query a remote node's resource accounting (tokens, CPU, memory, sessions) |
| `forward_message` | Forward a user message or agent output to a remote node |
| `sync_state` | Synchronize shared state (memory, skills, configuration) between nodes |
Directive results include status (completed / failed / cancelled / timed_out), output,
error details, and execution metrics (tokens in/out, cost, duration, tool calls).
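
The directive and result fields described above can be sketched as TypeScript types. This is only an illustration: the interface names, field names, and units (`ttlMs`, `createdAt`, `durationMs`) are assumptions, and the authoritative definitions are the ones in packages/infra/contracts/swarm.ts. The `isExpired` helper is likewise hypothetical and shows one plausible way a TTL could be checked.

```typescript
// Hypothetical shapes inferred from the prose; the real contract may differ.
type DirectiveKind =
  | "spawn_agent"
  | "execute_task"
  | "query_resources"
  | "forward_message"
  | "sync_state";
type DirectivePriority = "low" | "normal" | "high" | "critical";
type DirectiveStatus = "completed" | "failed" | "cancelled" | "timed_out";

interface Directive {
  id: string;
  kind: DirectiveKind;
  priority: DirectivePriority;
  ttlMs: number; // assumed unit: milliseconds
  createdAt: number; // assumed: epoch milliseconds
  payload: unknown;
}

interface DirectiveResult {
  directiveId: string;
  status: DirectiveStatus;
  output?: string;
  error?: string;
  metrics: {
    tokensIn: number;
    tokensOut: number;
    costUsd: number;
    durationMs: number;
    toolCalls: number;
  };
}

// Assumed rule: a directive is expired once its TTL has fully elapsed.
function isExpired(directive: Directive, now: number): boolean {
  return now - directive.createdAt >= directive.ttlMs;
}

const exampleDirective: Directive = {
  id: "dir_1",
  kind: "query_resources",
  priority: "normal",
  ttlMs: 60_000,
  createdAt: 1_000,
  payload: {},
};
```

Using string-literal unions for kind, priority, and status lets the compiler reject values outside the documented sets.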
When an instance joins the swarm via cortex swarm init, it registers itself in the nodes table
with a name, host, port, capability tier (root / sudo / unprivileged), group, and A2A
endpoint. The registration is idempotent — re-registering with the same endpoint updates the
existing node record.
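
The idempotent behaviour can be illustrated with an in-memory stand-in for the nodes table. This is a minimal sketch, not the actual `node-registry.ts` logic: `registerNode`, `NodeRecord`, and the ID scheme are hypothetical, and the only property taken from the text is that the A2A endpoint acts as the identity of a node.

```typescript
type Tier = "root" | "sudo" | "unprivileged";

interface NodeRecord {
  id: string;
  name: string;
  host: string;
  port: number;
  tier: Tier;
  group: string;
  a2aEndpoint: string;
}

// In-memory stand-in for the nodes table, keyed by A2A endpoint.
const nodesByEndpoint = new Map<string, NodeRecord>();
let nextNodeNumber = 1;

// Registering the same endpoint twice updates the record and keeps its ID.
function registerNode(input: Omit<NodeRecord, "id">): NodeRecord {
  const existing = nodesByEndpoint.get(input.a2aEndpoint);
  const id = existing ? existing.id : `node_${nextNodeNumber++}`;
  const record: NodeRecord = { ...input, id };
  nodesByEndpoint.set(input.a2aEndpoint, record);
  return record;
}
```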
Nodes discover each other through three phases:
- Explicit endpoints: passed to `discoverPeers()` directly
- Config seed nodes: the `config.swarm.seedNodes` array in `~/.cortex/config.json`
- Database refresh: existing connected nodes in the `nodes` table whose agent cards are refreshed via A2A `fetchAgentCard()`
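
One way to picture the three sources is as a single ordered candidate list. The sketch below is an assumption about how they might be combined; `collectCandidateEndpoints` is a hypothetical helper, and whether the real `discoverPeers()` de-duplicates endpoints is not stated in this document.

```typescript
// Merge the three discovery sources in phase order, dropping duplicates.
function collectCandidateEndpoints(
  explicit: string[],
  seedNodes: string[],
  knownFromDb: string[],
): string[] {
  const seen = new Set<string>();
  const ordered: string[] = [];
  for (const endpoint of [...explicit, ...seedNodes, ...knownFromDb]) {
    if (!seen.has(endpoint)) {
      seen.add(endpoint);
      ordered.push(endpoint);
    }
  }
  return ordered;
}

const candidates = collectCandidateEndpoints(
  ["http://node4:4220/a2a"],
  ["http://node2:4220/a2a", "http://node3:4220/a2a"],
  ["http://node2:4220/a2a"],
);
```

Here `candidates` lists node4 first, then node2 and node3, with the repeated node2 entry from the database dropped.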
Every 30 seconds (HEARTBEAT_INTERVAL_MS = 30_000), each node sends a heartbeat containing:
- CPU percent, memory (used/total), disk (used/total)
- Active sessions and processes
- Tokens used today (in/out), cost USD today
- Uptime seconds
Heartbeat metrics are written to both the nodes table (current values) and the
swarm_resource_snapshots table (time-series, capped at 1440 snapshots per node).
Nodes that miss heartbeats for 120 seconds (NODE_STALE_MS) are automatically marked as
disconnected by markNodesOffline().
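
With a 30-second interval and a 120-second threshold, a node is considered stale after roughly four missed heartbeats. The sketch below illustrates that check under stated assumptions: `markStale` and `NodeHeartbeat` are hypothetical names, `NODE_STALE_MS` is taken to be 120_000, and the inclusive boundary (`>=`) is a guess rather than the documented behaviour of `markNodesOffline()`.

```typescript
const HEARTBEAT_INTERVAL_MS = 30_000;
const NODE_STALE_MS = 120_000; // assumed value for the 120-second threshold

interface NodeHeartbeat {
  id: string;
  status: "connected" | "disconnected";
  lastHeartbeatAt: number; // assumed: epoch milliseconds
}

// Return a copy of the list with stale connected nodes marked disconnected.
function markStale(nodes: NodeHeartbeat[], now: number): NodeHeartbeat[] {
  return nodes.map((node): NodeHeartbeat =>
    node.status === "connected" && now - node.lastHeartbeatAt >= NODE_STALE_MS
      ? { ...node, status: "disconnected" }
      : node
  );
}

// Number of whole heartbeat intervals that fit in the stale window.
const missedHeartbeatsBeforeStale = NODE_STALE_MS / HEARTBEAT_INTERVAL_MS;
```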
- Drain: sets the node to `draining` status (mapped to `connected` in DB). The node stops accepting new directives but completes in-flight work.
- Seal: sets the node to `sealed` status (mapped to `disconnected` in DB). Heartbeat stops; the node gracefully shuts down.
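
The mapping between lifecycle status and stored status can be written as a small function. This is a sketch only: the type and function names are hypothetical, and `acceptsNewDirectives` assumes that only a fully `connected` node takes new work, which follows from the drain and seal descriptions but is not spelled out as a rule.

```typescript
type LifecycleStatus = "connected" | "draining" | "sealed" | "disconnected";
type DbStatus = "connected" | "disconnected";

// Collapse the four lifecycle states into the two values stored in the DB.
function toDbStatus(status: LifecycleStatus): DbStatus {
  switch (status) {
    case "connected":
    case "draining":
      return "connected";
    case "sealed":
    case "disconnected":
      return "disconnected";
  }
}

// Assumed rule: draining and sealed nodes refuse new directives.
function acceptsNewDirectives(status: LifecycleStatus): boolean {
  return status === "connected";
}
```

One consequence of this mapping is that a draining node is indistinguishable from a healthy one when looking at the stored status alone.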
The swarm aggregates resource usage across all nodes:
- Per-node metrics: tokens in/out, cost, tool calls, CPU ms, peak memory, active sessions/processes
- Fleet totals: summed from the `swarm_resource_snapshots` table (last 24 hours)
- Resource report (`SwarmResourceReport`): total nodes, online nodes, aggregate token usage, cost, tool calls, CPU time, peak memory
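
The aggregation described above can be sketched as a fold over per-node usage. The names `NodeUsage`, `FleetReport`, and `buildReport` are hypothetical and do not mirror the real `SwarmResourceReport` type. Treating fleet peak memory as the maximum across nodes (rather than a sum) is also an assumption.

```typescript
interface NodeUsage {
  nodeId: string;
  online: boolean;
  tokensIn: number;
  tokensOut: number;
  costUsd: number;
  toolCalls: number;
  cpuMs: number;
  peakMemoryMb: number;
}

interface FleetReport {
  totalNodes: number;
  onlineNodes: number;
  tokensIn: number;
  tokensOut: number;
  costUsd: number;
  toolCalls: number;
  cpuMs: number;
  peakMemoryMb: number;
}

// Sum additive metrics across nodes; take the maximum for peak memory.
function buildReport(nodes: NodeUsage[]): FleetReport {
  const report: FleetReport = {
    totalNodes: nodes.length,
    onlineNodes: 0,
    tokensIn: 0,
    tokensOut: 0,
    costUsd: 0,
    toolCalls: 0,
    cpuMs: 0,
    peakMemoryMb: 0,
  };
  for (const node of nodes) {
    if (node.online) report.onlineNodes += 1;
    report.tokensIn += node.tokensIn;
    report.tokensOut += node.tokensOut;
    report.costUsd += node.costUsd;
    report.toolCalls += node.toolCalls;
    report.cpuMs += node.cpuMs;
    report.peakMemoryMb = Math.max(report.peakMemoryMb, node.peakMemoryMb);
  }
  return report;
}
```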
The remote kernel (remote-kernel.ts) extends the local OsKernel process tree to include
remote processes. Remote sub-agents appear in the local process tree display with PIDs in the
900,000+ range. Resource accounting from remote nodes is synced into the local kernel so that
token usage, cost, and CPU time reflect fleet-wide activity.
```typescript
// Register a remote sub-agent process
registerRemoteProcess({
  parentPid: 0,
  agentId: 'explorer_1',
  sessionId: 'sess_abc',
  role: 'agent',
  agentType: 'explorer',
  nodeId: 'node_xyz',
});

// Sync remote resource accounting
syncRemoteResources('node_xyz', [
  { agentId: 'coder_1', toolCalls: 42, tokensIn: 5000, tokensOut: 3000, costUsd: 0.12, cpuMs: 8000, peakMemoryMb: 512 },
]);
```

Add seed nodes and enable the swarm in ~/.cortex/config.json:
```json
{
  "swarm": {
    "seedNodes": ["http://node2:4220/a2a", "http://node3:4220/a2a"],
    "group": "production",
    "enabled": true
  }
}
```

```bash
cortex swarm           # Show swarm overview and available sub-commands
cortex swarm init      # Register this instance as a swarm node
cortex swarm nodes     # List all registered swarm nodes with status and metrics
cortex swarm topology  # Show process tree and token usage across all nodes
cortex swarm report    # Aggregated fleet resource report (tokens, cost, CPU, memory)
cortex swarm drain     # Stop accepting new directives (complete in-flight work)
cortex swarm seal      # Graceful shutdown: stop heartbeat, stop accepting work
```

```bash
cortex swarm init --name my-node --host 192.168.1.10 --port 4220 --group production --tier sudo
```

Prompts for a node name if `--name` is omitted. Registers the A2A server handler and starts the
heartbeat loop.
`cortex swarm nodes` displays each node's name, ID, status (color-coded: green=connected, red=disconnected, yellow=other), host:port, tier, group, active sessions, processes, memory usage, and last heartbeat timestamp.

`cortex swarm report` shows fleet-level aggregates: online/total nodes, total tokens in/out, total cost, total tool calls, total CPU ms, peak memory, and per-node breakdowns.
| Method | Path | Description |
|---|---|---|
| GET | `/api/swarm/topology` | Process tree and token usage across all nodes |
| GET | `/api/swarm/report` | Aggregated resource report (tokens, cost, CPU, memory) |
| GET | `/api/swarm/directives` | Directive history (filterable by `?status=` and `?limit=`) |
| GET | `/api/swarm/nodes/metrics` | Raw metrics for all connected nodes |
| GET | `/api/swarm/nodes/:id/snapshots` | Time-series resource snapshots for a specific node |
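
As an illustration of the `?status=` and `?limit=` filters on the directives endpoint, the helper below builds a request URL with the standard `URL` API. `directivesUrl` is a hypothetical helper, and the base URL `http://localhost:3000` is a placeholder, since this page does not state where the HTTP API is served.

```typescript
// Build a URL for GET /api/swarm/directives with optional filters.
function directivesUrl(
  baseUrl: string,
  options: { status?: string; limit?: number } = {},
): string {
  const url = new URL("/api/swarm/directives", baseUrl);
  if (options.status !== undefined) {
    url.searchParams.set("status", options.status);
  }
  if (options.limit !== undefined) {
    url.searchParams.set("limit", String(options.limit));
  }
  return url.toString();
}

// Placeholder base URL; substitute the address of your Cortex instance.
const failedDirectivesUrl = directivesUrl("http://localhost:3000", {
  status: "failed",
  limit: 20,
});
```

The resulting string can be passed to `fetch()`; the response shape is not documented here, so parsing it is left out of the sketch.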
| Table | Purpose |
|---|---|
| `nodes` | Node registry (shared with hub/node system), extended with `a2a_endpoint`, `labels`, `metrics_json`, `cpu_percent`, `memory_used_mb`, `memory_total_mb` |
| `swarm_directives` | Directive audit trail: ID, kind, source/target nodes, payload, priority, status, metrics |
| `swarm_resource_snapshots` | Per-node time-series metrics retained for 1440 snapshots per node |
- A2A Protocol — JSON-RPC 2.0 transport layer used by the swarm
- Distributed Nodes — Hub ←→ Node architecture (complementary to swarm)
- Architecture — Package dependency graph
CortexPrism — Open-source AI agent operating system · Discord · Apache 2.0 License · Built with Deno 2.x + TypeScript