74 changes: 73 additions & 1 deletion sigma-agent/CLAUDE.md
@@ -27,7 +27,8 @@ sigma-agent/
├── ebpf_traffic.rs # eBPF traffic monitoring: loader, harvester, container resolution (feature-gated)
├── envoy_config.rs # Parse envoy.yaml static_resources → static route entries
├── xds.rs # gRPC ADS server: config polling, push to Envoy clients
├── xds_resources.rs # Builds xDS Cluster + Listener protos from envoy routes
└── mcp.rs # MCP JSON-RPC 2.0 server: tools/list, tools/call surface
```

## Configuration
@@ -54,6 +55,8 @@ sigma-agent/
| `AGENT_EBPF_TRAFFIC` | `--ebpf-traffic` | `false` | Enable eBPF TCP traffic monitoring |
| `AGENT_EBPF_TRAFFIC_INTERVAL` | `--ebpf-traffic-interval` | `30` | eBPF traffic stats collection interval (seconds) |
| `AGENT_EBPF_TRAFFIC_MAX_ENTRIES` | `--ebpf-traffic-max-entries` | `8192` | BPF map max entries (unique PIDs) |
| `AGENT_MCP_ENABLED` | `--mcp-enabled` | `false` | Enable MCP (LLM tool) server |
| `AGENT_MCP_BIND` | `--mcp-bind` | `127.0.0.1:9103` | MCP listen address (host:port) |

## IP Discovery

@@ -472,6 +475,74 @@ If the kernel doesn't support BTF or eBPF programs fail to load, the agent logs
continues without traffic metrics. The feature is fully optional — building without `--features
ebpf-traffic` produces a binary with zero eBPF dependencies.

## MCP Tool Surface

When `--mcp-enabled` is set, the agent runs a [Model Context Protocol](https://modelcontextprotocol.io)
server at `POST /mcp` (JSON-RPC 2.0 over HTTP), exposing agent capabilities as tools that an
external LLM can call. This is the agent half of Sigma's AI surface; the LLM "brain" lives in
`sigma-api` at `/api/ai/triage`, not here.

### Design contract

- **No LLM in the agent.** The MCP server is a thin tool surface: no model loading, no
inference, no tokens spent locally. This keeps the agent within its budget of under 1 vCPU and 512 MB of memory.
- **No persistent state.** Every tool wraps data already collected by `port_scan`,
`ebpf_traffic`, `health_probe`, `gpu`, `watchdog`, or proxies a single call to `sigma-api`.
Idle cost ≈ a listening socket.
- **No new background loops.** All tools read from `Arc<RwLock<...>>` snapshots maintained by
existing subsystems.
- **Localhost-only by default.** `AGENT_MCP_BIND` defaults to `127.0.0.1:9103`. Access control
rests entirely on the bind address: there is no per-request auth, so binding to a non-loopback
address exposes every tool to anyone who can reach the port.
- **Read-mostly.** `allocate_ports` is stateless (caller must bind immediately) and
`agent_check_update --force` only triggers a manifest poll. No tool mutates fleet state.
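
Because the bind address is the only gate, it can help to check the configured value before listening. The following is a minimal sketch, not the agent's actual code: `parse_mcp_bind` is a hypothetical helper, and it assumes `AGENT_MCP_BIND` holds a literal `IP:port` (hostnames such as `localhost:9103` would be rejected by this parser).

```rust
use std::net::SocketAddr;

/// Default taken from the configuration table (`AGENT_MCP_BIND`).
const DEFAULT_MCP_BIND: &str = "127.0.0.1:9103";

/// Hypothetical helper (not part of the agent): parse the configured bind
/// address and report whether it stays on the loopback interface.
/// Returns the parsed address plus `true` when it is loopback-only.
fn parse_mcp_bind(raw: Option<&str>) -> Result<(SocketAddr, bool), String> {
    // Fall back to the documented default when the variable is unset.
    let text = raw.unwrap_or(DEFAULT_MCP_BIND);
    let addr: SocketAddr = text
        .parse()
        .map_err(|e| format!("invalid AGENT_MCP_BIND {text:?}: {e}"))?;
    // With no per-request auth, a non-loopback address exposes every tool.
    Ok((addr, addr.ip().is_loopback()))
}
```

A caller might log a warning, or refuse to start, when the second element of the returned tuple is `false`.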

### Tools exposed

| Tool | Wraps |
|------|-------|
| `query_metrics` | `system::collect_system_info` + `port_scan` snapshot |
| `query_ebpf_traffic` | `ebpf_traffic::SharedTrafficStats` (feature-gated) |
| `allocate_ports` | `port_scan::find_available_ports` (in `spawn_blocking`) |
| `query_envoy_routes` | `GET /api/envoy-nodes` + `/api/envoy-routes` via `SigmaClient` |
| `query_dns_leaks` | filtered view of `SharedTrafficStats` (feature-gated) |
| `query_gpu_metrics` | `gpu::SharedGpuMetrics` |
| `query_backend_health` | `health_probe::SharedProbeResults` |
| `query_syn_flood_candidates` | `ebpf_traffic::SharedSynStats` (feature-gated) |
| `agent_check_update` | `watchdog::SharedUpdateInfo` (or forced `watchdog::check_once`) |

Tools whose dependency is not configured (eBPF off, port-scan off, no GPU, registration
failed, etc.) return `enabled=false` rather than erroring — the MCP session never breaks on a
missing capability.
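
The snapshot-read pattern and the `enabled=false` fallback can be sketched together. This is an illustration under stated assumptions, not the agent's implementation: `GpuSnapshot`, its field, the `SharedGpu` alias, and the exact JSON shape are all hypothetical stand-ins for types like `gpu::SharedGpuMetrics`.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the data behind `gpu::SharedGpuMetrics`.
#[derive(Clone, Debug)]
struct GpuSnapshot {
    utilization_pct: u8,
}

/// `None` means the subsystem is not configured or has produced no data yet.
type SharedGpu = Arc<RwLock<Option<GpuSnapshot>>>;

/// Builds the tool payload as a JSON string. A missing capability yields
/// `enabled: false` instead of an error, so the MCP session keeps working.
fn query_gpu_metrics(shared: &SharedGpu) -> String {
    // Copy the snapshot out so the read lock is held only briefly.
    // A poisoned lock is treated like a missing capability in this sketch.
    let snapshot = shared.read().ok().and_then(|guard| (*guard).clone());
    match snapshot {
        Some(s) => format!(
            r#"{{"enabled":true,"utilization_pct":{}}}"#,
            s.utilization_pct
        ),
        None => r#"{"enabled":false}"#.to_string(),
    }
}
```

The tool does no collection of its own: it only reads whatever the owning subsystem last wrote, which is what keeps idle cost near zero.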

### Protocol

JSON-RPC 2.0 methods implemented:
- `initialize` — protocol handshake, returns `MCP_PROTOCOL_VERSION = "2025-06-18"`
- `tools/list` — enumerates tools with JSON schemas (`additionalProperties: false`)
- `tools/call` — invokes a tool by name with arguments
- `notifications/initialized` — accepted no-op

Unknown methods return JSON-RPC error `-32601`. Tool-level failures return an MCP-shaped
`isError: true` content block, **not** a JSON-RPC error — this is intentional per MCP spec
(protocol errors vs. tool errors are distinct).
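
The split between protocol errors and tool errors can be sketched as a small dispatcher. This is a simplified illustration, not the code in `mcp.rs`: the `Reply` enum and `dispatch` function are hypothetical, the `call_tool` closure stands in for real tool execution, the result bodies are trimmed to the fields mentioned above, and the hand-built JSON does not escape the tool text (real code would use a JSON library).

```rust
/// Outcome of handling one JSON-RPC request in this sketch.
#[derive(Debug, PartialEq)]
enum Reply {
    /// JSON-RPC `result` member, already serialized.
    Result(String),
    /// JSON-RPC `error` member: a protocol-level failure.
    Error { code: i32, message: String },
    /// Notifications are accepted and produce no response body.
    NoResponse,
}

const MCP_PROTOCOL_VERSION: &str = "2025-06-18";

/// Hypothetical dispatcher. `call_tool` returns `Err` for a tool-level failure.
fn dispatch<F>(method: &str, call_tool: F) -> Reply
where
    F: Fn() -> Result<String, String>,
{
    match method {
        "initialize" => Reply::Result(format!(
            r#"{{"protocolVersion":"{}"}}"#,
            MCP_PROTOCOL_VERSION
        )),
        // Placeholder: the real list carries one JSON schema per tool.
        "tools/list" => Reply::Result(r#"{"tools":[]}"#.to_string()),
        "tools/call" => {
            // A failing tool is still a successful JSON-RPC exchange:
            // the failure travels inside the result as `isError: true`.
            let (text, is_error) = match call_tool() {
                Ok(t) => (t, false),
                Err(e) => (e, true),
            };
            Reply::Result(format!(
                r#"{{"content":[{{"type":"text","text":"{}"}}],"isError":{}}}"#,
                text, is_error
            ))
        }
        "notifications/initialized" => Reply::NoResponse,
        // Only unknown methods become JSON-RPC errors.
        _ => Reply::Error {
            code: -32601,
            message: "Method not found".to_string(),
        },
    }
}
```

Keeping tool failures inside `result` lets the calling LLM read the failure text and react to it, whereas a JSON-RPC error signals that the request itself was malformed or unsupported.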

### Where this fits in the AI flow

```
operator (sigma-web)
  └─► POST /api/ai/triage   (sigma-api, holds the LLM brain)
        └─► LLM provider (Anthropic / OpenAI / Doubao / Grok)

operator or MCP-aware client
  └─► POST /mcp             (sigma-agent, this binary) ──► tool result
        └─► operator pastes result into triage `context`
```

The two surfaces don't talk to each other directly; the operator (or, later, an MCP-aware
client) bridges them. See `docs/ai-triage.en.md` for the end-to-end flow and the operator
walkthrough, and `README.md` § MCP Tool Surface for the full per-tool reference and curl
examples.

## Dependencies

- Reuses sigma-probe HTTP client pattern (SigmaClient with X-Api-Key auth)
@@ -481,3 +552,4 @@ ebpf-traffic` produces a binary with zero eBPF dependencies.
- xDS: `tonic` 0.12, `prost` 0.13, `xds-api` 0.2 (pre-compiled Envoy proto bindings)
- Static config sync: `serde_yaml` 0.9 (parse envoy.yaml)
- eBPF traffic (optional): `aya` 0.13, `aya-log` 0.2, `aya-ebpf` 0.1 (kernel programs)
- MCP: `axum` 0.8 (reuses agent's existing HTTP stack — no new transitive deps)