diff --git a/docs/ai-triage.en.md b/docs/ai-triage.en.md
index 26d4992..b1e555d 100644
--- a/docs/ai-triage.en.md
+++ b/docs/ai-triage.en.md
@@ -61,6 +61,7 @@ JSON-RPC 2.0 over HTTP at `POST /mcp`. Implements `initialize`, `tools/list`, `t
 |---------|----------|---------|-------------|
 | `AGENT_MCP_ENABLED` | `--mcp-enabled` | `false` | Enable the MCP server |
 | `AGENT_MCP_BIND` | `--mcp-bind` | `127.0.0.1:9103` | Listen address (host:port) |
+| `AGENT_MCP_TOKEN` | `--mcp-token` | (none) | Bearer token required on every request. Mandatory for non-loopback bind. |
 
 ### Tools
 
@@ -80,8 +81,8 @@ Tools whose dependency isn't configured (eBPF off, port-scan off, GPU absent, et
 
 ### Security model
 
-- **Localhost-only by default.** The bind defaults to `127.0.0.1:9103`. Override to `0.0.0.0:9103` only behind a network policy (firewall, mesh policy, security group).
-- **No built-in auth.** The trust boundary is the bind address. If you expose MCP off the loopback, you must put auth in front of it (mTLS at the proxy, IP allow-list, or wrap it in a sidecar).
+- **Localhost-only by default.** The bind defaults to `127.0.0.1:9103`. When loopback (`127.0.0.0/8`, `::1`, `localhost`), the trust boundary is the bind address itself.
+- **Token mandatory off-loopback.** If `AGENT_MCP_BIND` is changed to a non-loopback address, `AGENT_MCP_TOKEN` must be set or `serve_mcp` refuses to start. When set, every request must carry `Authorization: Bearer <token>`; comparison is constant-time. Setting the token on a loopback bind is allowed and stacks as defence in depth. A network-policy layer in front (firewall, mesh policy, security group) remains a separate, encouraged control.
 - **Read-mostly tools.** `allocate_ports` is stateless (caller must bind immediately) and `agent_check_update --force` only triggers a manifest poll — neither mutates fleet state.
 
 ### Example: call a tool
diff --git a/docs/ai-triage.zh.md b/docs/ai-triage.zh.md
index c61ad9e..a3c46ae 100644
--- a/docs/ai-triage.zh.md
+++ b/docs/ai-triage.zh.md
@@ -61,6 +61,7 @@ MCP 服务器和 triage 端点之间**不直接通信**。运维人员(或后续
 |---------|---------|--------|------|
 | `AGENT_MCP_ENABLED` | `--mcp-enabled` | `false` | 启用 MCP 服务器 |
 | `AGENT_MCP_BIND` | `--mcp-bind` | `127.0.0.1:9103` | 监听地址(host:port) |
+| `AGENT_MCP_TOKEN` | `--mcp-token` | (无) | 每个请求都必须携带的 Bearer token。非 loopback 绑定时必须设置。 |
 
 ### 工具列表
 
@@ -80,8 +81,8 @@ MCP 服务器和 triage 端点之间**不直接通信**。运维人员(或后续
 
 ### 安全模型
 
-- **默认仅监听 localhost。** 默认绑定 `127.0.0.1:9103`。改成 `0.0.0.0:9103` 必须配合网络策略(防火墙、mesh 策略、安全组)。
-- **没有内置认证。** 信任边界就是绑定地址本身。如果把 MCP 暴露到 loopback 之外,必须自己加认证(代理层 mTLS、IP 白名单、或包一层 sidecar)。
+- **默认仅监听 localhost。** 默认绑定 `127.0.0.1:9103`。当绑定为 loopback(`127.0.0.0/8`、`::1`、`localhost`)时,信任边界就是绑定地址本身。
+- **非 loopback 时强制 token。** 如果把 `AGENT_MCP_BIND` 改成非 loopback 地址,必须同时设置 `AGENT_MCP_TOKEN`,否则 `serve_mcp` 拒绝启动。一旦配置了 token,每个请求都必须携带 `Authorization: Bearer <token>`;比较使用恒定时间算法。loopback 上同样可以配置 token 作为纵深防御。前置网络策略(防火墙、mesh 策略、安全组)仍然是另一道独立的、推荐的防线。
 - **以读为主的工具集。** `allocate_ports` 是无状态的(调用方必须立即 bind),`agent_check_update --force` 只触发一次 manifest 轮询 —— 两者都不修改集群状态。
 
 ### 示例:调用一个工具
diff --git a/sigma-agent/CLAUDE.md b/sigma-agent/CLAUDE.md
index a7ee7a3..b634254 100644
--- a/sigma-agent/CLAUDE.md
+++ b/sigma-agent/CLAUDE.md
@@ -57,6 +57,7 @@ sigma-agent/
 | `AGENT_EBPF_TRAFFIC_MAX_ENTRIES` | `--ebpf-traffic-max-entries` | `8192` | BPF map max entries (unique PIDs) |
 | `AGENT_MCP_ENABLED` | `--mcp-enabled` | `false` | Enable MCP (LLM tool) server |
 | `AGENT_MCP_BIND` | `--mcp-bind` | `127.0.0.1:9103` | MCP listen address (host:port) |
+| `AGENT_MCP_TOKEN` | `--mcp-token` | (none) | Bearer token required on every MCP request. Mandatory for non-loopback bind. |
 
 ## IP Discovery
 
@@ -491,8 +492,14 @@ external LLM can call. This is the agent half of Sigma's AI surface; the LLM "br
   Idle cost ≈ a listening socket.
 - **No new background loops.** All tools read from `Arc>` snapshots maintained
   by existing subsystems.
-- **Localhost-only by default.** `AGENT_MCP_BIND` defaults to `127.0.0.1:9103`. The whole
-  surface is gated by the bind address — there is no per-request auth.
+- **Localhost-only by default.** `AGENT_MCP_BIND` defaults to `127.0.0.1:9103`. When the
+  bind is loopback (`127.0.0.0/8`, `::1`, `localhost`) the surface is gated by the bind
+  address alone.
+- **Token mandatory off-loopback.** If `AGENT_MCP_BIND` is changed to a non-loopback
+  address, `AGENT_MCP_TOKEN` must be set or `serve_mcp` refuses to start (loud `error!`
+  log, agent keeps running with MCP disabled). When set, every request must carry
+  `Authorization: Bearer <token>`; the comparison is constant-time. Setting the token on a
+  loopback bind is allowed and stacks for defence in depth.
 - **Read-mostly.** `allocate_ports` is stateless (caller must bind immediately) and
   `agent_check_update --force` only triggers a manifest poll. No tool mutates fleet state.
 
diff --git a/sigma-agent/README.md b/sigma-agent/README.md
index 27244fe..cd26819 100644
--- a/sigma-agent/README.md
+++ b/sigma-agent/README.md
@@ -33,6 +33,7 @@ Config via environment variables or CLI flags (flags override env):
 | `AGENT_SSH_PORT` | `--ssh-port` | `22` | SSH port to report |
 | `AGENT_MCP_ENABLED` | `--mcp-enabled` | `false` | Enable MCP (LLM tool) server |
 | `AGENT_MCP_BIND` | `--mcp-bind` | `127.0.0.1:9103` | MCP listen address (host:port) |
+| `AGENT_MCP_TOKEN` | `--mcp-token` | (none) | Bearer token required on every MCP request. Mandatory when `--mcp-bind` is non-loopback. |
 
 ## Usage
 
@@ -269,7 +270,23 @@ When `--mcp-enabled` is set, the agent runs a [Model Context Protocol](https://m
 
 **Design contract — keep the agent lean.** The MCP server is intentionally light: no LLM, no persistent state, no extra background loops. Each tool wraps data already collected by `port_scan`, `ebpf_traffic`, or `xds`, or proxies a single call to `sigma-api`. Idle resource cost is effectively a listening socket; per-call CPU is bounded by the underlying capability. This keeps the agent within its budget (<1% CPU, <50MB RSS) on 1 vCPU VPS instances. The "AI brain" lives in `sigma-api`, not here.
 
-**Security default — localhost-only.** Binds to `127.0.0.1:9103` by default. Override to `0.0.0.0:9103` (or another address) only behind a network policy.
+**Security defaults.** Binds to `127.0.0.1:9103` by default — loopback-only, no auth required. If you change `--mcp-bind` to a non-loopback address (`0.0.0.0`, a specific public IP, etc.), you **must** also set `--mcp-token` / `AGENT_MCP_TOKEN`. Without it the MCP server refuses to start (the rest of the agent keeps running) — the operator gets an `error!` log and chooses to set the token or revert the bind.
+
+When a token is configured, every request must carry `Authorization: Bearer <token>`; tokens are compared in constant time. Setting the token on a loopback bind is allowed and stacks as defence in depth.
+
+```bash
+# Off-loopback example — both env vars required:
+AGENT_MCP_ENABLED=true \
+AGENT_MCP_BIND=0.0.0.0:9103 \
+AGENT_MCP_TOKEN=$(openssl rand -hex 32) \
+  ./sigma-agent
+
+# Client must present the token:
+curl -s -X POST http://<host>:9103/mcp \
+  -H "Authorization: Bearer $AGENT_MCP_TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
+```
 
 ### Tools
 
diff --git a/sigma-agent/src/config.rs b/sigma-agent/src/config.rs
index f76e5bc..cc89cd9 100644
--- a/sigma-agent/src/config.rs
+++ b/sigma-agent/src/config.rs
@@ -122,6 +122,11 @@ pub struct Config {
     /// MCP server bind address (host:port). Defaults to 127.0.0.1 — override to 0.0.0.0 only with a network policy.
     #[arg(long, env = "AGENT_MCP_BIND", default_value = "127.0.0.1:9103")]
     pub mcp_bind: String,
+
+    /// Bearer token required for MCP requests. Mandatory when `mcp_bind` is non-loopback;
+    /// optional (and ignored if empty) when loopback. Callers must send `Authorization: Bearer <token>`.
+    #[arg(long, env = "AGENT_MCP_TOKEN")]
+    pub mcp_token: Option<String>,
 }
 
 impl Config {
diff --git a/sigma-agent/src/main.rs b/sigma-agent/src/main.rs
index 5f2cdfa..535f65c 100644
--- a/sigma-agent/src/main.rs
+++ b/sigma-agent/src/main.rs
@@ -302,6 +302,10 @@ async fn main() -> Result<()> {
             } else {
                 None
             },
+            auth_token: config
+                .mcp_token
+                .clone()
+                .filter(|s| !s.is_empty()),
         });
         let bind = config.mcp_bind.clone();
         info!(bind = %bind, "MCP server enabled");
diff --git a/sigma-agent/src/mcp.rs b/sigma-agent/src/mcp.rs
index 34828bc..d3c8e28 100644
--- a/sigma-agent/src/mcp.rs
+++ b/sigma-agent/src/mcp.rs
@@ -34,6 +34,7 @@
 use std::sync::Arc;
 
 use axum::extract::State;
+use axum::http::{HeaderMap, StatusCode};
 use axum::routing::post;
 use axum::{Json, Router};
 use serde::{Deserialize, Serialize};
@@ -131,6 +132,10 @@ pub struct McpState {
     pub update_info: Option,
     /// Manifest URL — needed so `agent_check_update` can run a forced poll.
     pub update_manifest_url: Option,
+    /// Bearer token. When `Some`, every `/mcp` request must include
+    /// `Authorization: Bearer <token>`. Compared in constant time.
+    /// `serve_mcp` refuses to start a non-loopback bind unless this is set.
+    pub auth_token: Option<String>,
 }
 
 // ---------- Tool schemas (returned by tools/list) ----------
@@ -690,11 +695,28 @@ async fn tool_query_syn_flood_candidates(
 
 async fn mcp_handler(
     State(state): State<Arc<McpState>>,
+    headers: HeaderMap,
     body: axum::body::Bytes,
-) -> Json {
+) -> Result, StatusCode> {
+    // Auth: if a token is configured, every request must carry it.
+    // No `WWW-Authenticate` header — this is RPC, not a browser-facing API.
+    if let Some(ref expected) = state.auth_token {
+        let presented = headers
+            .get("authorization")
+            .and_then(|v| v.to_str().ok())
+            .and_then(|s| s.strip_prefix("Bearer "));
+        let ok = match presented {
+            Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
+            None => false,
+        };
+        if !ok {
+            return Err(StatusCode::UNAUTHORIZED);
+        }
+    }
+
     let req: JsonRpcRequest = match serde_json::from_slice(&body) {
         Ok(r) => r,
-        Err(e) => return Json(err(None, ERR_PARSE, format!("parse error: {}", e))),
+        Err(e) => return Ok(Json(err(None, ERR_PARSE, format!("parse error: {}", e)))),
     };
 
     let id = req.id.clone();
@@ -726,16 +748,67 @@ async fn mcp_handler(
         ),
     };
 
-    Json(response)
+    Ok(Json(response))
+}
+
+/// Length-stable byte comparison. The token length itself is not secret,
+/// so an early `len()` mismatch is fine — what we don't want is short-circuit
+/// content comparison leaking a per-byte timing oracle.
+fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
+    if a.len() != b.len() {
+        return false;
+    }
+    let mut diff: u8 = 0;
+    for (x, y) in a.iter().zip(b.iter()) {
+        diff |= x ^ y;
+    }
+    diff == 0
+}
+
+/// True when `bind` resolves to a loopback interface (no remote network
+/// exposure). Used by `serve_mcp` to decide whether `auth_token` is
+/// mandatory. Recognises:
+/// - IPv4 in `127.0.0.0/8` (anything `std::net::Ipv4Addr::is_loopback`)
+/// - IPv6 `::1` (and any other `Ipv6Addr::is_loopback`)
+/// - The literal hostname `localhost`
+/// Anything else — including `0.0.0.0`, `::`, a public IP, or a DNS name —
+/// is treated as non-loopback and therefore requires a token.
+fn is_loopback_bind(bind: &str) -> bool {
+    // host:port. IPv6 binds wrap the host in brackets, e.g. `[::1]:9103`.
+    let host = match bind.rsplit_once(':') {
+        Some((h, _)) => h.trim_start_matches('[').trim_end_matches(']'),
+        None => bind,
+    };
+    if host.eq_ignore_ascii_case("localhost") {
+        return true;
+    }
+    if let Ok(ip) = host.parse::<std::net::IpAddr>() {
+        return ip.is_loopback();
+    }
+    false
 }
 
 #[allow(dead_code)] // referenced in error reporting via ERR_INVALID_PARAMS
 const _: i32 = ERR_INVALID_PARAMS;
 
 pub async fn serve_mcp(bind: String, state: Arc<McpState>) {
+    // Foot-gun guard: a non-loopback bind without a token would expose every
+    // tool — port allocation, route enumeration, forced update polls — to
+    // anyone who can reach the agent. Refuse to start instead of binding.
+    // The agent itself keeps running; the operator gets a loud error and
+    // chooses to set the token or revert the bind.
+    if !is_loopback_bind(&bind) && state.auth_token.is_none() {
+        error!(
+            bind = %bind,
+            "MCP server refusing to start: non-loopback bind requires AGENT_MCP_TOKEN. \
+             Set the token, or move the bind back to 127.0.0.1 / ::1 / localhost."
+        );
+        return;
+    }
+
     let app = Router::new()
         .route("/mcp", post(mcp_handler))
-        .with_state(state);
+        .with_state(state.clone());
 
     let listener = match TcpListener::bind(&bind).await {
         Ok(l) => l,
@@ -745,9 +818,56 @@ pub async fn serve_mcp(bind: String, state: Arc<McpState>) {
         }
     };
 
-    info!(bind = %bind, "MCP server listening on /mcp (JSON-RPC 2.0, MCP protocol)");
+    info!(
+        bind = %bind,
+        loopback = is_loopback_bind(&bind),
+        auth = state.auth_token.is_some(),
+        "MCP server listening on /mcp (JSON-RPC 2.0, MCP protocol)"
+    );
 
     if let Err(e) = axum::serve(listener, app).await {
         error!(error = %e, "MCP server error");
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn loopback_recognises_ipv4_127() {
+        assert!(is_loopback_bind("127.0.0.1:9103"));
+        assert!(is_loopback_bind("127.0.0.2:9103"));
+        assert!(is_loopback_bind("127.255.255.254:1"));
+    }
+
+    #[test]
+    fn loopback_recognises_ipv6() {
+        assert!(is_loopback_bind("[::1]:9103"));
+    }
+
+    #[test]
+    fn loopback_recognises_localhost() {
+        assert!(is_loopback_bind("localhost:9103"));
+        assert!(is_loopback_bind("LOCALHOST:9103"));
+    }
+
+    #[test]
+    fn loopback_rejects_wildcards_and_externals() {
+        assert!(!is_loopback_bind("0.0.0.0:9103"));
+        assert!(!is_loopback_bind("[::]:9103"));
+        assert!(!is_loopback_bind("192.0.2.5:9103"));
+        assert!(!is_loopback_bind("example.com:9103"));
+        // Unparseable garbage is treated as non-loopback (fail-closed).
+        assert!(!is_loopback_bind("not-a-bind"));
+    }
+
+    #[test]
+    fn constant_time_eq_matches_string_eq() {
+        assert!(constant_time_eq(b"", b""));
+        assert!(constant_time_eq(b"secret", b"secret"));
+        assert!(!constant_time_eq(b"secret", b"secrey"));
+        assert!(!constant_time_eq(b"secret", b"secre"));
+        assert!(!constant_time_eq(b"secret", b"SECRET"));
+    }
+}
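
Reviewer note: the startup rule this patch introduces (token optional on loopback, mandatory otherwise, empty token treated as unset) can be summarised as one small decision function. The following is a minimal sketch and not code from the patch: `StartDecision` and `start_decision` are hypothetical names introduced here for illustration, and `is_loopback_bind` is re-implemented along the lines of the patch's helper using only the standard library.

```rust
use std::net::IpAddr;

/// Hypothetical summary of the three startup outcomes (not a type in the patch).
#[derive(Debug, PartialEq, Eq)]
pub enum StartDecision {
    /// Loopback bind, no token: the bind address is the only gate.
    LoopbackOpen,
    /// A non-empty token is configured: every request must present it.
    TokenRequired,
    /// Non-loopback bind without a token: the MCP server must not start.
    Refuse,
}

/// Same host-extraction approach as the patch's helper: split off the port,
/// strip IPv6 brackets, accept `localhost` or any loopback IP literal.
pub fn is_loopback_bind(bind: &str) -> bool {
    let host = match bind.rsplit_once(':') {
        Some((h, _)) => h.trim_start_matches('[').trim_end_matches(']'),
        None => bind,
    };
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // Anything that does not parse as an IP literal is treated as non-loopback.
    host.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false)
}

/// Combine the bind address and the optional token into one decision.
pub fn start_decision(bind: &str, token: Option<&str>) -> StartDecision {
    // Empty tokens count as unset, mirroring `.filter(|s| !s.is_empty())` in main.rs.
    let token = token.filter(|t| !t.is_empty());
    match (is_loopback_bind(bind), token) {
        (_, Some(_)) => StartDecision::TokenRequired,
        (true, None) => StartDecision::LoopbackOpen,
        (false, None) => StartDecision::Refuse,
    }
}
```

Under this sketch, `start_decision("0.0.0.0:9103", None)` yields `Refuse`, while `start_decision("127.0.0.1:9103", None)` yields `LoopbackOpen`. One edge case worth knowing: a bare `::1` with no brackets or port also yields `Refuse`, because `rsplit_once(':')` splits it into `:` and `1` and the remaining host no longer parses as an IP. IPv6 binds need the bracketed `[::1]:9103` form to be recognised as loopback.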