diff --git a/plugins/smelt-agent/AGENTS.md b/plugins/smelt-agent/AGENTS.md
index 3f2a0ad1..6613c2a2 100644
--- a/plugins/smelt-agent/AGENTS.md
+++ b/plugins/smelt-agent/AGENTS.md
@@ -8,7 +8,8 @@ You are a smelt worker agent executing Assay runs. Your job is to receive a run
 | --- | --- |
 | `/assay:run-dispatch` | Dispatch a single or multi-session run from a manifest |
 | `/assay:backend-status` | Query orchestrator status and interpret results |
-| `/assay:peer-message` | Send and receive messages between sessions (mesh/gossip) |
+| `/assay:peer-message` | Send and receive messages between sessions (mesh/gossip/signal) |
+| `/assay:peer-registry` | Peer discovery, registration, and cross-instance signal forwarding |
 
 ## MCP Tools
 
@@ -23,6 +24,9 @@ You are a smelt worker agent executing Assay runs. Your job is to receive a run
 | `cycle_status` | Get active milestone progress |
 | `cycle_advance` | Advance the active chunk |
 | `chunk_status` | Get gate results for a specific chunk |
+| `poll_signals` | Read `PeerUpdate` messages from a session's signal inbox |
+| `send_signal` | POST a `SignalRequest` to any signal endpoint URL |
+| `merge_propose` | Push branch and create a GitHub PR with gate evidence |
 
 ## Workflow
 
@@ -41,5 +45,22 @@ Not all backends support every feature. Check the `CapabilitySet` before relying
 - `supports_gossip_manifest: false` → gossip knowledge manifest may not persist between rounds
 - `supports_annotations: false` → run annotations are not stored
 - `supports_checkpoints: false` → team checkpoints are not persisted
+- `supports_signals: false` → signal endpoint events are not pushed to the backend
+- `supports_peer_registry: false` → peer registration/discovery is not available; cross-instance forwarding is disabled
 
 Capability-limited runs degrade gracefully — they are not failures.
+
+### Cross-Instance Signal Forwarding
+
+When the signal endpoint receives a `POST /api/v1/signal` for an unknown local session, it queries the peer registry (`list_peers()`) and forwards the request to known peers. The first peer to return `202 Accepted` wins. An `X-Assay-Forwarded: true` header prevents forwarding loops — forwarded requests that miss locally return `404` immediately.
+
+**Environment variables for the signal endpoint:**
+
+| Variable | Default | Description |
+| --- | --- | --- |
+| `ASSAY_SIGNAL_PORT` | `7432` | Port for the HTTP signal listener |
+| `ASSAY_SIGNAL_BIND` | `127.0.0.1` | Bind address (`0.0.0.0` for all interfaces) |
+| `ASSAY_SIGNAL_URL` | _(derived)_ | Override the peer-registered URL — required when `ASSAY_SIGNAL_BIND=0.0.0.0` to provide a routable address |
+| `ASSAY_SIGNAL_TOKEN` | _(none)_ | Optional bearer token for auth |
+
+On startup, the MCP server registers itself as a peer in the state backend. On clean shutdown, it unregisters.
diff --git a/plugins/smelt-agent/skills/peer-message.md b/plugins/smelt-agent/skills/peer-message.md
index 13a3c9fb..17176c94 100644
--- a/plugins/smelt-agent/skills/peer-message.md
+++ b/plugins/smelt-agent/skills/peer-message.md
@@ -61,3 +61,41 @@ In gossip mode, there is no direct messaging between sessions. Instead, a coordi
 ### Capability Guard
 
 9. **Check `supports_gossip_manifest` before relying on the manifest.** If the backend has `supports_gossip_manifest: false`, the knowledge manifest may not persist between coordinator rounds. Check `gossip_status.sessions_synthesized` via `orchestrate_status` — if it stays at zero despite sessions completing, manifest persistence is disabled.
+
+## Signal-Based Messaging (Cross-Instance)
+
+For multi-machine deployments, sessions communicate via the HTTP signal endpoint instead of filesystem-based mesh routing.
+
+### Receiving Signals
+
+10. **Use the `poll_signals` MCP tool** to read `PeerUpdate` messages from your session's signal inbox:
+    ```json
+    { "session_name": "worker-1" }
+    ```
+    Returns a `PollSignalsResult` with a `signals` array of `PeerUpdate` objects. Messages are consumed on read, so a `PeerUpdate` returned by one `poll_signals` call is never returned again.
+
+### Sending Signals
+
+11. **Use the `send_signal` MCP tool** to POST a signal to any Assay signal endpoint:
+    ```json
+    {
+      "url": "http://peer-host:7432/api/v1/signal",
+      "target_session": "orchestrator",
+      "update": {
+        "source_job": "job-abc",
+        "source_session": "worker-1",
+        "changed_files": ["src/main.rs"],
+        "gate_summary": { "passed": 5, "failed": 0, "skipped": 1 },
+        "branch": "feature/auth"
+      }
+    }
+    ```
+    Returns the HTTP status code and response body. Non-2xx responses are returned as the tool result (not a tool-level error) so the agent can decide how to proceed.
+
+### Cross-Instance Forwarding
+
+12. **Signals for unknown local sessions are forwarded automatically.** When the signal endpoint receives a request for a session not registered locally, it queries the peer registry and forwards to known peers. The first peer to return `202 Accepted` wins. An `X-Assay-Forwarded: true` header prevents forwarding loops.
+
+### Capability Guard
+
+13. **Check `supports_signals` and `supports_peer_registry`** to determine if signal-based messaging and cross-instance forwarding are available. `SmeltBackend` supports signals but not peer registry (`supports_peer_registry: false`: `register_peer` is fire-and-forget, and cross-instance routing is handled server-side by Smelt); `LocalFsBackend` supports peer registry but not signal push; `NoopBackend` supports neither.
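The forwarding rule in item 12 (and in the AGENTS.md cross-instance section) can be sketched as a pure routing function. This is a minimal illustration under stated assumptions, not the assay-mcp handler: `Peer`, `Outcome`, `route_signal`, and the `post` callback are hypothetical names, and the forwarded HTTP POST (which would carry `X-Assay-Forwarded: true`) is abstracted as a closure returning a status code so the sketch needs only the standard library.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a registered peer (only the field forwarding needs).
struct Peer {
    signal_url: String,
}

/// What the endpoint answers to the original caller.
#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted, // 202 Accepted
    NotFound, // 404 Not Found
}

/// Sketch of the documented routing rule for `POST /api/v1/signal`.
/// `post` is a hypothetical callback that forwards the original request
/// (with `X-Assay-Forwarded: true`) and returns the peer's HTTP status.
fn route_signal<F>(
    target_session: &str,
    local_sessions: &HashSet<String>,
    already_forwarded: bool,
    supports_peer_registry: bool,
    peers: &[Peer],
    mut post: F,
) -> Outcome
where
    F: FnMut(&str) -> u16,
{
    // Known local session: deliver to its inbox (delivery itself is elided).
    if local_sessions.contains(target_session) {
        return Outcome::Accepted;
    }
    // Loop prevention: a forwarded request that misses locally is never re-forwarded.
    // Without a peer registry there is nobody to forward to.
    if already_forwarded || !supports_peer_registry {
        return Outcome::NotFound;
    }
    // Try peers sequentially; the first 202 wins and stops the loop.
    for peer in peers {
        let url = format!("{}/api/v1/signal", peer.signal_url.trim_end_matches('/'));
        if post(&url) == 202 {
            return Outcome::Accepted;
        }
    }
    // Empty peer list, or every peer refused.
    Outcome::NotFound
}

fn main() {
    let local: HashSet<String> = ["worker-1".to_string()].into_iter().collect();
    let peers = vec![
        Peer { signal_url: "http://10.0.0.2:7432".to_string() },
        Peer { signal_url: "http://10.0.0.3:7432".to_string() },
    ];
    let mut tried = Vec::new();
    // Pretend only the second peer hosts "orchestrator".
    let outcome = route_signal("orchestrator", &local, false, true, &peers, |url| {
        tried.push(url.to_string());
        if url.starts_with("http://10.0.0.3") { 202 } else { 404 }
    });
    println!("{:?} after trying {:?}", outcome, tried);
}
```

Keeping the network call behind a closure makes the ordering and loop-prevention logic testable without sockets; a real handler would send the `SignalRequest` body plus the `X-Assay-Forwarded` header (and the bearer token, if configured) inside `post`.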
diff --git a/plugins/smelt-agent/skills/peer-registry.md b/plugins/smelt-agent/skills/peer-registry.md
new file mode 100644
index 00000000..759a3894
--- /dev/null
+++ b/plugins/smelt-agent/skills/peer-registry.md
@@ -0,0 +1,123 @@
+---
+name: peer-registry
+description: >
+  Peer discovery, registration, and cross-instance signal forwarding.
+  Use when configuring multi-machine deployments where multiple Assay
+  instances need to discover each other and forward signals across hosts.
+---
+
+# Peer Registry
+
+Register, discover, and forward signals between Assay instances running on different machines.
+
+## Overview
+
+Each Assay MCP server can register itself as a **peer** in the state backend. Other instances query the peer registry to discover where to forward signals for sessions they don't own locally. This enables multi-machine orchestration without a central message broker.
+
+```
+┌──────────────┐         ┌──────────────┐
+│  Machine A   │         │  Machine B   │
+│  assay-mcp   │◄───────►│  assay-mcp   │
+│  :7432       │  HTTP   │  :7432       │
+│              │ forward │              │
+│  worker-1    │         │  worker-2    │
+│  worker-3    │         │  orchestrator│
+└──────────────┘         └──────────────┘
+       │                        │
+       └────────┬───────────────┘
+    peers.json (or Smelt API)
+```
+
+## PeerInfo Type
+
+Each registered peer is a `PeerInfo` record:
+
+```json
+{
+  "peer_id": "machine-a",
+  "signal_url": "http://192.168.1.10:7432",
+  "registered_at": "2026-03-29T12:00:00Z"
+}
+```
+
+| Field | Type | Description |
+| --- | --- | --- |
+| `peer_id` | `String` | Unique identifier (typically hostname or UUID) |
+| `signal_url` | `String` | HTTP endpoint for the signal server |
+| `registered_at` | `DateTime` | When this peer was registered |
+
+## Backend Methods
+
+The `StateBackend` trait provides three peer registry methods with default no-op implementations:
+
+| Method | Description |
+| --- | --- |
+| `register_peer(peer: &PeerInfo)` | Upsert a peer entry (by `peer_id`) |
+| `list_peers()` | Return all registered peers |
+| `unregister_peer(peer_id: &str)` | Remove a peer entry (idempotent) |
+
+### LocalFsBackend
+
+Stores peers in `{assay_dir}/peers.json`. Writes are atomic (temp file + rename). Suitable for single-machine multi-process setups where all Assay instances share the same `.assay/` directory.
+
+### SmeltBackend
+
+Registers peers by POSTing `PeerInfo` JSON to `{smelt_url}/api/v1/peers`. Graceful degradation — registration failure logs a warning but does not abort startup. `list_peers` and `unregister_peer` use the default no-op implementations (Smelt manages peer lifecycle server-side).
+
+### Other Backends
+
+`NoopBackend`, `LinearBackend`, `GitHubBackend`, and `SshSyncBackend` all return `supports_peer_registry: false` and use the default no-op implementations.
+
+## Automatic Lifecycle
+
+The MCP server manages peer registration automatically:
+
+1. **On startup** — after the signal endpoint binds, the server calls `register_peer` with its hostname and a `signal_url` taken from `ASSAY_SIGNAL_URL` when set, otherwise derived from `ASSAY_SIGNAL_BIND` and `ASSAY_SIGNAL_PORT`.
+2. **On clean shutdown** — the server calls `unregister_peer` to remove itself.
+
+No manual registration is needed for normal operation.
+
+## Cross-Instance Signal Forwarding
+
+When `POST /api/v1/signal` targets an unknown local session:
+
+1. Check for `X-Assay-Forwarded: true` header — if present, return `404` immediately (loop prevention).
+2. Check `capabilities().supports_peer_registry` — if false, return `404`.
+3. Call `list_peers()` — iterate peers sequentially.
+4. For each peer, POST the original `SignalRequest` to `{peer.signal_url}/api/v1/signal` with:
+   - `X-Assay-Forwarded: true` header (prevents the receiving peer from re-forwarding)
+   - `Authorization: Bearer` header carrying the `ASSAY_SIGNAL_TOKEN` value (if that variable is set)
+5. First peer to return `202 Accepted` wins — return `202` to the original caller.
+6. If all peers fail or the list is empty, return `404`.
+
+### Loop Prevention
+
+The `X-Assay-Forwarded: true` header is the loop-prevention mechanism. A forwarded request that arrives at a peer is never re-forwarded — it either matches a local session (202) or fails (404). This guarantees at most one hop.
+
+## Multi-Machine Setup
+
+To deploy Assay across multiple machines:
+
+1. **Set `ASSAY_SIGNAL_BIND=0.0.0.0`** and set **`ASSAY_SIGNAL_URL`** to the machine's routable address (e.g. `http://192.168.1.10:7432`) on each machine. Without `ASSAY_SIGNAL_URL`, the registered peer URL is `http://0.0.0.0:7432`, which other machines cannot route to.
+2. **Use a shared state backend** — either `LocalFsBackend` on a shared filesystem (NFS) or `SmeltBackend` with a central Smelt server.
+3. **Start each Assay instance** — each registers itself as a peer automatically.
+4. **Dispatch runs** — sessions on any machine can send signals to sessions on any other machine via `send_signal`. Unknown-session signals are forwarded through the peer registry.
+
+### Environment Variables
+
+| Variable | Default | Description |
+| --- | --- | --- |
+| `ASSAY_SIGNAL_PORT` | `7432` | Port for the HTTP signal listener |
+| `ASSAY_SIGNAL_BIND` | `127.0.0.1` | Bind address (`0.0.0.0` for multi-machine) |
+| `ASSAY_SIGNAL_URL` | _(derived)_ | **Required when `ASSAY_SIGNAL_BIND=0.0.0.0`** — override the peer-registered URL with the machine's reachable address (e.g. `http://192.168.1.10:7432`). Without this, peers register `http://0.0.0.0:7432` which is unroutable. |
+| `ASSAY_SIGNAL_TOKEN` | _(none)_ | Optional bearer token for auth (shared across peers) |
+
+## Capability Guard
+
+Check `supports_peer_registry` before relying on peer discovery:
+
+- `supports_peer_registry: true` — `LocalFsBackend`
+- Note: `SmeltBackend` returns `false` — it implements `register_peer` as fire-and-forget but `list_peers` and `unregister_peer` are no-ops, so local signal forwarding is disabled; Smelt handles cross-instance routing server-side
+- `supports_peer_registry: false` — `NoopBackend`, `LinearBackend`, `GitHubBackend`, `SshSyncBackend`
+
+When peer registry is unavailable, signals for unknown local sessions return `404` without any forwarding attempt.
diff --git a/plugins/smelt-agent/skills/run-dispatch.md b/plugins/smelt-agent/skills/run-dispatch.md
index fc647b4a..0a7e20bb 100644
--- a/plugins/smelt-agent/skills/run-dispatch.md
+++ b/plugins/smelt-agent/skills/run-dispatch.md
@@ -31,6 +31,7 @@ If the controller wants state on a remote backend, set the `state_backend` field
 - `{ type = "linear", team_id = "TEAM123" }` — Linear project tracking; requires `LINEAR_API_KEY` env var; `project_id` is optional (M011/S02)
 - `{ type = "github", repo = "owner/repo" }` — GitHub Issues via `gh` CLI; requires `gh` installed and authenticated; `label` is optional (M011/S03)
 - `{ type = "ssh", host = "worker.example.com", remote_assay_dir = "/home/user/.assay" }` — SCP sync to remote host; `user` and `port` are optional (M011/S04)
+- `{ type = "smelt", url = "http://smelt.example.com:9000", job_id = "abc123", token = "secret" }` — Smelt HTTP backend; POSTs orchestrator events to Smelt's `/api/v1/events` endpoint; `token` is optional (bearer auth)
 - `{ type = "custom", name = "my-backend", config = { ... } }` — custom third-party backend (falls back to no-op)
 
 **Note:** `linear`, `github`, and `ssh` backends are stub implementations in the current release — configuring them logs a warning and falls back to a no-op backend that discards all state writes. Full implementations land in M011/S02–S04.
diff --git a/plugins/smelt-agent/tests/verify-docs.sh b/plugins/smelt-agent/tests/verify-docs.sh
new file mode 100644
index 00000000..af7f5bb2
--- /dev/null
+++ b/plugins/smelt-agent/tests/verify-docs.sh
@@ -0,0 +1,105 @@
+#!/usr/bin/env bash
+# verify-docs.sh — Structural test for smelt-agent plugin documentation.
+#
+# Checks that MCP tool names referenced in plugin docs exist in the
+# assay-mcp router (server.rs). Exits non-zero on any mismatch.
+#
+# Usage: bash plugins/smelt-agent/tests/verify-docs.sh
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+PLUGIN_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+REPO_ROOT="$(cd "$PLUGIN_DIR/../.." && pwd)"
+
+SERVER_RS="$REPO_ROOT/crates/assay-mcp/src/server.rs"
+
+if [ ! -f "$SERVER_RS" ]; then
+  echo "ERROR: server.rs not found at $SERVER_RS"
+  exit 1
+fi
+
+# Extract MCP tool names from the router (pub async fn declarations in the
+# #[tool_router] impl block). These are the canonical tool names.
+ROUTER_TOOLS=$(grep 'pub async fn' "$SERVER_RS" \
+  | grep -oE 'fn [a-z_]+' \
+  | sed 's/fn //' \
+  | grep -v '^serve$' \
+  | sort -u)
+
+# Extract every backticked lowercase identifier in AGENTS.md (MCP Tools rows look
+# like: | `tool_name` | description |). Non-tool names are skipped via NON_TOOLS.
+DOC_TOOLS=$(grep -oE '`[a-z_]+`' "$PLUGIN_DIR/AGENTS.md" \
+  | tr -d '`' \
+  | sort -u)
+
+# Known tools that exist on feature branches but not yet on main.
+# These are documented in advance of the M015 merge and will be
+# validated once M015 lands.
+PENDING_TOOLS=""
+
+# Filter out non-tool identifiers (field names, config keys, etc.)
+NON_TOOLS="run_id state_backend"
+
+ERRORS=0
+
+for tool in $DOC_TOOLS; do
+  # Skip known non-tool identifiers
+  skip=0
+  for nt in $NON_TOOLS; do
+    if [ "$tool" = "$nt" ]; then
+      skip=1
+      break
+    fi
+  done
+  [ "$skip" -eq 1 ] && continue
+
+  # Check if it's a pending tool (on a feature branch, not yet merged)
+  pending=0
+  for pt in $PENDING_TOOLS; do
+    if [ "$tool" = "$pt" ]; then
+      pending=1
+      break
+    fi
+  done
+
+  if [ "$pending" -eq 1 ]; then
+    echo " PENDING: $tool (M015 feature branch — not yet on main)"
+    continue
+  fi
+
+  # Check if tool exists in router (here-string avoids a SIGPIPE false negative under pipefail)
+  if ! grep -qx "$tool" <<< "$ROUTER_TOOLS"; then
+    echo " MISSING: $tool (referenced in docs but not in router)"
+    ERRORS=$((ERRORS + 1))
+  fi
+done
+
+# Also check skill files for tool references
+for skill_file in "$PLUGIN_DIR"/skills/*.md; do
+  SKILL_TOOLS=$(grep -oE '`(poll_signals|send_signal|merge_propose|orchestrate_run|run_manifest|orchestrate_status|gate_run|spec_list|spec_get)`' "$skill_file" 2>/dev/null | tr -d '`' | sort -u || true)
+  for tool in $SKILL_TOOLS; do
+    pending=0
+    for pt in $PENDING_TOOLS; do
+      if [ "$tool" = "$pt" ]; then
+        pending=1
+        break
+      fi
+    done
+    [ "$pending" -eq 1 ] && continue
+
+    if ! grep -qx "$tool" <<< "$ROUTER_TOOLS"; then
+      echo " MISSING: $tool (referenced in $(basename "$skill_file") but not in router)"
+      ERRORS=$((ERRORS + 1))
+    fi
+  done
+done
+
+if [ "$ERRORS" -gt 0 ]; then
+  echo ""
+  echo "FAIL: $ERRORS tool name(s) referenced in docs but missing from router"
+  exit 1
+fi
+
+echo "OK: all documented tool names verified"
+exit 0
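The peer-registry skill documents two rules that are easy to get wrong in a multi-machine setup. First, the registered `signal_url` comes from `ASSAY_SIGNAL_URL` when set, and otherwise from the bind address and port (defaults `127.0.0.1` and `7432`). Second, `register_peer` is an upsert keyed by `peer_id` and `unregister_peer` is an idempotent remove. The sketch below models both in plain Rust. It is illustrative only: `registered_signal_url`, `is_wildcard_url`, and the in-memory `PeerRegistry` are hypothetical names, `registered_at` is omitted to stay dependency-free, and the documented `LocalFsBackend` persists peers to `peers.json` with atomic writes rather than holding a map in memory.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of the documented `PeerInfo` (no `registered_at`).
#[derive(Clone, Debug, PartialEq)]
struct PeerInfo {
    peer_id: String,
    signal_url: String,
}

/// URL a peer would register: `ASSAY_SIGNAL_URL` wins when set and non-empty;
/// otherwise `http://{bind}:{port}` with the documented defaults.
fn registered_signal_url(url: Option<&str>, bind: Option<&str>, port: Option<&str>) -> String {
    if let Some(u) = url.filter(|u| !u.is_empty()) {
        return u.to_string();
    }
    format!("http://{}:{}", bind.unwrap_or("127.0.0.1"), port.unwrap_or("7432"))
}

/// `0.0.0.0` means "listen on all interfaces"; other hosts cannot dial it.
fn is_wildcard_url(url: &str) -> bool {
    url == "http://0.0.0.0" || url.starts_with("http://0.0.0.0:")
}

/// In-memory stand-in for a peer registry (hypothetical).
#[derive(Default)]
struct PeerRegistry {
    peers: BTreeMap<String, PeerInfo>,
}

impl PeerRegistry {
    /// Upsert keyed by `peer_id`: re-registering replaces the old entry.
    fn register_peer(&mut self, peer: &PeerInfo) {
        self.peers.insert(peer.peer_id.clone(), peer.clone());
    }

    fn list_peers(&self) -> Vec<PeerInfo> {
        self.peers.values().cloned().collect()
    }

    /// Idempotent: removing an unknown `peer_id` is not an error.
    fn unregister_peer(&mut self, peer_id: &str) {
        self.peers.remove(peer_id);
    }
}

fn main() {
    let var = |k: &str| std::env::var(k).ok();
    let url = registered_signal_url(
        var("ASSAY_SIGNAL_URL").as_deref(),
        var("ASSAY_SIGNAL_BIND").as_deref(),
        var("ASSAY_SIGNAL_PORT").as_deref(),
    );
    if is_wildcard_url(&url) {
        eprintln!("warning: {url} is not routable from other machines; set ASSAY_SIGNAL_URL");
    }

    let mut registry = PeerRegistry::default();
    // "machine-a" is a placeholder; the docs say peer_id is typically a hostname or UUID.
    registry.register_peer(&PeerInfo { peer_id: "machine-a".into(), signal_url: url });
    println!("{:?}", registry.list_peers());
    registry.unregister_peer("machine-a"); // what a clean shutdown would do
}
```

Taking the three settings as explicit `Option` inputs, rather than reading the environment inside the function, keeps the URL rule testable; `main` shows where the environment lookup would happen.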