diff --git a/docs/ADRs/0046-host-side-api-server-design.md b/docs/ADRs/0046-host-side-api-server-design.md new file mode 100644 index 000000000..37c30c112 --- /dev/null +++ b/docs/ADRs/0046-host-side-api-server-design.md @@ -0,0 +1,175 @@ +--- +title: "46. Host-side API server design for sandboxed agents" +status: Accepted +relates_to: + - agent-infrastructure + - security-threat-model +topics: + - sandbox + - api-server + - credential-isolation +--- + +# 46. Host-side API server design for sandboxed agents + +Date: 2026-06-02 + +## Status + +Accepted + + + +## Context + +[ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not +implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3 +of the credential delivery model — for cases where providers (Tier 2) cannot +handle, originally scoped to credentials in request bodies and response +transformation. Practice revealed additional cases beyond provider reach: +long-running operations exceeding MCP timeouts, operations the sandbox +deliberately blocks (container builds, see +[NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)), +and multi-step atomic operations. + +The `host-side-api-server` experiment +([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28)) +validated the end-to-end flow with two servers (Go container builder, Python +repo provisioner), testing lifecycle management, API discoverability, L7 policy +tuning, per-run auth, and file transfer. This ADR records design decisions +informed by that experiment and by subsequent review discussion on the +implementation model. + +## Options + +### API discoverability + +Three approaches were tested. `/tools.json` (structured tool-use schema) was +the most token-efficient under full access (92k tokens vs 107k for OpenAPI, +100k for baked instructions). Both discovery-based methods fail under +restricted policies where the endpoint is blocked; baked instructions succeed +(84k) but can drift from the actual API. OpenAPI's verbose structure adds +context tokens without proportional benefit for LLM agents. + +### Per-run authentication + +**UUID bearer token via provider placeholders:** simple, proven, no key +management. The proxy resolves the placeholder — the real token never enters +the sandbox. No claims or expiry. + +**Short-lived JWTs with claims:** per-operation authorization and audit trail, +but adds signing key management and JWT library dependencies in every server. +The L7 policy already restricts reachable endpoints, making JWT claims a +redundant second layer. + +### File transfer between server and sandbox + +**`openshell sandbox upload/download` from the server:** works today, validated +by experiment, handles real-time exchange during request handling. Couples +server to OpenShell CLI. + +**Shared host mount:** transparent POSIX access, no CLI coupling. Depends on +OpenShell mount support that is not yet universally available +([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)). + +**HTTP multipart via the API:** fully portable, but large files through the L7 +proxy add overhead. + +### Implementation model + +**Language-agnostic process contract:** each server is an independent binary +in any language. The runner manages it via CLI flags (`--port`, `--token`, +`--bind-address`) and expects `/healthz`, `/tools.json`, bearer auth, and +clean `SIGTERM` shutdown. The experiment validated this with Go and Python +servers. Maximally flexible, but every server re-implements boilerplate (CLI +parsing, health endpoints, auth middleware, security hardening). + +**Go interface:** fullsend-maintained servers implement an internal Go +interface, compiled to a single binary. Simplifies deployment (one binary, +no runtime dependencies) and enables stub-based testing. The process +contract remains the runner's enforcement boundary — the interface is an +internal pattern, not a runner requirement. Provides a seam for adapting +deployment topology across workflow platforms (sidecars, pods). + +## Decision + +Adopt the host-side API server design with the following implementation model, +policy model, and security requirements. The +[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28) +validated the end-to-end flow; the implementation model was refined during +review. + +**Process contract.** Every compiled API server binary accepts `--port`, +`--token`, and `--bind-address` flags; serves `GET /healthz` +(unauthenticated) and `GET /tools.json` (structured tool-use schema) for agent +discovery; validates bearer tokens on all other endpoints; and shuts down +cleanly on `SIGTERM`. Servers write logs to stderr; the runner collects and +bundles logs from all API servers so they are available for inspection after +the run completes. The runner starts servers after pre-script, health-checks +before sandbox creation, and tears down after sandbox destruction. If a server +crashes mid-run, the run fails. + +**Implementation model.** The runner enforces the process contract above — any +binary that satisfies it is a valid API server. Internally, fullsend-maintained +servers use a Go interface as the recommended pattern: each server provides +HTTP handlers and a `/tools.json` schema, and the `main()` wires the +implementation to the process contract (CLI flags, health endpoints, auth +middleware, signal handling). Harnesses reference a compiled Go binary, the +workflow runner downloads it to the host, and `fullsend run` manages its +lifecycle. The interface makes servers testable via stub implementations and +provides the seam for adapting deployment topology across workflow platforms +(e.g., sidecars or separate pods on Kubernetes/OpenShift). Implementors who +need more control can write their own `main()` against the process contract +directly. + +**Network policy via composable provider profiles.** Each API server ships +atomic capability profiles — one per logical group of endpoints (e.g., +`builder-build`, `builder-push`, `builder-read`). Harnesses list which profiles +to attach. Composition is additive per OpenShell's provider-backed policy +composition +([NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)). +Different agent roles compose different capability sets for the same server. +Requires OpenShell >= v0.0.37 and +[#776](https://github.com/fullsend-ai/fullsend/issues/776). + +**Per-run auth:** UUID bearer token via OpenShell provider placeholders. JWTs +are a future enhancement when per-operation claims become necessary. + +**File transfer:** `openshell sandbox upload/download` from the server during +request handling. Shared host mount +([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)) +will be evaluated as an alternative when available. + +**Bind address:** servers default to `127.0.0.1`, runner overrides to +`0.0.0.0` for sandboxed agents. +[NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633) +(supervisor-proxied host-local endpoints) would eliminate this requirement. + +**Security hardening:** timing-safe token comparison, 1 MB request body limits, +rate limiting on unauthenticated endpoints, credential scrubbing in error +messages, bounded in-memory state. + +## Consequences + +- The `api_servers` harness field ([ADR 0024](0024-harness-definitions.md)) will gain a `providers` sub-field and + defined runtime behavior. The initial implementation targets Go servers + behind an internal interface; the process contract keeps the door open for + other languages. +- Implementing this design requires + [#776](https://github.com/fullsend-ai/fullsend/issues/776) (provider-backed + policy composition) as a prerequisite. +- Servers are coupled to the OpenShell CLI for file transfer until shared host + mounts are universally available. +- Servers must bind to `0.0.0.0` on shared hosts, widening the attack surface + until [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633) + ships. +- API servers (Tier 3) are now clearly scoped to cases providers cannot + handle: long-running operations, sandbox capability gaps, credentials in + request bodies, response transformation, and multi-step atomic operations. +- Fullsend-maintained servers follow a Go interface pattern (testable, + platform-portable). The process contract remains the enforcement boundary + — servers in other languages or with custom `main()` implementations are + valid as long as they satisfy it. diff --git a/docs/architecture.md b/docs/architecture.md index 71354f75b..7334a01c7 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -117,7 +117,8 @@ Identity is not the same as trust. An agent's identity lets it authenticate to e **Decided:** -- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for request-body credential injection or response transformation, (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)). +- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for operations providers cannot handle — long-running operations, sandbox capability gaps, credentials in request bodies, response transformation, and multi-step atomic operations (see [ADR 0046](ADRs/0046-host-side-api-server-design.md)), (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)). +- Host-side API server design: Tier 3 servers follow a uniform process contract (`--port`, `--token`, `--bind-address`, `/healthz`, `/tools.json`, `SIGTERM`). Network access is controlled via composable provider profiles — atomic capability profiles composed per-harness. Per-run UUID bearer tokens are delivered through OpenShell provider placeholders. File transfer uses `openshell sandbox upload/download` ([ADR 0046](ADRs/0046-host-side-api-server-design.md)). - Per-role GitHub Apps with manifest-based creation. Each agent role gets its own app with scoped permissions. PEMs stored in Secret Manager as `fullsend-{role}-app-pem` — one secret per role, shared across orgs on a mint. Org isolation is enforced via `ALLOWED_ORGS`, `ROLE_APP_IDS`, and installation verification ([ADR 0007](ADRs/0007-per-role-github-apps.md), [ADR 0033](ADRs/0033-per-repo-installation-mode.md)). One concrete implementation option is [`oidcx`](https://github.com/oxidecomputer/oidcx): a service that accepts OIDC identity tokens and exchanges them for short-lived access tokens. It can mint tokens scoped to selected GitHub repositories and permissions, or to selected Oxide silos and permissions, and it also ships with a GitHub Action wrapper. In a Fullsend deployment, this can be used by the sandbox entrypoint to narrow a broad GitHub App identity down to only the specific permissions an agent needs for the current run.