From fc5ae355773f63587bbf74658c4a1a95407a1609 Mon Sep 17 00:00:00 2001 From: Marta Anon Date: Tue, 2 Jun 2026 22:02:07 +0200 Subject: [PATCH 1/3] docs: add design spec for host-side API server ADR Covers server process contract, API discoverability, composable provider profiles, per-run authentication, credential delivery, file transfer, provider vs API server decision framework, bind address security, and hardening requirements. Signed-off-by: Marta Anon --- .../2026-06-02-host-side-api-server-design.md | 434 ++++++++++++++++++ 1 file changed, 434 insertions(+) create mode 100644 docs/superpowers/specs/2026-06-02-host-side-api-server-design.md diff --git a/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md b/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md new file mode 100644 index 000000000..216910622 --- /dev/null +++ b/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md @@ -0,0 +1,434 @@ +# ADR 0043 — Host-Side API Server Design — Spec + +Tracking issue: [#880](https://github.com/fullsend-ai/fullsend/issues/880) +Parent issue: [#879](https://github.com/fullsend-ai/fullsend/issues/879) +Experiment: `experiments/host-side-api-server` ([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28)) + +## Purpose + +Define the complete design for host-side API servers that run outside the +OpenShell sandbox and are callable by the agent via HTTP. ADR 0024 introduced +the `api_servers` harness field as PLANNED, and ADRs 0017/0025 established the +host-side REST server as Tier 3 of the credential delivery model. This ADR +fills the remaining design gaps before implementation (#881). + +## Decisions to record + +### 1. Server process contract and lifecycle + +**Context.** The experiment validated a uniform process contract across two +servers in different languages (Go, Python). The runner needs a +language-agnostic interface to manage arbitrary API servers. 
+ +**Decision.** Every host-side API server managed by the runner must: + +- Accept `--port ` and `--token ` CLI flags +- Accept `--bind-address ` (default `127.0.0.1`, see §8 for why the + runner overrides to `0.0.0.0` today) +- Serve `GET /healthz` returning `{"status": "ok"}` when ready (unauthenticated) +- Serve `GET /tools.json` for agent discovery (see §2) +- Validate `Authorization: Bearer ` on all non-health, non-discovery + endpoints +- Shut down cleanly on `SIGTERM` (5s grace period, then `SIGKILL`) + +**Runner lifecycle:** + +1. Start declared API servers after pre-script, before sandbox creation +2. Poll `GET /healthz` until 200 (timeout: 15s, 500ms interval) +3. Configure sandbox network policy and provider credentials +4. Create sandbox and start agent +5. On exit (success or failure): send `SIGTERM` to servers, wait grace period, + `SIGKILL` if needed — after sandbox destruction (step 11 of ADR 0024) + +**Crash behavior.** If an API server crashes mid-run, the run fails. No restart +logic. API servers are critical infrastructure — a crashed server means the +agent has lost access to capabilities it depends on, and continuing would +produce incomplete or incorrect results. + +**Harness schema.** Uses the existing `api_servers` field from ADR 0024, +keeping the `script` field name: + +```yaml +api_servers: + - name: builder + script: api-servers/builder/bin/builder-server + port: 9090 + providers: # NEW — see §3 + - builder-build + - builder-push + env: + REGISTRY_TOKEN: ${REGISTRY_TOKEN} +``` + +### 2. API discoverability for the agent + +**Context.** The experiment compared three approaches for making the API known +to the agent. Each was tested under full-access and restricted L7 policies. 
+ +**Options:** + +| | `/tools.json` | `/openapi.json` | Baked instructions | +|---|---|---|---| +| Token efficiency (full access) | 92k (best) | 107k | 100k | +| Token efficiency (restricted) | 205k (fails) | 534k (fails) | 84k (succeeds) | +| Resilience to blocked discovery | Fails — agent guesses paths | Fails — agent guesses paths | Succeeds — knows paths from skill | +| Maintainability | Single source of truth in server | Single source of truth in server | Skill can drift from API | +| Agent parsing | Structured JSON, minimal ambiguity | Verbose nested structure, more context tokens | Prose, more interpretation needed | + +**Decision.** Require `GET /tools.json` as the standard discovery endpoint. +Each entry contains `name`, `description`, `endpoint`, `method`, and +`input_schema`: + +```json +[ + { + "name": "build_container", + "description": "Build a container image using podman or docker", + "endpoint": "/build", + "method": "POST", + "input_schema": { + "type": "object", + "required": ["tag"], + "properties": { + "tag": {"type": "string", "description": "Image tag"}, + "dockerfile": {"type": "string", "default": "Dockerfile"} + } + } + } +] +``` + +**Rationale.** + +- `/tools.json` is the most token-efficient under full access (92k vs 100k for + baked, 107k for OpenAPI). +- It returns structured data purpose-built for agent consumption — agents parse + JSON directly rather than interpreting Markdown prose or navigating verbose + OpenAPI nesting. +- The schema is a single source of truth in the server. When the API changes, + the agent automatically discovers the new schema. +- OpenAPI is designed for code generators and documentation tools — its nested + structure adds context tokens without proportional benefit for LLM agents. +- Baked instructions are the most resilient to restricted policies (84k, only + method that succeeds), but the discovery endpoint should not be blocked in + normal operation — it is part of the required process contract. 
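
As an illustration of how an agent-side client might consume the discovery document described above, here is a minimal Python sketch. It assumes the `/tools.json` payload has already been fetched and uses the sample entry from this section verbatim. The helper name `plan_request` is hypothetical and not part of the spec, and the default-filling behaviour is an assumption about how a client could use `input_schema`.

```python
import json

# Sample discovery payload, copied from the /tools.json example in this section.
TOOLS_JSON = """
[
  {
    "name": "build_container",
    "description": "Build a container image using podman or docker",
    "endpoint": "/build",
    "method": "POST",
    "input_schema": {
      "type": "object",
      "required": ["tag"],
      "properties": {
        "tag": {"type": "string", "description": "Image tag"},
        "dockerfile": {"type": "string", "default": "Dockerfile"}
      }
    }
  }
]
"""


def plan_request(tools, name, args):
    """Hypothetical helper: resolve a tool name to (method, endpoint, body).

    Checks the `required` list and fills in schema defaults; it does not
    perform full JSON Schema validation.
    """
    by_name = {tool["name"]: tool for tool in tools}
    if name not in by_name:
        raise KeyError(f"unknown tool: {name}")
    tool = by_name[name]
    schema = tool.get("input_schema", {})

    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Start from declared defaults, then let caller-supplied values win.
    body = {
        key: spec["default"]
        for key, spec in schema.get("properties", {}).items()
        if "default" in spec
    }
    body.update(args)
    return tool["method"], tool["endpoint"], body


tools = json.loads(TOOLS_JSON)
method, endpoint, body = plan_request(tools, "build_container", {"tag": "app:dev"})
```

Keeping the lookup keyed on `name` means the client never hard-codes paths, which is the property that makes the server the single source of truth.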
+ +### 3. Network policy via composable provider profiles + +**Context.** The experiment manually authored L7 policies and hit bugs from +mismatches between server capabilities and policy rules. OpenShell now supports +provider-backed policy composition +([NVIDIA/OpenShell#947](https://github.com/NVIDIA/OpenShell/issues/947), +[NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)) where +attaching a provider to a sandbox auto-injects L7 rules as Layer 2 policy +entries. Fullsend issue +[#776](https://github.com/fullsend-ai/fullsend/issues/776) tracks adopting +this for harness policies. + +**How OpenShell policy composition works.** OpenShell has a 3-layer policy +stack: + +- **Layer 1 (Base):** Filesystem, process, landlock — static sandbox config. +- **Layer 2 (Provider):** Auto-generated from attached providers. Each attached + provider contributes network policy rules under a reserved `_provider_*` key. +- **Layer 3 (User):** Explicit user-authored rules via `openshell policy set`. + +A **provider** bundles three things: credentials (with injection style), +endpoints (L7 rules), and binaries (which executables can use those endpoints). +A **provider profile** is the template that defines a provider type — the YAML +schema declaring the endpoints, binaries, and auth configuration. When a +provider is attached to a sandbox, OpenShell auto-injects its endpoint rules +into the effective policy. + +Composition is **additive**: the proxy permits a request if it matches ANY rule +across layers (union of allows). Deny rules win globally — if any rule denies +a request, it's blocked regardless of allows in other rules. + +**Options:** + +**Option A: Composable provider profiles per capability (recommended).** Each +API server ships atomic provider profiles — one per logical group of endpoints. 
+Each profile bundles the credential injection (`auth` block), the L7 endpoint +rules, and the binary restrictions as a single `(credential, endpoint, binary)` +unit. Harnesses list which profiles to attach. Different agent roles compose +different capability sets for the same server. + +```yaml +# Provider profiles (defined once per API server, registered on gateway): +# +# builder-build profile: +# endpoints: POST /build, GET /healthz, GET /tools.json +# binaries: **/curl +# auth: bearer token injection +# +# builder-push profile: +# endpoints: POST /push +# binaries: **/curl +# auth: bearer token injection +# +# builder-read profile: +# endpoints: GET /images +# binaries: **/curl +# auth: bearer token injection + +# Code agent harness — full access: +api_servers: + - name: builder + script: api-servers/builder/bin/builder-server + port: 9090 + providers: + - builder-build + - builder-push + - builder-read + +# Review agent harness — read-only: +api_servers: + - name: builder + script: api-servers/builder/bin/builder-server + port: 9090 + providers: + - builder-read +``` + +Pros: reusable across harnesses, follows least privilege naturally (compose +only what the agent needs), aligns with OpenShell's provider model (credential ++ endpoint + binary as a unit), each profile is defined once and composed +freely. Cons: requires creating and registering custom provider profiles for +each API server capability, depends on OpenShell >= v0.0.37 and fullsend #776. + +**Option B: Runner-generated monolithic policy.** The runner generates a single +L7 policy file from an `allowed_paths` list in the harness `api_servers` entry. +No dependency on OpenShell provider profiles. + +```yaml +api_servers: + - name: builder + script: api-servers/builder/bin/builder-server + port: 9090 + allowed_paths: + - method: POST + path: /build + - method: GET + path: /images +``` + +Pros: works with any OpenShell version, no external dependencies, simpler to +implement. 
Cons: duplicates policy logic across harnesses, error-prone +(experiment hit bugs from mismatches between server API surface and manually +authored policy), no reuse of capability definitions, credential injection must +be handled separately. + +**Decision.** Option A — composable provider profiles per capability. + +**Requirements:** + +- OpenShell >= v0.0.37 (profile-backed policy composition, + [NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)) +- The `use_providers_v2` gateway setting may be required (see + [#776](https://github.com/fullsend-ai/fullsend/issues/776)) +- Prerequisite: [fullsend-ai/fullsend#776](https://github.com/fullsend-ai/fullsend/issues/776) + (adopt provider-backed policy composition) + +**Composition semantics.** Composition is additive-only: provider rules and +user rules live in separate keys, and the proxy permits a request if it matches +any rule. There is no cross-rule deny mechanism that would let a user policy +narrow what a provider profile grants (though provider deny rules do block +globally). Different access levels for the same server are achieved by +composing different profile sets, not by adding deny overrides. + +### 4. Per-run authentication + +**Context.** The agent inside the sandbox needs to authenticate to API servers. +The real credential must never enter the sandbox. + +**Options:** + +**Option A: UUID bearer token via provider placeholders (recommended).** The +runner generates a random UUID token per run. The token is registered as an +OpenShell provider credential with an `auth: bearer` declaration. The L7 proxy +resolves the placeholder to the real token in outgoing `Authorization` headers +— the real token never enters the sandbox. 
+ +```yaml +# Provider definition +name: api-server +type: generic +credentials: + API_TOKEN: ${API_TOKEN} +``` + +Pros: simple, proven by experiment, no key management, credential never enters +sandbox, credential scoping ensures the token is only injected for requests +matching the provider's endpoints and binaries. Cons: no claims or expiry — +the token is valid for the entire run and grants whatever endpoints the L7 +policy allows. + +**Option B: Short-lived JWTs with claims.** The runner generates a JWT signed +with a per-run key pair. Claims include run ID, repo, and allowed operations. +Servers validate the signature and claims. The JWT can be short-lived (e.g., 1 +hour) with refresh. + +Pros: per-operation authorization, expiry, audit trail via claims. Cons: adds +signing key management, JWT library dependency in every server, more complex +token lifecycle, and the L7 policy already restricts which endpoints are +reachable — JWT claims would be a second layer of the same restriction. + +**Decision.** Option A — UUID bearer token via provider placeholders. The L7 +policy already enforces which endpoints each agent can reach (§3), making +per-operation JWT claims redundant for the initial design. JWT-based auth is a +future enhancement for when per-operation claims become necessary (e.g., +multi-tenant servers, cross-run audit). + +**Security requirement.** Token comparison must be timing-safe +(`crypto/subtle.ConstantTimeCompare` in Go, `hmac.compare_digest` in Python). +The experiment code flagged this as a TODO. + +### 5. Credential delivery to the server process + +**Context.** API servers hold credentials on behalf of the agent (registry +tokens, GitHub tokens, API keys). These must reach the server without passing +through the sandbox. + +**Decision.** Credentials are delivered via environment variables expanded from +the host environment at server startup. 
The `env` field in `api_servers` (ADR +0024) supports `${HOST_VAR}` syntax: + +```yaml +api_servers: + - name: builder + script: api-servers/builder/bin/builder-server + port: 9090 + env: + REGISTRY_TOKEN: ${REGISTRY_TOKEN} + GCP_KEY_PATH: ${GOOGLE_APPLICATION_CREDENTIALS} +``` + +The per-run bearer token is passed via `--token` CLI flag (not through `env`, +since it's part of the process contract). + +No secrets mounts or vault integration in the initial design. The runner +expands `${VAR}` references against its own environment and passes the resolved +values to the server process. Sensitive values must not appear in logs or error +messages — servers must scrub credentials from error output (the experiment's +provisioner implements `_scrub_credentials` for this). + +### 6. File transfer between server and sandbox + +**Context.** API servers that build artifacts or provision repos need to +exchange files with the sandbox. File transfer must happen during request +handling — the agent calls the API, the server produces or consumes files, and +the result must be in the sandbox before the response returns. The runner is +not in this loop. + +**Options:** + +**Option A: `openshell sandbox upload/download` from the server (recommended).** +The server shells out to the OpenShell CLI to transfer files during request +handling. The agent passes its sandbox name per-request (discovered via +`hostname | sed 's/^sandbox-//'`), and the server uses it with `openshell +sandbox download ` and `openshell sandbox +upload `. + +Pros: works today, validated by experiment, handles real-time exchange +naturally (transfer happens during request handling), no runner mediation +needed. Cons: couples server to OpenShell CLI — servers need `openshell` on +`PATH` and can't be tested without it. + +**Option B: Shared host mount.** The runner creates a staging directory on the +host and mounts it into the sandbox via `openshell sandbox create --mount +:`. 
Both the server and the agent see the same +directory — no explicit transfer needed. + +Pros: no transfer commands, transparent POSIX access, no OpenShell CLI +dependency in the server. Cons: depends on OpenShell mount support (available +on K3s via +[NVIDIA/OpenShell#500](https://github.com/NVIDIA/OpenShell/issues/500), pending +for VM driver via +[NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)), +bidirectional mounts may introduce TOCTOU risks if both sides write +concurrently. + +**Option C: HTTP multipart via the API.** The agent uploads files to the server +and downloads results through the server's own REST endpoints using multipart +form data over the L7 proxy. The server stores files on the host side. + +Pros: fully portable, standard HTTP, server is self-contained, no OpenShell +dependency. Cons: large files go through the L7 proxy (bandwidth/latency +overhead), requires multipart handling in every server, the proxy must allow +the content-type and body size. + +**Decision.** Option A — `openshell sandbox upload/download` from the server. +These servers are purpose-built for the OpenShell environment, so the CLI +coupling is acceptable. The experiment validated this end-to-end with both +the builder (download context, build, upload tarball) and the provisioner +(clone, scan, upload repo). + +Option B (shared mount) is noted as the preferred future direction when +OpenShell mount support is universally available — it eliminates transfer +commands entirely and makes file exchange transparent. + +### 7. Provider vs. API server decision framework + +**Context.** Issue #196 evaluated providers as a replacement for REST servers. +Providers (Tier 2) became the preferred path, but API servers (Tier 3) remain +necessary for cases providers cannot handle. + +**Decision.** Use providers (Tier 2) by default. 
Use API servers (Tier 3) when +any of the following apply: + +| Condition | Why providers can't handle it | +|---|---| +| **Long-running operations** (> 60s) | MCP client timeouts (~30-60s) make provider-based tools unsuitable ([claude-code#7575](https://github.com/anthropics/claude-code/issues/7575)) | +| **Sandbox capability gaps** | Operations the sandbox deliberately blocks (e.g., container builds — seccomp blocks `CLONE_NEWUSER`, `AF_NETLINK`, `setns`; agent has zero Linux capabilities; [NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)) | +| **Credentials in request bodies** | Provider placeholder model only intercepts `Authorization` headers; credentials embedded in JSON bodies or query parameters require server-side injection | +| **Response transformation** | Scanning, filtering, or transforming responses before they reach the agent (e.g., repo provisioner's scan-before-copy) | +| **Multi-step atomic operations** | Operations that combine multiple steps as a single unit (clone + scan + copy) where partial completion would be worse than failure | + +### 8. Bind address and network exposure + +**Context.** The experiment found that API servers must bind to `0.0.0.0` +because the L7 proxy connects from inside the container network namespace — +servers bound to `127.0.0.1` are unreachable. On rootless Podman, the container +bridge gateway IP (e.g., `10.88.0.1`) lives inside the container namespace and +cannot be bound from the host (`EADDRNOTAVAIL`). + +**Decision.** Servers default to `--bind-address 127.0.0.1` (secure by +default). The runner explicitly passes `--bind-address 0.0.0.0` when starting +servers for sandboxed agents. This is a security trade-off: on shared hosts, +other processes can probe the server ports. Bearer token authentication +mitigates this but doesn't eliminate the attack surface. 
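
Because the bearer check is the main guard on a port bound to `0.0.0.0`, it helps to see what the timing-safe comparison from §4 could look like on the server side. This is a minimal Python sketch, not the experiment's code: the helper name `is_authorized` and the header parsing are assumptions, and only `hmac.compare_digest` and `uuid.uuid4` are standard-library calls.

```python
import hmac
import uuid


def is_authorized(authorization_header, expected_token):
    """Hypothetical helper: validate an `Authorization: Bearer ...` header value."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest does not stop at the first mismatched byte, unlike `==`.
    # Encoding to bytes keeps it usable even if a client sends non-ASCII input.
    return hmac.compare_digest(
        presented.encode("utf-8"), expected_token.encode("utf-8")
    )


# Stand-in for the per-run UUID token the runner would pass via --token.
run_token = str(uuid.uuid4())
```

A handler would call `is_authorized(request_header, run_token)` on every endpoint except `/healthz` and `/tools.json`, and reject the request when it returns `False`.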
+ +**Future.** [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633) +proposes generalizing the `inference.local` supervisor proxy pattern for +arbitrary host services. If implemented, the supervisor would proxy connections +from inside the sandbox to `127.0.0.1` on the host, eliminating the need for +`0.0.0.0` binding entirely. Servers would never be network-exposed. The runner +should detect OpenShell support for this and stop passing +`--bind-address 0.0.0.0` when available. + +**Network policy.** All URLs delivered into the sandbox must use +`host.openshell.internal`, never raw IPs. The L7 proxy matches requests by +hostname, and SSRF protection blocks private IP addresses. The `allowed_ips` +field in network policies handles SSRF allowlisting separately using the host +IP rendered into policy templates at runtime. + +### 9. Security hardening requirements + +Based on experiment findings and code review of the PoC servers: + +| Requirement | Rationale | +|---|---| +| **Timing-safe token comparison** | Naive string comparison can return at the first mismatched byte, so response timing can reveal how much of a guessed token is correct. Use `crypto/subtle.ConstantTimeCompare` (Go) or `hmac.compare_digest` (Python). | +| **Request body size limits** | Prevent DoS via oversized payloads. Recommend 1 MB default (`http.MaxBytesReader` in Go, `Content-Length` check in Python). | +| **Rate limiting on unauthenticated endpoints** | `/healthz`, `/tools.json` are reachable without a bearer token. Rate limit to prevent abuse, especially when bound to `0.0.0.0`. | +| **Credential scrubbing in error messages** | Error responses must not leak credentials embedded in URLs or environment variables. Servers must scrub before returning errors to the agent. | +| **Bounded in-memory state** | Servers that track operation state (e.g., provisioner's job map) must bound the state or expire old entries. Unbounded growth is acceptable only for short-lived experiment runs. | + +### 10.
Relationship to existing ADRs + +| ADR | Relationship | +|---|---| +| **0016** (Unidirectional control flow) | Preserved — API servers are provisioned top-down from the harness. Agents cannot request servers to be started. The runner manages the full lifecycle. | +| **0017** (Credential isolation) | Implemented — this ADR specifies the concrete process contract for the host-side REST server model. Per-run bearer tokens via provider placeholders fulfill the "credentials never enter the sandbox" requirement. | +| **0024** (Harness definitions) | Extended — this ADR specifies runtime behavior for the `api_servers` field, adds `providers` sub-field for composable policy profiles, and inserts API server lifecycle into the execution sequence (after pre-script, before sandbox creation). | +| **0025** (Provider-based credential delivery) | Tier 3 (host-side REST server) is now fully specified. The decision framework in §7 defines when to use Tier 3 vs. Tier 2 (providers). | From f138206dce6cd321f8afb41b4486bcf6d86ad817 Mon Sep 17 00:00:00 2001 From: Marta Anon Date: Tue, 2 Jun 2026 22:42:33 +0200 Subject: [PATCH 2/3] =?UTF-8?q?docs:=20add=20ADR=200046=20=E2=80=94=20host?= =?UTF-8?q?-side=20API=20server=20design=20for=20sandboxed=20agents?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Defines the process contract, composable provider profiles for network policy, per-run auth via UUID bearer tokens, file transfer via openshell CLI, and security hardening requirements. Updates architecture.md with the Tier 3 design summary. 
Signed-off-by: Marta Anon --- docs/ADRs/0046-host-side-api-server-design.md | 137 ++++++ docs/architecture.md | 3 +- .../2026-06-02-host-side-api-server-design.md | 434 ------------------ 3 files changed, 139 insertions(+), 435 deletions(-) create mode 100644 docs/ADRs/0046-host-side-api-server-design.md delete mode 100644 docs/superpowers/specs/2026-06-02-host-side-api-server-design.md diff --git a/docs/ADRs/0046-host-side-api-server-design.md b/docs/ADRs/0046-host-side-api-server-design.md new file mode 100644 index 000000000..ed6c31468 --- /dev/null +++ b/docs/ADRs/0046-host-side-api-server-design.md @@ -0,0 +1,137 @@ +--- +title: "46. Host-side API server design for sandboxed agents" +status: Accepted +relates_to: + - agent-infrastructure + - security-threat-model +topics: + - sandbox + - api-server + - credential-isolation +--- + +# 46. Host-side API server design for sandboxed agents + +Date: 2026-06-02 + +## Status + +Accepted + + + +## Context + +[ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not +implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3 +of the credential delivery model — for cases where providers (Tier 2) cannot +handle: long-running operations exceeding MCP timeouts, operations the sandbox +deliberately blocks (container builds, see +[NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)), +credentials in request bodies, response transformation, and multi-step atomic +operations. + +The `host-side-api-server` experiment +([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28)) +validated the end-to-end flow with two servers (Go container builder, Python +repo provisioner), testing lifecycle management, API discoverability, L7 policy +tuning, per-run auth, and file transfer. 
This ADR records the design decisions +informed by that experiment. + +## Options + +### API discoverability + +Three approaches were tested. `/tools.json` (structured tool-use schema) was +the most token-efficient under full access (92k tokens vs 107k for OpenAPI, +100k for baked instructions). Both discovery-based methods fail under +restricted policies where the endpoint is blocked; baked instructions succeed +(84k) but can drift from the actual API. OpenAPI's verbose structure adds +context tokens without proportional benefit for LLM agents. + +### Per-run authentication + +**UUID bearer token via provider placeholders:** simple, proven, no key +management. The proxy resolves the placeholder — the real token never enters +the sandbox. No claims or expiry. + +**Short-lived JWTs with claims:** per-operation authorization and audit trail, +but adds signing key management and JWT library dependencies in every server. +The L7 policy already restricts reachable endpoints, making JWT claims a +redundant second layer. + +### File transfer between server and sandbox + +**`openshell sandbox upload/download` from the server:** works today, validated +by experiment, handles real-time exchange during request handling. Couples +server to OpenShell CLI. + +**Shared host mount:** transparent POSIX access, no CLI coupling. Depends on +OpenShell mount support that is not yet universally available +([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)). + +**HTTP multipart via the API:** fully portable, but large files through the L7 +proxy add overhead. + +## Decision + +Adopt the host-side API server design with the following process contract, +policy model, and security requirements. Full design details in the +[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28). 
+ +**Process contract.** Every host-side API server must accept `--port`, +`--token`, and `--bind-address` CLI flags; serve `GET /healthz` +(unauthenticated) and `GET /tools.json` (structured tool-use schema) for agent +discovery; validate bearer tokens on all other endpoints; and shut down cleanly +on `SIGTERM`. Servers must write logs to stderr; the runner collects and +bundles logs from all API servers so they are available for inspection after +the run completes. The runner starts servers after pre-script, health-checks +before sandbox creation, and tears down after sandbox destruction. If a server +crashes mid-run, the run fails. + +**Network policy via composable provider profiles.** Each API server ships +atomic capability profiles — one per logical group of endpoints (e.g., +`builder-build`, `builder-push`, `builder-read`). Harnesses list which profiles +to attach. Composition is additive per OpenShell's provider-backed policy +composition +([NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)). +Different agent roles compose different capability sets for the same server. +Requires OpenShell >= v0.0.37 and +[#776](https://github.com/fullsend-ai/fullsend/issues/776). + +**Per-run auth:** UUID bearer token via OpenShell provider placeholders. JWTs +are a future enhancement when per-operation claims become necessary. + +**File transfer:** `openshell sandbox upload/download` from the server during +request handling. Shared host mount +([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)) +will be evaluated as an alternative when available. + +**Bind address:** servers default to `127.0.0.1`, runner overrides to +`0.0.0.0` for sandboxed agents. +[NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633) +(supervisor-proxied host-local endpoints) would eliminate this requirement. 
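
The runner-side health check named in the process contract above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names `http_probe` and `wait_until_healthy` are hypothetical, and the 15 s timeout and 500 ms interval are the values from the design spec this ADR replaces, not something the ADR text itself fixes.

```python
import time
import urllib.request


def http_probe(url):
    """Hypothetical probe: True when `GET url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=1) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP errors all count as "not ready".
        return False


def wait_until_healthy(probe, timeout_s=15.0, interval_s=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Call `probe` every `interval_s` seconds until it succeeds or time runs out.

    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)
```

A runner might call `wait_until_healthy(lambda: http_probe(healthz_url))`, where `healthz_url` is an assumed variable pointing at the server's `/healthz` endpoint, and fail the run before sandbox creation when it returns `False`.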
+ +**Security hardening:** timing-safe token comparison, 1 MB request body limits, +rate limiting on unauthenticated endpoints, credential scrubbing in error +messages, bounded in-memory state. + +## Consequences + +- The `api_servers` harness field ([ADR 0024](0024-harness-definitions.md)) will gain a `providers` sub-field and + defined runtime behavior — servers can be implemented in any language + following the uniform process contract. +- Implementing this design requires + [#776](https://github.com/fullsend-ai/fullsend/issues/776) (provider-backed + policy composition) as a prerequisite. +- Servers are coupled to the OpenShell CLI for file transfer until shared host + mounts are universally available. +- Servers must bind to `0.0.0.0` to be reachable from the sandbox, which widens + the attack surface on shared hosts until + [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633) + ships. +- API servers (Tier 3) are now clearly scoped to cases providers cannot + handle: long-running operations, sandbox capability gaps, credentials in + request bodies, response transformation, and multi-step atomic operations. diff --git a/docs/architecture.md b/docs/architecture.md index 71354f75b..7334a01c7 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -117,7 +117,8 @@ Identity is not the same as trust. An agent's identity lets it authenticate to e **Decided:** -- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for request-body credential injection or response transformation, (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions.
Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)). +- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for operations providers cannot handle — long-running operations, sandbox capability gaps, credentials in request bodies, response transformation, and multi-step atomic operations (see [ADR 0046](ADRs/0046-host-side-api-server-design.md)), (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)). +- Host-side API server design: Tier 3 servers follow a uniform process contract (`--port`, `--token`, `--bind-address`, `/healthz`, `/tools.json`, `SIGTERM`). Network access is controlled via composable provider profiles — atomic capability profiles composed per-harness. Per-run UUID bearer tokens are delivered through OpenShell provider placeholders. File transfer uses `openshell sandbox upload/download` ([ADR 0046](ADRs/0046-host-side-api-server-design.md)). - Per-role GitHub Apps with manifest-based creation. Each agent role gets its own app with scoped permissions. PEMs stored in Secret Manager as `fullsend-{role}-app-pem` — one secret per role, shared across orgs on a mint. Org isolation is enforced via `ALLOWED_ORGS`, `ROLE_APP_IDS`, and installation verification ([ADR 0007](ADRs/0007-per-role-github-apps.md), [ADR 0033](ADRs/0033-per-repo-installation-mode.md)). 
One concrete implementation option is [`oidcx`](https://github.com/oxidecomputer/oidcx): a service that accepts OIDC identity tokens and exchanges them for short-lived access tokens. It can mint tokens scoped to selected GitHub repositories and permissions, or to selected Oxide silos and permissions, and it also ships with a GitHub Action wrapper. In a Fullsend deployment, this can be used by the sandbox entrypoint to narrow a broad GitHub App identity down to only the specific permissions an agent needs for the current run. diff --git a/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md b/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md deleted file mode 100644 index 216910622..000000000 --- a/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md +++ /dev/null @@ -1,434 +0,0 @@ -# ADR 0043 — Host-Side API Server Design — Spec - -Tracking issue: [#880](https://github.com/fullsend-ai/fullsend/issues/880) -Parent issue: [#879](https://github.com/fullsend-ai/fullsend/issues/879) -Experiment: `experiments/host-side-api-server` ([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28)) - -## Purpose - -Define the complete design for host-side API servers that run outside the -OpenShell sandbox and are callable by the agent via HTTP. ADR 0024 introduced -the `api_servers` harness field as PLANNED, and ADRs 0017/0025 established the -host-side REST server as Tier 3 of the credential delivery model. This ADR -fills the remaining design gaps before implementation (#881). - -## Decisions to record - -### 1. Server process contract and lifecycle - -**Context.** The experiment validated a uniform process contract across two -servers in different languages (Go, Python). The runner needs a -language-agnostic interface to manage arbitrary API servers. 
-
-**Decision.** Every host-side API server managed by the runner must:
-
-- Accept `--port ` and `--token ` CLI flags
-- Accept `--bind-address ` (default `127.0.0.1`, see §8 for why the
-  runner overrides to `0.0.0.0` today)
-- Serve `GET /healthz` returning `{"status": "ok"}` when ready (unauthenticated)
-- Serve `GET /tools.json` for agent discovery (see §2)
-- Validate `Authorization: Bearer ` on all non-health, non-discovery
-  endpoints
-- Shut down cleanly on `SIGTERM` (5s grace period, then `SIGKILL`)
-
-**Runner lifecycle:**
-
-1. Start declared API servers after pre-script, before sandbox creation
-2. Poll `GET /healthz` until 200 (timeout: 15s, 500ms interval)
-3. Configure sandbox network policy and provider credentials
-4. Create sandbox and start agent
-5. On exit (success or failure): send `SIGTERM` to servers, wait grace period,
-   `SIGKILL` if needed — after sandbox destruction (step 11 of ADR 0024)
-
-**Crash behavior.** If an API server crashes mid-run, the run fails. No restart
-logic. API servers are critical infrastructure — a crashed server means the
-agent has lost access to capabilities it depends on, and continuing would
-produce incomplete or incorrect results.
-
-**Harness schema.** Uses the existing `api_servers` field from ADR 0024,
-keeping the `script` field name:
-
-```yaml
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    providers:            # NEW — see §3
-      - builder-build
-      - builder-push
-    env:
-      REGISTRY_TOKEN: ${REGISTRY_TOKEN}
-```
-
-### 2. API discoverability for the agent
-
-**Context.** The experiment compared three approaches for making the API known
-to the agent. Each was tested under full-access and restricted L7 policies.
-
-**Options:**
-
-| | `/tools.json` | `/openapi.json` | Baked instructions |
-|---|---|---|---|
-| Token efficiency (full access) | 92k (best) | 107k | 100k |
-| Token efficiency (restricted) | 205k (fails) | 534k (fails) | 84k (succeeds) |
-| Resilience to blocked discovery | Fails — agent guesses paths | Fails — agent guesses paths | Succeeds — knows paths from skill |
-| Maintainability | Single source of truth in server | Single source of truth in server | Skill can drift from API |
-| Agent parsing | Structured JSON, minimal ambiguity | Verbose nested structure, more context tokens | Prose, more interpretation needed |
-
-**Decision.** Require `GET /tools.json` as the standard discovery endpoint.
-Each entry contains `name`, `description`, `endpoint`, `method`, and
-`input_schema`:
-
-```json
-[
-  {
-    "name": "build_container",
-    "description": "Build a container image using podman or docker",
-    "endpoint": "/build",
-    "method": "POST",
-    "input_schema": {
-      "type": "object",
-      "required": ["tag"],
-      "properties": {
-        "tag": {"type": "string", "description": "Image tag"},
-        "dockerfile": {"type": "string", "default": "Dockerfile"}
-      }
-    }
-  }
-]
-```
-
-**Rationale.**
-
-- `/tools.json` is the most token-efficient under full access (92k vs 100k for
-  baked, 107k for OpenAPI).
-- It returns structured data purpose-built for agent consumption — agents parse
-  JSON directly rather than interpreting Markdown prose or navigating verbose
-  OpenAPI nesting.
-- The schema is a single source of truth in the server. When the API changes,
-  the agent automatically discovers the new schema.
-- OpenAPI is designed for code generators and documentation tools — its nested
-  structure adds context tokens without proportional benefit for LLM agents.
-- Baked instructions are the most resilient to restricted policies (84k, only
-  method that succeeds), but the discovery endpoint should not be blocked in
-  normal operation — it is part of the required process contract.
-
-### 3. Network policy via composable provider profiles
-
-**Context.** The experiment manually authored L7 policies and hit bugs from
-mismatches between server capabilities and policy rules. OpenShell now supports
-provider-backed policy composition
-([NVIDIA/OpenShell#947](https://github.com/NVIDIA/OpenShell/issues/947),
-[NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)) where
-attaching a provider to a sandbox auto-injects L7 rules as Layer 2 policy
-entries. Fullsend issue
-[#776](https://github.com/fullsend-ai/fullsend/issues/776) tracks adopting
-this for harness policies.
-
-**How OpenShell policy composition works.** OpenShell has a 3-layer policy
-stack:
-
-- **Layer 1 (Base):** Filesystem, process, landlock — static sandbox config.
-- **Layer 2 (Provider):** Auto-generated from attached providers. Each attached
-  provider contributes network policy rules under a reserved `_provider_*` key.
-- **Layer 3 (User):** Explicit user-authored rules via `openshell policy set`.
-
-A **provider** bundles three things: credentials (with injection style),
-endpoints (L7 rules), and binaries (which executables can use those endpoints).
-A **provider profile** is the template that defines a provider type — the YAML
-schema declaring the endpoints, binaries, and auth configuration. When a
-provider is attached to a sandbox, OpenShell auto-injects its endpoint rules
-into the effective policy.
-
-Composition is **additive**: the proxy permits a request if it matches ANY rule
-across layers (union of allows). Deny rules win globally — if any rule denies
-a request, it's blocked regardless of allows in other rules.
-
-**Options:**
-
-**Option A: Composable provider profiles per capability (recommended).** Each
-API server ships atomic provider profiles — one per logical group of endpoints.
-Each profile bundles the credential injection (`auth` block), the L7 endpoint
-rules, and the binary restrictions as a single `(credential, endpoint, binary)`
-unit. Harnesses list which profiles to attach. Different agent roles compose
-different capability sets for the same server.
-
-```yaml
-# Provider profiles (defined once per API server, registered on gateway):
-#
-# builder-build profile:
-#   endpoints: POST /build, GET /healthz, GET /tools.json
-#   binaries: **/curl
-#   auth: bearer token injection
-#
-# builder-push profile:
-#   endpoints: POST /push
-#   binaries: **/curl
-#   auth: bearer token injection
-#
-# builder-read profile:
-#   endpoints: GET /images
-#   binaries: **/curl
-#   auth: bearer token injection
-
-# Code agent harness — full access:
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    providers:
-      - builder-build
-      - builder-push
-      - builder-read
-
-# Review agent harness — read-only:
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    providers:
-      - builder-read
-```
-
-Pros: reusable across harnesses, follows least privilege naturally (compose
-only what the agent needs), aligns with OpenShell's provider model (credential
-+ endpoint + binary as a unit), each profile is defined once and composed
-freely. Cons: requires creating and registering custom provider profiles for
-each API server capability, depends on OpenShell >= v0.0.37 and fullsend #776.
-
-**Option B: Runner-generated monolithic policy.** The runner generates a single
-L7 policy file from an `allowed_paths` list in the harness `api_servers` entry.
-No dependency on OpenShell provider profiles.
-
-```yaml
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    allowed_paths:
-      - method: POST
-        path: /build
-      - method: GET
-        path: /images
-```
-
-Pros: works with any OpenShell version, no external dependencies, simpler to
-implement. Cons: duplicates policy logic across harnesses, error-prone
-(experiment hit bugs from mismatches between server API surface and manually
-authored policy), no reuse of capability definitions, credential injection must
-be handled separately.
-
-**Decision.** Option A — composable provider profiles per capability.
-
-**Requirements:**
-
-- OpenShell >= v0.0.37 (profile-backed policy composition,
-  [NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037))
-- The `use_providers_v2` gateway setting may be required (see
-  [#776](https://github.com/fullsend-ai/fullsend/issues/776))
-- Prerequisite: [fullsend-ai/fullsend#776](https://github.com/fullsend-ai/fullsend/issues/776)
-  (adopt provider-backed policy composition)
-
-**Composition semantics.** Composition is additive-only: provider rules and
-user rules live in separate keys, and the proxy permits a request if it matches
-any rule. There is no cross-rule deny mechanism that would let a user policy
-narrow what a provider profile grants (though provider deny rules do block
-globally). Different access levels for the same server are achieved by
-composing different profile sets, not by adding deny overrides.
-
-### 4. Per-run authentication
-
-**Context.** The agent inside the sandbox needs to authenticate to API servers.
-The real credential must never enter the sandbox.
-
-**Options:**
-
-**Option A: UUID bearer token via provider placeholders (recommended).** The
-runner generates a random UUID token per run. The token is registered as an
-OpenShell provider credential with an `auth: bearer` declaration. The L7 proxy
-resolves the placeholder to the real token in outgoing `Authorization` headers
-— the real token never enters the sandbox.
-
-```yaml
-# Provider definition
-name: api-server
-type: generic
-credentials:
-  API_TOKEN: ${API_TOKEN}
-```
-
-Pros: simple, proven by experiment, no key management, credential never enters
-sandbox, credential scoping ensures the token is only injected for requests
-matching the provider's endpoints and binaries. Cons: no claims or expiry —
-the token is valid for the entire run and grants whatever endpoints the L7
-policy allows.
-
-**Option B: Short-lived JWTs with claims.** The runner generates a JWT signed
-with a per-run key pair. Claims include run ID, repo, and allowed operations.
-Servers validate the signature and claims. The JWT can be short-lived (e.g., 1
-hour) with refresh.
-
-Pros: per-operation authorization, expiry, audit trail via claims. Cons: adds
-signing key management, JWT library dependency in every server, more complex
-token lifecycle, and the L7 policy already restricts which endpoints are
-reachable — JWT claims would be a second layer of the same restriction.
-
-**Decision.** Option A — UUID bearer token via provider placeholders. The L7
-policy already enforces which endpoints each agent can reach (§3), making
-per-operation JWT claims redundant for the initial design. JWT-based auth is a
-future enhancement for when per-operation claims become necessary (e.g.,
-multi-tenant servers, cross-run audit).
-
-**Security requirement.** Token comparison must be timing-safe
-(`crypto/subtle.ConstantTimeCompare` in Go, `hmac.compare_digest` in Python).
-The experiment code flagged this as a TODO.
-
-### 5. Credential delivery to the server process
-
-**Context.** API servers hold credentials on behalf of the agent (registry
-tokens, GitHub tokens, API keys). These must reach the server without passing
-through the sandbox.
-
-**Decision.** Credentials are delivered via environment variables expanded from
-the host environment at server startup. The `env` field in `api_servers` (ADR
-0024) supports `${HOST_VAR}` syntax:
-
-```yaml
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    env:
-      REGISTRY_TOKEN: ${REGISTRY_TOKEN}
-      GCP_KEY_PATH: ${GOOGLE_APPLICATION_CREDENTIALS}
-```
-
-The per-run bearer token is passed via `--token` CLI flag (not through `env`,
-since it's part of the process contract).
-
-No secrets mounts or vault integration in the initial design. The runner
-expands `${VAR}` references against its own environment and passes the resolved
-values to the server process. Sensitive values must not appear in logs or error
-messages — servers must scrub credentials from error output (the experiment's
-provisioner implements `_scrub_credentials` for this).
-
-### 6. File transfer between server and sandbox
-
-**Context.** API servers that build artifacts or provision repos need to
-exchange files with the sandbox. File transfer must happen during request
-handling — the agent calls the API, the server produces or consumes files, and
-the result must be in the sandbox before the response returns. The runner is
-not in this loop.
-
-**Options:**
-
-**Option A: `openshell sandbox upload/download` from the server (recommended).**
-The server shells out to the OpenShell CLI to transfer files during request
-handling. The agent passes its sandbox name per-request (discovered via
-`hostname | sed 's/^sandbox-//'`), and the server uses it with `openshell
-sandbox download ` and `openshell sandbox
-upload `.
-
-Pros: works today, validated by experiment, handles real-time exchange
-naturally (transfer happens during request handling), no runner mediation
-needed. Cons: couples server to OpenShell CLI — servers need `openshell` on
-`PATH` and can't be tested without it.
-
-**Option B: Shared host mount.** The runner creates a staging directory on the
-host and mounts it into the sandbox via `openshell sandbox create --mount
-:`. Both the server and the agent see the same
-directory — no explicit transfer needed.
-
-Pros: no transfer commands, transparent POSIX access, no OpenShell CLI
-dependency in the server. Cons: depends on OpenShell mount support (available
-on K3s via
-[NVIDIA/OpenShell#500](https://github.com/NVIDIA/OpenShell/issues/500), pending
-for VM driver via
-[NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)),
-bidirectional mounts may introduce TOCTOU risks if both sides write
-concurrently.
-
-**Option C: HTTP multipart via the API.** The agent uploads files to the server
-and downloads results through the server's own REST endpoints using multipart
-form data over the L7 proxy. The server stores files on the host side.
-
-Pros: fully portable, standard HTTP, server is self-contained, no OpenShell
-dependency. Cons: large files go through the L7 proxy (bandwidth/latency
-overhead), requires multipart handling in every server, the proxy must allow
-the content-type and body size.
-
-**Decision.** Option A — `openshell sandbox upload/download` from the server.
-These servers are purpose-built for the OpenShell environment, so the CLI
-coupling is acceptable. The experiment validated this end-to-end with both
-the builder (download context, build, upload tarball) and the provisioner
-(clone, scan, upload repo).
-
-Option B (shared mount) is noted as the preferred future direction when
-OpenShell mount support is universally available — it eliminates transfer
-commands entirely and makes file exchange transparent.
-
-### 7. Provider vs. API server decision framework
-
-**Context.** Issue #196 evaluated providers as a replacement for REST servers.
-Providers (Tier 2) became the preferred path, but API servers (Tier 3) remain
-necessary for cases providers cannot handle.
-
-**Decision.** Use providers (Tier 2) by default. Use API servers (Tier 3) when
-any of the following apply:
-
-| Condition | Why providers can't handle it |
-|---|---|
-| **Long-running operations** (> 60s) | MCP client timeouts (~30-60s) make provider-based tools unsuitable ([claude-code#7575](https://github.com/anthropics/claude-code/issues/7575)) |
-| **Sandbox capability gaps** | Operations the sandbox deliberately blocks (e.g., container builds — seccomp blocks `CLONE_NEWUSER`, `AF_NETLINK`, `setns`; agent has zero Linux capabilities; [NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)) |
-| **Credentials in request bodies** | Provider placeholder model only intercepts `Authorization` headers; credentials embedded in JSON bodies or query parameters require server-side injection |
-| **Response transformation** | Scanning, filtering, or transforming responses before they reach the agent (e.g., repo provisioner's scan-before-copy) |
-| **Multi-step atomic operations** | Operations that combine multiple steps as a single unit (clone + scan + copy) where partial completion would be worse than failure |
-
-### 8. Bind address and network exposure
-
-**Context.** The experiment found that API servers must bind to `0.0.0.0`
-because the L7 proxy connects from inside the container network namespace —
-servers bound to `127.0.0.1` are unreachable. On rootless Podman, the container
-bridge gateway IP (e.g., `10.88.0.1`) lives inside the container namespace and
-cannot be bound from the host (`EADDRNOTAVAIL`).
-
-**Decision.** Servers default to `--bind-address 127.0.0.1` (secure by
-default). The runner explicitly passes `--bind-address 0.0.0.0` when starting
-servers for sandboxed agents. This is a security trade-off: on shared hosts,
-other processes can probe the server ports. Bearer token authentication
-mitigates this but doesn't eliminate the attack surface.
-
-**Future.** [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
-proposes generalizing the `inference.local` supervisor proxy pattern for
-arbitrary host services. If implemented, the supervisor would proxy connections
-from inside the sandbox to `127.0.0.1` on the host, eliminating the need for
-`0.0.0.0` binding entirely. Servers would never be network-exposed. The runner
-should detect OpenShell support for this and stop passing
-`--bind-address 0.0.0.0` when available.
-
-**Network policy.** All URLs delivered into the sandbox must use
-`host.openshell.internal`, never raw IPs. The L7 proxy matches requests by
-hostname, and SSRF protection blocks private IP addresses. The `allowed_ips`
-field in network policies handles SSRF allowlisting separately using the host
-IP rendered into policy templates at runtime.
-
-### 9. Security hardening requirements
-
-Based on experiment findings and code review of the PoC servers:
-
-| Requirement | Rationale |
-|---|---|
-| **Timing-safe token comparison** | Naive string comparison leaks token length via timing side-channel. Use `crypto/subtle.ConstantTimeCompare` (Go) or `hmac.compare_digest` (Python). |
-| **Request body size limits** | Prevent DoS via oversized payloads. Recommend 1 MB default (`http.MaxBytesReader` in Go, `Content-Length` check in Python). |
-| **Rate limiting on unauthenticated endpoints** | `/healthz`, `/tools.json` are reachable without a bearer token. Rate limit to prevent abuse, especially when bound to `0.0.0.0`. |
-| **Credential scrubbing in error messages** | Error responses must not leak credentials embedded in URLs or environment variables. Servers must scrub before returning errors to the agent. |
-| **Bounded in-memory state** | Servers that track operation state (e.g., provisioner's job map) must bound the state or expire old entries. Unbounded growth is acceptable only for short-lived experiment runs. |
-
-### 10. Relationship to existing ADRs
-
-| ADR | Relationship |
-|---|---|
-| **0016** (Unidirectional control flow) | Preserved — API servers are provisioned top-down from the harness. Agents cannot request servers to be started. The runner manages the full lifecycle. |
-| **0017** (Credential isolation) | Implemented — this ADR specifies the concrete process contract for the host-side REST server model. Per-run bearer tokens via provider placeholders fulfill the "credentials never enter the sandbox" requirement. |
-| **0024** (Harness definitions) | Extended — this ADR specifies runtime behavior for the `api_servers` field, adds `providers` sub-field for composable policy profiles, and inserts API server lifecycle into the execution sequence (after pre-script, before sandbox creation). |
-| **0025** (Provider-based credential delivery) | Tier 3 (host-side REST server) is now fully specified. The decision framework in §7 defines when to use Tier 3 vs. Tier 2 (providers). |

From c828123a5bd69875d63ad581fab947ec5b61eecd Mon Sep 17 00:00:00 2001
From: Marta Anon
Date: Fri, 12 Jun 2026 10:16:00 +0200
Subject: [PATCH 3/3] =?UTF-8?q?docs:=20refine=20ADR=200046=20=E2=80=94=20s?=
 =?UTF-8?q?cope=20attribution,=20Go=20interface,=20process=20contract?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Distinguish what ADR 0017/0025 originally established (request bodies,
  response transformation) from the broader scope this ADR introduces
- Add implementation model Options subsection: language-agnostic process
  contract vs. Go interface
- Clarify the process contract is the runner's enforcement boundary; the Go
  interface is the recommended internal pattern for fullsend-maintained
  servers, not a runner requirement
- Scope experiment reference — experiment validated the flow, Go interface
  decision came from review discussion
- Update Consequences for Go interface pattern and process contract as
  enforcement boundary

Co-Authored-By: Claude Opus 4.6
Signed-off-by: Marta Anon
---
 docs/ADRs/0046-host-side-api-server-design.md | 66 +++++++++++++++----
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/docs/ADRs/0046-host-side-api-server-design.md b/docs/ADRs/0046-host-side-api-server-design.md
index ed6c31468..37c30c112 100644
--- a/docs/ADRs/0046-host-side-api-server-design.md
+++ b/docs/ADRs/0046-host-side-api-server-design.md
@@ -28,18 +28,20 @@ Accepted
 [ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not
 implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3 of
 the credential delivery model — for cases where providers (Tier 2) cannot
-handle: long-running operations exceeding MCP timeouts, operations the sandbox
+handle, originally scoped to credentials in request bodies and response
+transformation. Practice revealed additional cases beyond provider reach:
+long-running operations exceeding MCP timeouts, operations the sandbox
 deliberately blocks (container builds, see
 [NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)),
-credentials in request bodies, response transformation, and multi-step atomic
-operations.
+and multi-step atomic operations.

 The `host-side-api-server` experiment
 ([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28))
 validated the end-to-end flow with two servers (Go container builder, Python
 repo provisioner), testing lifecycle management, API discoverability, L7 policy
-tuning, per-run auth, and file transfer. This ADR records the design decisions
-informed by that experiment.
+tuning, per-run auth, and file transfer. This ADR records design decisions
+informed by that experiment and by subsequent review discussion on the
+implementation model.

 ## Options

@@ -76,22 +78,53 @@ OpenShell mount support that is not yet universally available
 **HTTP multipart via the API:** fully portable, but large files through the L7
 proxy add overhead.

+### Implementation model
+
+**Language-agnostic process contract:** each server is an independent binary
+in any language. The runner manages it via CLI flags (`--port`, `--token`,
+`--bind-address`) and expects `/healthz`, `/tools.json`, bearer auth, and
+clean `SIGTERM` shutdown. The experiment validated this with Go and Python
+servers. Maximally flexible, but every server re-implements boilerplate (CLI
+parsing, health endpoints, auth middleware, security hardening).
+
+**Go interface:** fullsend-maintained servers implement an internal Go
+interface, compiled to a single binary. Simplifies deployment (one binary,
+no runtime dependencies) and enables stub-based testing. The process
+contract remains the runner's enforcement boundary — the interface is an
+internal pattern, not a runner requirement. Provides a seam for adapting
+deployment topology across workflow platforms (sidecars, pods).
+
 ## Decision

-Adopt the host-side API server design with the following process contract,
-policy model, and security requirements. Full design details in the
-[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28).
+Adopt the host-side API server design with the following implementation model,
+policy model, and security requirements. The
+[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28)
+validated the end-to-end flow; the implementation model was refined during
+review.

-**Process contract.** Every host-side API server must accept `--port`,
-`--token`, and `--bind-address` CLI flags; serve `GET /healthz`
+**Process contract.** Every compiled API server binary accepts `--port`,
+`--token`, and `--bind-address` flags; serves `GET /healthz`
 (unauthenticated) and `GET /tools.json` (structured tool-use schema) for agent
-discovery; validate bearer tokens on all other endpoints; and shut down cleanly
-on `SIGTERM`. Servers must write logs to stderr; the runner collects and
+discovery; validates bearer tokens on all other endpoints; and shuts down
+cleanly on `SIGTERM`. Servers write logs to stderr; the runner collects and
 bundles logs from all API servers so they are available for inspection after
 the run completes. The runner starts servers after pre-script, health-checks
 before sandbox creation, and tears down after sandbox destruction. If a server
 crashes mid-run, the run fails.

+**Implementation model.** The runner enforces the process contract above — any
+binary that satisfies it is a valid API server. Internally, fullsend-maintained
+servers use a Go interface as the recommended pattern: each server provides
+HTTP handlers and a `/tools.json` schema, and the `main()` wires the
+implementation to the process contract (CLI flags, health endpoints, auth
+middleware, signal handling). Harnesses reference a compiled Go binary, the
+workflow runner downloads it to the host, and `fullsend run` manages its
+lifecycle. The interface makes servers testable via stub implementations and
+provides the seam for adapting deployment topology across workflow platforms
+(e.g., sidecars or separate pods on Kubernetes/OpenShift). Implementors who
+need more control can write their own `main()` against the process contract
+directly.
+
 **Network policy via composable provider profiles.** Each API server ships
 atomic capability profiles — one per logical group of endpoints (e.g.,
 `builder-build`, `builder-push`, `builder-read`). Harnesses list which profiles
@@ -122,8 +155,9 @@ messages, bounded in-memory state.
 ## Consequences

 - The `api_servers` harness field ([ADR 0024](0024-harness-definitions.md)) will gain a `providers` sub-field and
-  defined runtime behavior — servers can be implemented in any language
-  following the uniform process contract.
+  defined runtime behavior. The initial implementation targets Go servers
+  behind an internal interface; the process contract keeps the door open for
+  other languages.
 - Implementing this design requires
   [#776](https://github.com/fullsend-ai/fullsend/issues/776) (provider-backed
   policy composition) as a prerequisite.
@@ -135,3 +169,7 @@ messages, bounded in-memory state.
 - API servers (Tier 3) are now clearly scoped to cases providers cannot handle:
   long-running operations, sandbox capability gaps, credentials in request
   bodies, response transformation, and multi-step atomic operations.
+- Fullsend-maintained servers follow a Go interface pattern (testable,
+  platform-portable). The process contract remains the enforcement boundary
+  — servers in other languages or with custom `main()` implementations are
+  valid as long as they satisfy it.