Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 175 additions & 0 deletions docs/ADRs/0046-host-side-api-server-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
---
title: "46. Host-side API server design for sandboxed agents"
status: Accepted
relates_to:
- agent-infrastructure
- security-threat-model
topics:
- sandbox
- api-server
- credential-isolation
---

# 46. Host-side API server design for sandboxed agents

Date: 2026-06-02

## Status

Accepted

<!-- Once this ADR is Accepted, its content is frozen. Do not edit the Context,
Decision, or Consequences sections. If circumstances change, write a new
ADR that supersedes this one. Only status changes and links to superseding
ADRs should be added after acceptance. -->

## Context

[ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not
implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3
of the credential delivery model — for cases where providers (Tier 2) cannot
Comment thread
maruiz93 marked this conversation as resolved.
handle, originally scoped to credentials in request bodies and response
transformation. Practice revealed additional cases beyond provider reach:
long-running operations exceeding MCP timeouts, operations the sandbox
deliberately blocks (container builds, see
[NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)),
and multi-step atomic operations.

The `host-side-api-server` experiment
([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28))
validated the end-to-end flow with two servers (Go container builder, Python
repo provisioner), testing lifecycle management, API discoverability, L7 policy
tuning, per-run auth, and file transfer. This ADR records design decisions
informed by that experiment and by subsequent review discussion on the
implementation model.

## Options

### API discoverability

Three approaches were tested. `/tools.json` (structured tool-use schema) was
the most token-efficient under full access (92k tokens vs 107k for OpenAPI,
100k for baked instructions). Both discovery-based methods fail under
restricted policies where the endpoint is blocked; baked instructions succeed
(84k) but can drift from the actual API. OpenAPI's verbose structure adds
context tokens without proportional benefit for LLM agents.

### Per-run authentication

**UUID bearer token via provider placeholders:** simple, proven, no key
management. The proxy resolves the placeholder — the real token never enters
the sandbox. No claims or expiry.

**Short-lived JWTs with claims:** per-operation authorization and audit trail,
but adds signing key management and JWT library dependencies in every server.
The L7 policy already restricts reachable endpoints, making JWT claims a
redundant second layer.

### File transfer between server and sandbox

**`openshell sandbox upload/download` from the server:** works today, validated
by experiment, handles real-time exchange during request handling. Couples
server to OpenShell CLI.

**Shared host mount:** transparent POSIX access, no CLI coupling. Depends on
OpenShell mount support that is not yet universally available
([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)).

**HTTP multipart via the API:** fully portable, but large files through the L7
proxy add overhead.

### Implementation model

**Language-agnostic process contract:** each server is an independent binary
in any language. The runner manages it via CLI flags (`--port`, `--token`,
`--bind-address`) and expects `/healthz`, `/tools.json`, bearer auth, and
clean `SIGTERM` shutdown. The experiment validated this with Go and Python
servers. Maximally flexible, but every server re-implements boilerplate (CLI
parsing, health endpoints, auth middleware, security hardening).

**Go interface:** fullsend-maintained servers implement an internal Go
interface, compiled to a single binary. Simplifies deployment (one binary,
no runtime dependencies) and enables stub-based testing. The process
contract remains the runner's enforcement boundary — the interface is an
internal pattern, not a runner requirement. Provides a seam for adapting
deployment topology across workflow platforms (sidecars, pods).

## Decision

Adopt the host-side API server design with the following implementation model,
policy model, and security requirements. The
[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28)
validated the end-to-end flow; the implementation model was refined during
review.

Comment thread
ifireball marked this conversation as resolved.
**Process contract.** Every compiled API server binary accepts `--port`,
`--token`, and `--bind-address` flags; serves `GET /healthz`
(unauthenticated) and `GET /tools.json` (structured tool-use schema) for agent
discovery; validates bearer tokens on all other endpoints; and shuts down
cleanly on `SIGTERM`. Servers write logs to stderr; the runner collects and
bundles logs from all API servers so they are available for inspection after
Comment thread
maruiz93 marked this conversation as resolved.
the run completes. The runner starts servers after pre-script, health-checks
before sandbox creation, and tears down after sandbox destruction. If a server
crashes mid-run, the run fails.

**Implementation model.** The runner enforces the process contract above — any
binary that satisfies it is a valid API server. Internally, fullsend-maintained
servers use a Go interface as the recommended pattern: each server provides
HTTP handlers and a `/tools.json` schema, and the `main()` wires the
implementation to the process contract (CLI flags, health endpoints, auth
middleware, signal handling). Harnesses reference a compiled Go binary, the
workflow runner downloads it to the host, and `fullsend run` manages its
lifecycle. The interface makes servers testable via stub implementations and
provides the seam for adapting deployment topology across workflow platforms
(e.g., sidecars or separate pods on Kubernetes/OpenShift). Implementors who
need more control can write their own `main()` against the process contract
directly.

**Network policy via composable provider profiles.** Each API server ships
atomic capability profiles — one per logical group of endpoints (e.g.,
`builder-build`, `builder-push`, `builder-read`). Harnesses list which profiles
to attach. Composition is additive per OpenShell's provider-backed policy
composition
([NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)).
Different agent roles compose different capability sets for the same server.
Requires OpenShell >= v0.0.37 and
[#776](https://github.com/fullsend-ai/fullsend/issues/776).

**Per-run auth:** UUID bearer token via OpenShell provider placeholders. JWTs
are a future enhancement when per-operation claims become necessary.

**File transfer:** `openshell sandbox upload/download` from the server during
Comment thread
maruiz93 marked this conversation as resolved.
Comment thread
maruiz93 marked this conversation as resolved.
request handling. Shared host mount
([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509))
will be evaluated as an alternative when available.

**Bind address:** servers default to `127.0.0.1`, runner overrides to
`0.0.0.0` for sandboxed agents.
[NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
(supervisor-proxied host-local endpoints) would eliminate this requirement.

**Security hardening:** timing-safe token comparison, 1 MB request body limits,
rate limiting on unauthenticated endpoints, credential scrubbing in error
messages, bounded in-memory state.

## Consequences

- The `api_servers` harness field ([ADR 0024](0024-harness-definitions.md)) will gain a `providers` sub-field and
defined runtime behavior. The initial implementation targets Go servers
behind an internal interface; the process contract keeps the door open for
other languages.
- Implementing this design requires
[#776](https://github.com/fullsend-ai/fullsend/issues/776) (provider-backed
policy composition) as a prerequisite.
- Servers are coupled to the OpenShell CLI for file transfer until shared host
mounts are universally available.
- Servers must bind to `0.0.0.0` on shared hosts, widening the attack surface
until [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
ships.
- API servers (Tier 3) are now clearly scoped to cases providers cannot
handle: long-running operations, sandbox capability gaps, credentials in
request bodies, response transformation, and multi-step atomic operations.
- Fullsend-maintained servers follow a Go interface pattern (testable,
platform-portable). The process contract remains the enforcement boundary
— servers in other languages or with custom `main()` implementations are
valid as long as they satisfy it.
3 changes: 2 additions & 1 deletion docs/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,8 @@ Identity is not the same as trust. An agent's identity lets it authenticate to e

**Decided:**

- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for request-body credential injection or response transformation, (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)).
- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for operations providers cannot handle — long-running operations, sandbox capability gaps, credentials in request bodies, response transformation, and multi-step atomic operations (see [ADR 0046](ADRs/0046-host-side-api-server-design.md)), (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)).
- Host-side API server design: Tier 3 servers follow a uniform process contract (`--port`, `--token`, `--bind-address`, `/healthz`, `/tools.json`, `SIGTERM`). Network access is controlled via composable provider profiles — atomic capability profiles composed per-harness. Per-run UUID bearer tokens are delivered through OpenShell provider placeholders. File transfer uses `openshell sandbox upload/download` ([ADR 0046](ADRs/0046-host-side-api-server-design.md)).
Comment thread
maruiz93 marked this conversation as resolved.
- Per-role GitHub Apps with manifest-based creation. Each agent role gets its own app with scoped permissions. PEMs stored in Secret Manager as `fullsend-{role}-app-pem` — one secret per role, shared across orgs on a mint. Org isolation is enforced via `ALLOWED_ORGS`, `ROLE_APP_IDS`, and installation verification ([ADR 0007](ADRs/0007-per-role-github-apps.md), [ADR 0033](ADRs/0033-per-repo-installation-mode.md)).

One concrete implementation option is [`oidcx`](https://github.com/oxidecomputer/oidcx): a service that accepts OIDC identity tokens and exchanges them for short-lived access tokens. It can mint tokens scoped to selected GitHub repositories and permissions, or to selected Oxide silos and permissions, and it also ships with a GitHub Action wrapper. In a Fullsend deployment, this can be used by the sandbox entrypoint to narrow a broad GitHub App identity down to only the specific permissions an agent needs for the current run.
Expand Down
Loading