From fc5ae355773f63587bbf74658c4a1a95407a1609 Mon Sep 17 00:00:00 2001
From: Marta Anon <manon@redhat.com>
Date: Tue, 2 Jun 2026 22:02:07 +0200
Subject: [PATCH 1/3] docs: add design spec for host-side API server ADR

Covers server process contract, API discoverability, composable
provider profiles, per-run authentication, credential delivery,
file transfer, provider vs API server decision framework, bind
address security, and hardening requirements.

Signed-off-by: Marta Anon <manon@redhat.com>
---
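
Reviewer note (not part of the commit): §4 and §9 of the spec require
timing-safe token comparison and name `hmac.compare_digest` for Python servers.
The sketch below shows one way such a check might look. The helper name
`is_authorized` and the header parsing are illustrative assumptions, not code
taken from the experiment.

```python
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], expected_token: str) -> bool:
    """Return True only if the header carries the expected bearer token.

    `is_authorized` is a hypothetical helper; the spec only mandates that the
    comparison itself be timing-safe.
    """
    if not auth_header:
        return False
    scheme, _, presented = auth_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest takes time independent of where the inputs first differ.
    return hmac.compare_digest(
        presented.encode("utf-8"), expected_token.encode("utf-8")
    )
```

A server following the process contract would presumably run this check on
every endpoint except `/healthz` and `/tools.json` and reject the request when
it returns `False`.
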
 .../2026-06-02-host-side-api-server-design.md | 434 ++++++++++++++++++
 1 file changed, 434 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-02-host-side-api-server-design.md
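
Reviewer note (not part of the commit): §5 says the runner expands `${VAR}`
references in the `env` field against its own environment before starting the
server. A minimal sketch of that expansion follows. The function name
`expand_env` and the choice to fail on an unset host variable are assumptions;
the spec does not say what should happen when a variable is missing.

```python
import re
from typing import Dict, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(declared: Mapping[str, str], host_env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve ${VAR} references in harness `env` values against host_env.

    Hypothetical helper: raises KeyError for an unset host variable
    (an assumption, chosen so a missing secret fails loudly).
    """
    def resolve(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in host_env:
            raise KeyError(f"host variable {name} is not set")
        return host_env[name]

    return {key: _VAR_REF.sub(resolve, value) for key, value in declared.items()}
```

In a runner this might be called as `expand_env(server["env"], os.environ)`,
with the result passed as the child process environment so the resolved values
never pass through the sandbox.
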
diff --git a/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md b/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md
new file mode 100644
index 000000000..216910622
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md
@@ -0,0 +1,434 @@
+# ADR 0043 — Host-Side API Server Design — Spec
+
+Tracking issue: [#880](https://github.com/fullsend-ai/fullsend/issues/880)
+Parent issue: [#879](https://github.com/fullsend-ai/fullsend/issues/879)
+Experiment: `experiments/host-side-api-server` ([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28))
+
+## Purpose
+
+Define the complete design for host-side API servers that run outside the
+OpenShell sandbox and are callable by the agent via HTTP. ADR 0024 introduced
+the `api_servers` harness field as PLANNED, and ADRs 0017/0025 established the
+host-side REST server as Tier 3 of the credential delivery model. This ADR
+fills the remaining design gaps before implementation (#881).
+
+## Decisions to record
+
+### 1. Server process contract and lifecycle
+
+**Context.** The experiment validated a uniform process contract across two
+servers in different languages (Go, Python). The runner needs a
+language-agnostic interface to manage arbitrary API servers.
+
+**Decision.** Every host-side API server managed by the runner must:
+
+- Accept `--port <port>` and `--token <bearer-token>` CLI flags
+- Accept `--bind-address <addr>` (default `127.0.0.1`, see §8 for why the
+  runner overrides to `0.0.0.0` today)
+- Serve `GET /healthz` returning `{"status": "ok"}` when ready (unauthenticated)
+- Serve `GET /tools.json` for agent discovery (see §2)
+- Validate `Authorization: Bearer <token>` on all non-health, non-discovery
+  endpoints
+- Shut down cleanly on `SIGTERM` (5s grace period, then `SIGKILL`)
+
+**Runner lifecycle:**
+
+1. Start declared API servers after pre-script, before sandbox creation
+2. Poll `GET /healthz` until 200 (timeout: 15s, 500ms interval)
+3. Configure sandbox network policy and provider credentials
+4. Create sandbox and start agent
+5. On exit (success or failure): send `SIGTERM` to servers, wait grace period,
+   `SIGKILL` if needed — after sandbox destruction (step 11 of ADR 0024)
+
+**Crash behavior.** If an API server crashes mid-run, the run fails. No restart
+logic. API servers are critical infrastructure — a crashed server means the
+agent has lost access to capabilities it depends on, and continuing would
+produce incomplete or incorrect results.
+
+**Harness schema.** Uses the existing `api_servers` field from ADR 0024,
+keeping the `script` field name:
+
+```yaml
+api_servers:
+  - name: builder
+    script: api-servers/builder/bin/builder-server
+    port: 9090
+    providers:                       # NEW — see §3
+      - builder-build
+      - builder-push
+    env:
+      REGISTRY_TOKEN: ${REGISTRY_TOKEN}
+```
+
+### 2. API discoverability for the agent
+
+**Context.** The experiment compared three approaches for making the API known
+to the agent. Each was tested under full-access and restricted L7 policies.
+
+**Options:**
+
+| | `/tools.json` | `/openapi.json` | Baked instructions |
+|---|---|---|---|
+| Tokens used, full access (lower is better) | 92k (best) | 107k | 100k |
+| Tokens used, restricted policy | 205k (fails) | 534k (fails) | 84k (succeeds) |
+| Resilience to blocked discovery | Fails — agent guesses paths | Fails — agent guesses paths | Succeeds — knows paths from skill |
+| Maintainability | Single source of truth in server | Single source of truth in server | Skill can drift from API |
+| Agent parsing | Structured JSON, minimal ambiguity | Verbose nested structure, more context tokens | Prose, more interpretation needed |
+
+**Decision.** Require `GET /tools.json` as the standard discovery endpoint.
+Each entry contains `name`, `description`, `endpoint`, `method`, and
+`input_schema`:
+
+```json
+[
+  {
+    "name": "build_container",
+    "description": "Build a container image using podman or docker",
+    "endpoint": "/build",
+    "method": "POST",
+    "input_schema": {
+      "type": "object",
+      "required": ["tag"],
+      "properties": {
+        "tag": {"type": "string", "description": "Image tag"},
+        "dockerfile": {"type": "string", "default": "Dockerfile"}
+      }
+    }
+  }
+]
+```
+
+**Rationale.**
+
+- `/tools.json` is the most token-efficient under full access (92k vs 100k for
+  baked, 107k for OpenAPI).
+- It returns structured data purpose-built for agent consumption — agents parse
+  JSON directly rather than interpreting Markdown prose or navigating verbose
+  OpenAPI nesting.
+- The schema is a single source of truth in the server. When the API changes,
+  the agent automatically discovers the new schema.
+- OpenAPI is designed for code generators and documentation tools — its nested
+  structure adds context tokens without proportional benefit for LLM agents.
+- Baked instructions are the most resilient to restricted policies (84k, only
+  method that succeeds), but the discovery endpoint should not be blocked in
+  normal operation — it is part of the required process contract.
+
+### 3. Network policy via composable provider profiles
+
+**Context.** The experiment manually authored L7 policies and hit bugs from
+mismatches between server capabilities and policy rules. OpenShell now supports
+provider-backed policy composition
+([NVIDIA/OpenShell#947](https://github.com/NVIDIA/OpenShell/issues/947),
+[NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)) where
+attaching a provider to a sandbox auto-injects L7 rules as Layer 2 policy
+entries. Fullsend issue
+[#776](https://github.com/fullsend-ai/fullsend/issues/776) tracks adopting
+this for harness policies.
+
+**How OpenShell policy composition works.** OpenShell has a 3-layer policy
+stack:
+
+- **Layer 1 (Base):** Filesystem, process, landlock — static sandbox config.
+- **Layer 2 (Provider):** Auto-generated from attached providers. Each attached
+  provider contributes network policy rules under a reserved `_provider_*` key.
+- **Layer 3 (User):** Explicit user-authored rules via `openshell policy set`.
+
+A **provider** bundles three things: credentials (with injection style),
+endpoints (L7 rules), and binaries (which executables can use those endpoints).
+A **provider profile** is the template that defines a provider type — the YAML
+schema declaring the endpoints, binaries, and auth configuration. When a
+provider is attached to a sandbox, OpenShell auto-injects its endpoint rules
+into the effective policy.
+
+Composition is **additive**: the proxy permits a request if it matches ANY rule
+across layers (union of allows). Deny rules win globally — if any rule denies
+a request, it's blocked regardless of allows in other rules.
+
+**Options:**
+
+**Option A: Composable provider profiles per capability (recommended).** Each
+API server ships atomic provider profiles — one per logical group of endpoints.
+Each profile bundles the credential injection (`auth` block), the L7 endpoint
+rules, and the binary restrictions as a single `(credential, endpoint, binary)`
+unit. Harnesses list which profiles to attach. Different agent roles compose
+different capability sets for the same server.
+
+```yaml
+# Provider profiles (defined once per API server, registered on gateway):
+#
+# builder-build profile:
+#   endpoints: POST /build, GET /healthz, GET /tools.json
+#   binaries: **/curl
+#   auth: bearer token injection
+#
+# builder-push profile:
+#   endpoints: POST /push
+#   binaries: **/curl
+#   auth: bearer token injection
+#
+# builder-read profile:
+#   endpoints: GET /images
+#   binaries: **/curl
+#   auth: bearer token injection
+
+# Code agent harness — full access:
+api_servers:
+  - name: builder
+    script: api-servers/builder/bin/builder-server
+    port: 9090
+    providers:
+      - builder-build
+      - builder-push
+      - builder-read
+
+# Review agent harness — read-only:
+api_servers:
+  - name: builder
+    script: api-servers/builder/bin/builder-server
+    port: 9090
+    providers:
+      - builder-read
+```
+
+Pros: reusable across harnesses; follows least privilege naturally (compose
+only what the agent needs); aligns with OpenShell's provider model (credential,
+endpoint, and binary as a unit); each profile is defined once and composed
+freely. Cons: requires creating and registering custom provider profiles for
+each API server capability; depends on OpenShell >= v0.0.37 and fullsend #776.
+
+**Option B: Runner-generated monolithic policy.** The runner generates a single
+L7 policy file from an `allowed_paths` list in the harness `api_servers` entry.
+No dependency on OpenShell provider profiles.
+
+```yaml
+api_servers:
+  - name: builder
+    script: api-servers/builder/bin/builder-server
+    port: 9090
+    allowed_paths:
+      - method: POST
+        path: /build
+      - method: GET
+        path: /images
+```
+
+Pros: works with any OpenShell version, no external dependencies, simpler to
+implement. Cons: duplicates policy logic across harnesses, error-prone
+(experiment hit bugs from mismatches between server API surface and manually
+authored policy), no reuse of capability definitions, credential injection must
+be handled separately.
+
+**Decision.** Option A — composable provider profiles per capability.
+
+**Requirements:**
+
+- OpenShell >= v0.0.37 (profile-backed policy composition,
+  [NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037))
+- The `use_providers_v2` gateway setting may be required (see
+  [#776](https://github.com/fullsend-ai/fullsend/issues/776))
+- Prerequisite: [fullsend-ai/fullsend#776](https://github.com/fullsend-ai/fullsend/issues/776)
+  (adopt provider-backed policy composition)
+
+**Composition semantics.** Composition is additive-only: provider rules and
+user rules live in separate keys, and the proxy permits a request if it matches
+any rule. There is no cross-rule deny mechanism that would let a user policy
+narrow what a provider profile grants (though provider deny rules do block
+globally). Different access levels for the same server are achieved by
+composing different profile sets, not by adding deny overrides.
+
+### 4. Per-run authentication
+
+**Context.** The agent inside the sandbox needs to authenticate to API servers.
+The real credential must never enter the sandbox.
+
+**Options:**
+
+**Option A: UUID bearer token via provider placeholders (recommended).** The
+runner generates a random UUID token per run. The token is registered as an
+OpenShell provider credential with an `auth: bearer` declaration. The L7 proxy
+resolves the placeholder to the real token in outgoing `Authorization` headers
+— the real token never enters the sandbox.
+
+```yaml
+# Provider definition
+name: api-server
+type: generic
+credentials:
+  API_TOKEN: ${API_TOKEN}
+```
+
+Pros: simple, proven by experiment, no key management, credential never enters
+sandbox, credential scoping ensures the token is only injected for requests
+matching the provider's endpoints and binaries. Cons: no claims or expiry —
+the token is valid for the entire run and grants whatever endpoints the L7
+policy allows.
+
+**Option B: Short-lived JWTs with claims.** The runner generates a JWT signed
+with a per-run key pair. Claims include run ID, repo, and allowed operations.
+Servers validate the signature and claims. The JWT can be short-lived (e.g., 1
+hour) with refresh.
+
+Pros: per-operation authorization, expiry, audit trail via claims. Cons: adds
+signing key management, JWT library dependency in every server, more complex
+token lifecycle, and the L7 policy already restricts which endpoints are
+reachable — JWT claims would be a second layer of the same restriction.
+
+**Decision.** Option A — UUID bearer token via provider placeholders. The L7
+policy already enforces which endpoints each agent can reach (§3), making
+per-operation JWT claims redundant for the initial design. JWT-based auth is a
+future enhancement for when per-operation claims become necessary (e.g.,
+multi-tenant servers, cross-run audit).
+
+**Security requirement.** Token comparison must be timing-safe
+(`crypto/subtle.ConstantTimeCompare` in Go, `hmac.compare_digest` in Python).
+The experiment code flagged this as a TODO.
+
+### 5. Credential delivery to the server process
+
+**Context.** API servers hold credentials on behalf of the agent (registry
+tokens, GitHub tokens, API keys). These must reach the server without passing
+through the sandbox.
+
+**Decision.** Credentials are delivered via environment variables expanded from
+the host environment at server startup. The `env` field in `api_servers` (ADR
+0024) supports `${HOST_VAR}` syntax:
+
+```yaml
+api_servers:
+  - name: builder
+    script: api-servers/builder/bin/builder-server
+    port: 9090
+    env:
+      REGISTRY_TOKEN: ${REGISTRY_TOKEN}
+      GCP_KEY_PATH: ${GOOGLE_APPLICATION_CREDENTIALS}
+```
+
+The per-run bearer token is passed via the `--token` CLI flag (not through
+`env`, since it's part of the process contract).
+
+No secrets mounts or vault integration in the initial design. The runner
+expands `${VAR}` references against its own environment and passes the resolved
+values to the server process. Sensitive values must not appear in logs or error
+messages — servers must scrub credentials from error output (the experiment's
+provisioner implements `_scrub_credentials` for this).
+
+### 6. File transfer between server and sandbox
+
+**Context.** API servers that build artifacts or provision repos need to
+exchange files with the sandbox. File transfer must happen during request
+handling — the agent calls the API, the server produces or consumes files, and
+the result must be in the sandbox before the response returns. The runner is
+not in this loop.
+
+**Options:**
+
+**Option A: `openshell sandbox upload/download` from the server (recommended).**
+The server shells out to the OpenShell CLI to transfer files during request
+handling. The agent passes its sandbox name per-request (discovered via
+`hostname | sed 's/^sandbox-//'`), and the server uses it with `openshell
+sandbox download <name> <sandbox-path> <local-path>` and `openshell sandbox
+upload <name> <local-path> <sandbox-path>`.
+
+Pros: works today, validated by experiment, handles real-time exchange
+naturally (transfer happens during request handling), no runner mediation
+needed. Cons: couples server to OpenShell CLI — servers need `openshell` on
+`PATH` and can't be tested without it.
+
+**Option B: Shared host mount.** The runner creates a staging directory on the
+host and mounts it into the sandbox via `openshell sandbox create --mount
+<host-path>:<sandbox-path>`. Both the server and the agent see the same
+directory — no explicit transfer needed.
+
+Pros: no transfer commands, transparent POSIX access, no OpenShell CLI
+dependency in the server. Cons: depends on OpenShell mount support (available
+on K3s via
+[NVIDIA/OpenShell#500](https://github.com/NVIDIA/OpenShell/issues/500), pending
+for VM driver via
+[NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)),
+bidirectional mounts may introduce TOCTOU risks if both sides write
+concurrently.
+
+**Option C: HTTP multipart via the API.** The agent uploads files to the server
+and downloads results through the server's own REST endpoints using multipart
+form data over the L7 proxy. The server stores files on the host side.
+
+Pros: fully portable, standard HTTP, server is self-contained, no OpenShell
+dependency. Cons: large files go through the L7 proxy (bandwidth/latency
+overhead), requires multipart handling in every server, the proxy must allow
+the content-type and body size.
+
+**Decision.** Option A — `openshell sandbox upload/download` from the server.
+These servers are purpose-built for the OpenShell environment, so the CLI
+coupling is acceptable. The experiment validated this end-to-end with both
+the builder (download context, build, upload tarball) and the provisioner
+(clone, scan, upload repo).
+
+Option B (shared mount) is noted as the preferred future direction when
+OpenShell mount support is universally available — it eliminates transfer
+commands entirely and makes file exchange transparent.
+
+### 7. Provider vs. API server decision framework
+
+**Context.** Issue #196 evaluated providers as a replacement for REST servers.
+Providers (Tier 2) became the preferred path, but API servers (Tier 3) remain
+necessary for cases providers cannot handle.
+
+**Decision.** Use providers (Tier 2) by default. Use API servers (Tier 3) when
+any of the following apply:
+
+| Condition | Why providers can't handle it |
+|---|---|
+| **Long-running operations** (> 60s) | MCP client timeouts (~30-60s) make provider-based tools unsuitable ([claude-code#7575](https://github.com/anthropics/claude-code/issues/7575)) |
+| **Sandbox capability gaps** | Operations the sandbox deliberately blocks (e.g., container builds — seccomp blocks `CLONE_NEWUSER`, `AF_NETLINK`, `setns`; agent has zero Linux capabilities; [NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)) |
+| **Credentials in request bodies** | Provider placeholder model only intercepts `Authorization` headers; credentials embedded in JSON bodies or query parameters require server-side injection |
+| **Response transformation** | Scanning, filtering, or transforming responses before they reach the agent (e.g., repo provisioner's scan-before-copy) |
+| **Multi-step atomic operations** | Operations that combine multiple steps as a single unit (clone + scan + copy) where partial completion would be worse than failure |
+
+### 8. Bind address and network exposure
+
+**Context.** The experiment found that API servers must bind to `0.0.0.0`
+because the L7 proxy connects from inside the container network namespace —
+servers bound to `127.0.0.1` are unreachable. On rootless Podman, the container
+bridge gateway IP (e.g., `10.88.0.1`) lives inside the container namespace and
+cannot be bound from the host (`EADDRNOTAVAIL`).
+
+**Decision.** Servers default to `--bind-address 127.0.0.1` (secure by
+default). The runner explicitly passes `--bind-address 0.0.0.0` when starting
+servers for sandboxed agents. This is a security trade-off: the server then
+listens on every interface, so on shared hosts or networks others can probe its
+ports. Bearer token authentication mitigates but does not eliminate this risk.
+
+**Future.** [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
+proposes generalizing the `inference.local` supervisor proxy pattern for
+arbitrary host services. If implemented, the supervisor would proxy connections
+from inside the sandbox to `127.0.0.1` on the host, eliminating the need for
+`0.0.0.0` binding entirely. Servers would never be network-exposed. The runner
+should detect OpenShell support for this and stop passing
+`--bind-address 0.0.0.0` when available.
+
+**Network policy.** All URLs delivered into the sandbox must use
+`host.openshell.internal`, never raw IPs. The L7 proxy matches requests by
+hostname, and SSRF protection blocks private IP addresses. The `allowed_ips`
+field in network policies handles SSRF allowlisting separately using the host
+IP rendered into policy templates at runtime.
+
+### 9. Security hardening requirements
+
+Based on experiment findings and code review of the PoC servers:
+
+| Requirement | Rationale |
+|---|---|
+| **Timing-safe token comparison** | Naive string comparison returns at the first mismatching byte, so response timing leaks how much of a guessed token is correct. Use `crypto/subtle.ConstantTimeCompare` (Go) or `hmac.compare_digest` (Python). |
+| **Request body size limits** | Prevent DoS via oversized payloads. Recommend 1 MB default (`http.MaxBytesReader` in Go, `Content-Length` check in Python). |
+| **Rate limiting on unauthenticated endpoints** | `/healthz` and `/tools.json` are reachable without a bearer token. Rate-limit them to prevent abuse, especially when bound to `0.0.0.0`. |
+| **Credential scrubbing in error messages** | Error responses must not leak credentials embedded in URLs or environment variables. Servers must scrub before returning errors to the agent. |
+| **Bounded in-memory state** | Servers that track operation state (e.g., provisioner's job map) must bound the state or expire old entries. Unbounded growth is acceptable only for short-lived experiment runs. |
+
+### 10. Relationship to existing ADRs
+
+| ADR | Relationship |
+|---|---|
+| **0016** (Unidirectional control flow) | Preserved — API servers are provisioned top-down from the harness. Agents cannot request servers to be started. The runner manages the full lifecycle. |
+| **0017** (Credential isolation) | Implemented — this ADR specifies the concrete process contract for the host-side REST server model. Per-run bearer tokens via provider placeholders fulfill the "credentials never enter the sandbox" requirement. |
+| **0024** (Harness definitions) | Extended — this ADR specifies runtime behavior for the `api_servers` field, adds `providers` sub-field for composable policy profiles, and inserts API server lifecycle into the execution sequence (after pre-script, before sandbox creation). |
+| **0025** (Provider-based credential delivery) | Tier 3 (host-side REST server) is now fully specified. The decision framework in §7 defines when to use Tier 3 vs. Tier 2 (providers). |

From f138206dce6cd321f8afb41b4486bcf6d86ad817 Mon Sep 17 00:00:00 2001
From: Marta Anon <manon@redhat.com>
Date: Tue, 2 Jun 2026 22:42:33 +0200
Subject: [PATCH 2/3] =?UTF-8?q?docs:=20add=20ADR=200046=20=E2=80=94=20host?=
 =?UTF-8?q?-side=20API=20server=20design=20for=20sandboxed=20agents?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Defines the process contract, composable provider profiles for
network policy, per-run auth via UUID bearer tokens, file transfer
via openshell CLI, and security hardening requirements. Updates
architecture.md with the Tier 3 design summary.

Signed-off-by: Marta Anon <manon@redhat.com>
---
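
Reviewer note (not part of the commit): the ADR requires credential scrubbing
in error messages. The sketch below illustrates one possible approach for a
Python server: mask known secret values, then mask any userinfo embedded in
URLs. The function name `scrub_credentials` and the `***` mask are assumptions
for illustration and are not the experiment's `_scrub_credentials`
implementation.

```python
import re
from typing import Iterable

# Matches the userinfo part of a URL, e.g. "https://user:token@" in a clone URL.
_URL_USERINFO = re.compile(r"(?P<scheme>[a-z][a-z0-9+.-]*://)[^/@\s]+@", re.IGNORECASE)


def scrub_credentials(message: str, secrets: Iterable[str]) -> str:
    """Mask known secrets and URL userinfo before an error reaches the agent."""
    # Replace longer secrets first so a short secret that is a substring of a
    # longer one cannot leave part of the longer value visible.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        message = message.replace(secret, "***")
    return _URL_USERINFO.sub(r"\g<scheme>***@", message)
```

A server could pass every configured credential value (for example the values
it received through `env`) as `secrets` and apply the function to each error
string before returning it.
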
 docs/ADRs/0046-host-side-api-server-design.md | 137 ++++++
 docs/architecture.md                          |   3 +-
 .../2026-06-02-host-side-api-server-design.md | 434 ------------------
 3 files changed, 139 insertions(+), 435 deletions(-)
 create mode 100644 docs/ADRs/0046-host-side-api-server-design.md
 delete mode 100644 docs/superpowers/specs/2026-06-02-host-side-api-server-design.md
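
Reviewer note (not part of the commit): the ADR requires bounded in-memory
state for servers that track operations. One way to meet that in a Python
server is a map with both a size cap and a time-to-live. The class name
`BoundedJobMap`, its defaults, and the injectable `clock` are illustrative
assumptions, not part of the design.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class BoundedJobMap:
    """Job-state map that evicts entries by age and by count (hypothetical)."""

    def __init__(self, max_entries: int = 1000, ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl_s = ttl_s
        self._clock = clock
        # job_id -> (inserted_at, state); insertion order equals age order.
        self._jobs: "OrderedDict[str, tuple]" = OrderedDict()

    def put(self, job_id: str, state: Any) -> None:
        self._expire()
        self._jobs.pop(job_id, None)  # re-inserting moves the job to the newest slot
        self._jobs[job_id] = (self._clock(), state)
        while len(self._jobs) > self._max_entries:
            self._jobs.popitem(last=False)  # drop the oldest entry

    def get(self, job_id: str) -> Optional[Any]:
        self._expire()
        entry = self._jobs.get(job_id)
        return entry[1] if entry else None

    def __len__(self) -> int:
        return len(self._jobs)

    def _expire(self) -> None:
        cutoff = self._clock() - self._ttl_s
        while self._jobs:
            oldest_id, (inserted_at, _) = next(iter(self._jobs.items()))
            if inserted_at > cutoff:
                break
            del self._jobs[oldest_id]
```

The injectable clock exists only so eviction can be exercised in tests without
sleeping; a real server would keep the default.
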

diff --git a/docs/ADRs/0046-host-side-api-server-design.md b/docs/ADRs/0046-host-side-api-server-design.md
new file mode 100644
index 000000000..ed6c31468
--- /dev/null
+++ b/docs/ADRs/0046-host-side-api-server-design.md
@@ -0,0 +1,137 @@
+---
+title: "46. Host-side API server design for sandboxed agents"
+status: Accepted
+relates_to:
+  - agent-infrastructure
+  - security-threat-model
+topics:
+  - sandbox
+  - api-server
+  - credential-isolation
+---
+
+# 46. Host-side API server design for sandboxed agents
+
+Date: 2026-06-02
+
+## Status
+
+Accepted
+
+<!-- Once this ADR is Accepted, its content is frozen. Do not edit the Context,
+     Decision, or Consequences sections. If circumstances change, write a new
+     ADR that supersedes this one. Only status changes and links to superseding
+     ADRs should be added after acceptance. -->
+
+## Context
+
+[ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not
+implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3
+of the credential delivery model, covering cases that providers (Tier 2) cannot
+handle: long-running operations exceeding MCP timeouts, operations the sandbox
+deliberately blocks (container builds, see
+[NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)),
+credentials in request bodies, response transformation, and multi-step atomic
+operations.
+
+The `host-side-api-server` experiment
+([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28))
+validated the end-to-end flow with two servers (Go container builder, Python
+repo provisioner), testing lifecycle management, API discoverability, L7 policy
+tuning, per-run auth, and file transfer. This ADR records the design decisions
+informed by that experiment.
+
+## Options
+
+### API discoverability
+
+Three approaches were tested. `/tools.json` (structured tool-use schema) was
+the most token-efficient under full access (92k tokens vs 107k for OpenAPI,
+100k for baked instructions). Both discovery-based methods fail under
+restricted policies where the endpoint is blocked; baked instructions succeed
+(84k) but can drift from the actual API. OpenAPI's verbose structure adds
+context tokens without proportional benefit for LLM agents.
+
+### Per-run authentication
+
+**UUID bearer token via provider placeholders:** simple, proven, no key
+management. The proxy resolves the placeholder — the real token never enters
+the sandbox. No claims or expiry.
+
+**Short-lived JWTs with claims:** per-operation authorization and audit trail,
+but adds signing key management and JWT library dependencies in every server.
+The L7 policy already restricts reachable endpoints, making JWT claims a
+redundant second layer.
+
+### File transfer between server and sandbox
+
+**`openshell sandbox upload/download` from the server:** works today, validated
+by experiment, handles real-time exchange during request handling. Couples
+server to OpenShell CLI.
+
+**Shared host mount:** transparent POSIX access, no CLI coupling. Depends on
+OpenShell mount support that is not yet universally available
+([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)).
+
+**HTTP multipart via the API:** fully portable, but large files through the L7
+proxy add overhead.
+
+## Decision
+
+Adopt the host-side API server design with the following process contract,
+policy model, and security requirements. Full design details are in the
+[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28).
+
+**Process contract.** Every host-side API server must accept `--port`,
+`--token`, and `--bind-address` CLI flags; serve `GET /healthz`
+(unauthenticated) and `GET /tools.json` (structured tool-use schema) for agent
+discovery; validate bearer tokens on all other endpoints; and shut down cleanly
+on `SIGTERM`. Servers must write logs to stderr; the runner collects and
+bundles logs from all API servers so they are available for inspection after
+the run completes. The runner starts servers after pre-script, health-checks
+before sandbox creation, and tears down after sandbox destruction. If a server
+crashes mid-run, the run fails.
+
+**Network policy via composable provider profiles.** Each API server ships
+atomic capability profiles — one per logical group of endpoints (e.g.,
+`builder-build`, `builder-push`, `builder-read`). Harnesses list which profiles
+to attach. Composition is additive per OpenShell's provider-backed policy
+composition
+([NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)).
+Different agent roles compose different capability sets for the same server.
+Requires OpenShell >= v0.0.37 and
+[#776](https://github.com/fullsend-ai/fullsend/issues/776).
+
+**Per-run auth:** UUID bearer token via OpenShell provider placeholders. JWTs
+are a future enhancement when per-operation claims become necessary.
+
+**File transfer:** `openshell sandbox upload/download` from the server during
+request handling. Shared host mount
+([NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509))
+will be evaluated as an alternative when available.
+
+**Bind address:** servers default to `127.0.0.1`; the runner overrides this to
+`0.0.0.0` for sandboxed agents.
+[NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
+(supervisor-proxied host-local endpoints) would eliminate this requirement.
+
+**Security hardening:** timing-safe token comparison, 1 MB request body limits,
+rate limiting on unauthenticated endpoints, credential scrubbing in error
+messages, bounded in-memory state.
+
+## Consequences
+
+- The `api_servers` harness field ([ADR 0024](0024-harness-definitions.md)) will gain a `providers` sub-field and
+  defined runtime behavior — servers can be implemented in any language
+  following the uniform process contract.
+- Implementing this design requires
+  [#776](https://github.com/fullsend-ai/fullsend/issues/776) (provider-backed
+  policy composition) as a prerequisite.
+- Servers are coupled to the OpenShell CLI for file transfer until shared host
+  mounts are universally available.
+- Servers must bind to `0.0.0.0` for sandboxed agents, which widens the attack
+  surface on shared hosts until [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
+  ships.
+- API servers (Tier 3) are now clearly scoped to cases providers cannot
+  handle: long-running operations, sandbox capability gaps, credentials in
+  request bodies, response transformation, and multi-step atomic operations.
diff --git a/docs/architecture.md b/docs/architecture.md
index 71354f75b..7334a01c7 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -117,7 +117,8 @@ Identity is not the same as trust. An agent's identity lets it authenticate to e
 
 **Decided:**
 
-- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for request-body credential injection or response transformation, (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)).
+- Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for operations providers cannot handle — long-running operations, sandbox capability gaps, credentials in request bodies, response transformation, and multi-step atomic operations (see [ADR 0046](ADRs/0046-host-side-api-server-design.md)), (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)).
+- Host-side API server design: Tier 3 servers follow a uniform process contract (`--port`, `--token`, `--bind-address`, `/healthz`, `/tools.json`, `SIGTERM`). Network access is controlled via composable provider profiles — atomic capability profiles composed per-harness. Per-run UUID bearer tokens are delivered through OpenShell provider placeholders. File transfer uses `openshell sandbox upload/download` ([ADR 0046](ADRs/0046-host-side-api-server-design.md)).
 - Per-role GitHub Apps with manifest-based creation. Each agent role gets its own app with scoped permissions. PEMs stored in Secret Manager as `fullsend-{role}-app-pem` — one secret per role, shared across orgs on a mint. Org isolation is enforced via `ALLOWED_ORGS`, `ROLE_APP_IDS`, and installation verification ([ADR 0007](ADRs/0007-per-role-github-apps.md), [ADR 0033](ADRs/0033-per-repo-installation-mode.md)).
 
 One concrete implementation option is [`oidcx`](https://github.com/oxidecomputer/oidcx): a service that accepts OIDC identity tokens and exchanges them for short-lived access tokens. It can mint tokens scoped to selected GitHub repositories and permissions, or to selected Oxide silos and permissions, and it also ships with a GitHub Action wrapper. In a Fullsend deployment, this can be used by the sandbox entrypoint to narrow a broad GitHub App identity down to only the specific permissions an agent needs for the current run.
diff --git a/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md b/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md
deleted file mode 100644
index 216910622..000000000
--- a/docs/superpowers/specs/2026-06-02-host-side-api-server-design.md
+++ /dev/null
@@ -1,434 +0,0 @@
-# ADR 0043 — Host-Side API Server Design — Spec
-
-Tracking issue: [#880](https://github.com/fullsend-ai/fullsend/issues/880)
-Parent issue: [#879](https://github.com/fullsend-ai/fullsend/issues/879)
-Experiment: `experiments/host-side-api-server` ([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28))
-
-## Purpose
-
-Define the complete design for host-side API servers that run outside the
-OpenShell sandbox and are callable by the agent via HTTP. ADR 0024 introduced
-the `api_servers` harness field as PLANNED, and ADRs 0017/0025 established the
-host-side REST server as Tier 3 of the credential delivery model. This ADR
-fills the remaining design gaps before implementation (#881).
-
-## Decisions to record
-
-### 1. Server process contract and lifecycle
-
-**Context.** The experiment validated a uniform process contract across two
-servers in different languages (Go, Python). The runner needs a
-language-agnostic interface to manage arbitrary API servers.
-
-**Decision.** Every host-side API server managed by the runner must:
-
-- Accept `--port <port>` and `--token <bearer-token>` CLI flags
-- Accept `--bind-address <addr>` (default `127.0.0.1`, see §8 for why the
-  runner overrides to `0.0.0.0` today)
-- Serve `GET /healthz` returning `{"status": "ok"}` when ready (unauthenticated)
-- Serve `GET /tools.json` for agent discovery (see §2)
-- Validate `Authorization: Bearer <token>` on all non-health, non-discovery
-  endpoints
-- Shut down cleanly on `SIGTERM` (5s grace period, then `SIGKILL`)
-
-**Runner lifecycle:**
-
-1. Start declared API servers after pre-script, before sandbox creation
-2. Poll `GET /healthz` until 200 (timeout: 15s, 500ms interval)
-3. Configure sandbox network policy and provider credentials
-4. Create sandbox and start agent
-5. On exit (success or failure): send `SIGTERM` to servers, wait grace period,
-   `SIGKILL` if needed — after sandbox destruction (step 11 of ADR 0024)
-
-**Crash behavior.** If an API server crashes mid-run, the run fails. No restart
-logic. API servers are critical infrastructure — a crashed server means the
-agent has lost access to capabilities it depends on, and continuing would
-produce incomplete or incorrect results.
-
-**Harness schema.** Uses the existing `api_servers` field from ADR 0024,
-keeping the `script` field name:
-
-```yaml
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    providers:                       # NEW — see §3
-      - builder-build
-      - builder-push
-    env:
-      REGISTRY_TOKEN: ${REGISTRY_TOKEN}
-```
-
-### 2. API discoverability for the agent
-
-**Context.** The experiment compared three approaches for making the API known
-to the agent. Each was tested under full-access and restricted L7 policies.
-
-**Options:**
-
-| | `/tools.json` | `/openapi.json` | Baked instructions |
-|---|---|---|---|
-| Token efficiency (full access) | 92k (best) | 107k | 100k |
-| Token efficiency (restricted) | 205k (fails) | 534k (fails) | 84k (succeeds) |
-| Resilience to blocked discovery | Fails — agent guesses paths | Fails — agent guesses paths | Succeeds — knows paths from skill |
-| Maintainability | Single source of truth in server | Single source of truth in server | Skill can drift from API |
-| Agent parsing | Structured JSON, minimal ambiguity | Verbose nested structure, more context tokens | Prose, more interpretation needed |
-
-**Decision.** Require `GET /tools.json` as the standard discovery endpoint.
-Each entry contains `name`, `description`, `endpoint`, `method`, and
-`input_schema`:
-
-```json
-[
-  {
-    "name": "build_container",
-    "description": "Build a container image using podman or docker",
-    "endpoint": "/build",
-    "method": "POST",
-    "input_schema": {
-      "type": "object",
-      "required": ["tag"],
-      "properties": {
-        "tag": {"type": "string", "description": "Image tag"},
-        "dockerfile": {"type": "string", "default": "Dockerfile"}
-      }
-    }
-  }
-]
-```
-
-**Rationale.**
-
-- `/tools.json` is the most token-efficient under full access (92k vs 100k for
-  baked, 107k for OpenAPI).
-- It returns structured data purpose-built for agent consumption — agents parse
-  JSON directly rather than interpreting Markdown prose or navigating verbose
-  OpenAPI nesting.
-- The schema is a single source of truth in the server. When the API changes,
-  the agent automatically discovers the new schema.
-- OpenAPI is designed for code generators and documentation tools — its nested
-  structure adds context tokens without proportional benefit for LLM agents.
-- Baked instructions are the most resilient to restricted policies (84k, only
-  method that succeeds), but the discovery endpoint should not be blocked in
-  normal operation — it is part of the required process contract.
-
-### 3. Network policy via composable provider profiles
-
-**Context.** The experiment manually authored L7 policies and hit bugs from
-mismatches between server capabilities and policy rules. OpenShell now supports
-provider-backed policy composition
-([NVIDIA/OpenShell#947](https://github.com/NVIDIA/OpenShell/issues/947),
-[NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037)) where
-attaching a provider to a sandbox auto-injects L7 rules as Layer 2 policy
-entries. Fullsend issue
-[#776](https://github.com/fullsend-ai/fullsend/issues/776) tracks adopting
-this for harness policies.
-
-**How OpenShell policy composition works.** OpenShell has a 3-layer policy
-stack:
-
-- **Layer 1 (Base):** Filesystem, process, landlock — static sandbox config.
-- **Layer 2 (Provider):** Auto-generated from attached providers. Each attached
-  provider contributes network policy rules under a reserved `_provider_*` key.
-- **Layer 3 (User):** Explicit user-authored rules via `openshell policy set`.
-
-A **provider** bundles three things: credentials (with injection style),
-endpoints (L7 rules), and binaries (which executables can use those endpoints).
-A **provider profile** is the template that defines a provider type — the YAML
-schema declaring the endpoints, binaries, and auth configuration. When a
-provider is attached to a sandbox, OpenShell auto-injects its endpoint rules
-into the effective policy.
-
-Composition is **additive**: the proxy permits a request if it matches ANY rule
-across layers (union of allows). Deny rules win globally — if any rule denies
-a request, it's blocked regardless of allows in other rules.
-
-**Options:**
-
-**Option A: Composable provider profiles per capability (recommended).** Each
-API server ships atomic provider profiles — one per logical group of endpoints.
-Each profile bundles the credential injection (`auth` block), the L7 endpoint
-rules, and the binary restrictions as a single `(credential, endpoint, binary)`
-unit. Harnesses list which profiles to attach. Different agent roles compose
-different capability sets for the same server.
-
-```yaml
-# Provider profiles (defined once per API server, registered on gateway):
-#
-# builder-build profile:
-#   endpoints: POST /build, GET /healthz, GET /tools.json
-#   binaries: **/curl
-#   auth: bearer token injection
-#
-# builder-push profile:
-#   endpoints: POST /push
-#   binaries: **/curl
-#   auth: bearer token injection
-#
-# builder-read profile:
-#   endpoints: GET /images
-#   binaries: **/curl
-#   auth: bearer token injection
-
-# Code agent harness — full access:
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    providers:
-      - builder-build
-      - builder-push
-      - builder-read
-
-# Review agent harness — read-only:
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    providers:
-      - builder-read
-```
-
-Pros: reusable across harnesses, follows least privilege naturally (compose
-only what the agent needs), aligns with OpenShell's provider model (credential
-+ endpoint + binary as a unit), each profile is defined once and composed
-freely. Cons: requires creating and registering custom provider profiles for
-each API server capability, depends on OpenShell >= v0.0.37 and fullsend #776.
-
-**Option B: Runner-generated monolithic policy.** The runner generates a single
-L7 policy file from an `allowed_paths` list in the harness `api_servers` entry.
-No dependency on OpenShell provider profiles.
-
-```yaml
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    allowed_paths:
-      - method: POST
-        path: /build
-      - method: GET
-        path: /images
-```
-
-Pros: works with any OpenShell version, no external dependencies, simpler to
-implement. Cons: duplicates policy logic across harnesses, error-prone
-(experiment hit bugs from mismatches between server API surface and manually
-authored policy), no reuse of capability definitions, credential injection must
-be handled separately.
-
-**Decision.** Option A — composable provider profiles per capability.
-
-**Requirements:**
-
-- OpenShell >= v0.0.37 (profile-backed policy composition,
-  [NVIDIA/OpenShell#1037](https://github.com/NVIDIA/OpenShell/pull/1037))
-- The `use_providers_v2` gateway setting may be required (see
-  [#776](https://github.com/fullsend-ai/fullsend/issues/776))
-- Prerequisite: [fullsend-ai/fullsend#776](https://github.com/fullsend-ai/fullsend/issues/776)
-  (adopt provider-backed policy composition)
-
-**Composition semantics.** Composition is additive-only: provider rules and
-user rules live in separate keys, and the proxy permits a request if it matches
-any rule. There is no cross-rule deny mechanism that would let a user policy
-narrow what a provider profile grants (though provider deny rules do block
-globally). Different access levels for the same server are achieved by
-composing different profile sets, not by adding deny overrides.
-
-### 4. Per-run authentication
-
-**Context.** The agent inside the sandbox needs to authenticate to API servers.
-The real credential must never enter the sandbox.
-
-**Options:**
-
-**Option A: UUID bearer token via provider placeholders (recommended).** The
-runner generates a random UUID token per run. The token is registered as an
-OpenShell provider credential with an `auth: bearer` declaration. The L7 proxy
-resolves the placeholder to the real token in outgoing `Authorization` headers
-— the real token never enters the sandbox.
-
-```yaml
-# Provider definition
-name: api-server
-type: generic
-credentials:
-  API_TOKEN: ${API_TOKEN}
-```
-
-Pros: simple, proven by experiment, no key management, credential never enters
-sandbox, credential scoping ensures the token is only injected for requests
-matching the provider's endpoints and binaries. Cons: no claims or expiry —
-the token is valid for the entire run and grants whatever endpoints the L7
-policy allows.
-
-**Option B: Short-lived JWTs with claims.** The runner generates a JWT signed
-with a per-run key pair. Claims include run ID, repo, and allowed operations.
-Servers validate the signature and claims. The JWT can be short-lived (e.g., 1
-hour) with refresh.
-
-Pros: per-operation authorization, expiry, audit trail via claims. Cons: adds
-signing key management, JWT library dependency in every server, more complex
-token lifecycle, and the L7 policy already restricts which endpoints are
-reachable — JWT claims would be a second layer of the same restriction.
-
-**Decision.** Option A — UUID bearer token via provider placeholders. The L7
-policy already enforces which endpoints each agent can reach (§3), making
-per-operation JWT claims redundant for the initial design. JWT-based auth is a
-future enhancement for when per-operation claims become necessary (e.g.,
-multi-tenant servers, cross-run audit).
-
-**Security requirement.** Token comparison must be timing-safe
-(`crypto/subtle.ConstantTimeCompare` in Go, `hmac.compare_digest` in Python).
-The experiment code flagged this as a TODO.
-
-### 5. Credential delivery to the server process
-
-**Context.** API servers hold credentials on behalf of the agent (registry
-tokens, GitHub tokens, API keys). These must reach the server without passing
-through the sandbox.
-
-**Decision.** Credentials are delivered via environment variables expanded from
-the host environment at server startup. The `env` field in `api_servers` (ADR
-0024) supports `${HOST_VAR}` syntax:
-
-```yaml
-api_servers:
-  - name: builder
-    script: api-servers/builder/bin/builder-server
-    port: 9090
-    env:
-      REGISTRY_TOKEN: ${REGISTRY_TOKEN}
-      GCP_KEY_PATH: ${GOOGLE_APPLICATION_CREDENTIALS}
-```
-
-The per-run bearer token is passed via `--token` CLI flag (not through `env`,
-since it's part of the process contract).
-
-No secrets mounts or vault integration in the initial design. The runner
-expands `${VAR}` references against its own environment and passes the resolved
-values to the server process. Sensitive values must not appear in logs or error
-messages — servers must scrub credentials from error output (the experiment's
-provisioner implements `_scrub_credentials` for this).
-
-### 6. File transfer between server and sandbox
-
-**Context.** API servers that build artifacts or provision repos need to
-exchange files with the sandbox. File transfer must happen during request
-handling — the agent calls the API, the server produces or consumes files, and
-the result must be in the sandbox before the response returns. The runner is
-not in this loop.
-
-**Options:**
-
-**Option A: `openshell sandbox upload/download` from the server (recommended).**
-The server shells out to the OpenShell CLI to transfer files during request
-handling. The agent passes its sandbox name per-request (discovered via
-`hostname | sed 's/^sandbox-//'`), and the server uses it with `openshell
-sandbox download <name> <sandbox-path> <local-path>` and `openshell sandbox
-upload <name> <local-path> <sandbox-path>`.
-
-Pros: works today, validated by experiment, handles real-time exchange
-naturally (transfer happens during request handling), no runner mediation
-needed. Cons: couples server to OpenShell CLI — servers need `openshell` on
-`PATH` and can't be tested without it.
-
-**Option B: Shared host mount.** The runner creates a staging directory on the
-host and mounts it into the sandbox via `openshell sandbox create --mount
-<host-path>:<sandbox-path>`. Both the server and the agent see the same
-directory — no explicit transfer needed.
-
-Pros: no transfer commands, transparent POSIX access, no OpenShell CLI
-dependency in the server. Cons: depends on OpenShell mount support (available
-on K3s via
-[NVIDIA/OpenShell#500](https://github.com/NVIDIA/OpenShell/issues/500), pending
-for VM driver via
-[NVIDIA/OpenShell#1509](https://github.com/NVIDIA/OpenShell/issues/1509)),
-bidirectional mounts may introduce TOCTOU risks if both sides write
-concurrently.
-
-**Option C: HTTP multipart via the API.** The agent uploads files to the server
-and downloads results through the server's own REST endpoints using multipart
-form data over the L7 proxy. The server stores files on the host side.
-
-Pros: fully portable, standard HTTP, server is self-contained, no OpenShell
-dependency. Cons: large files go through the L7 proxy (bandwidth/latency
-overhead), requires multipart handling in every server, the proxy must allow
-the content-type and body size.
-
-**Decision.** Option A — `openshell sandbox upload/download` from the server.
-These servers are purpose-built for the OpenShell environment, so the CLI
-coupling is acceptable. The experiment validated this end-to-end with both
-the builder (download context, build, upload tarball) and the provisioner
-(clone, scan, upload repo).
-
-Option B (shared mount) is noted as the preferred future direction when
-OpenShell mount support is universally available — it eliminates transfer
-commands entirely and makes file exchange transparent.
-
-### 7. Provider vs. API server decision framework
-
-**Context.** Issue #196 evaluated providers as a replacement for REST servers.
-Providers (Tier 2) became the preferred path, but API servers (Tier 3) remain
-necessary for cases providers cannot handle.
-
-**Decision.** Use providers (Tier 2) by default. Use API servers (Tier 3) when
-any of the following apply:
-
-| Condition | Why providers can't handle it |
-|---|---|
-| **Long-running operations** (> 60s) | MCP client timeouts (~30-60s) make provider-based tools unsuitable ([claude-code#7575](https://github.com/anthropics/claude-code/issues/7575)) |
-| **Sandbox capability gaps** | Operations the sandbox deliberately blocks (e.g., container builds — seccomp blocks `CLONE_NEWUSER`, `AF_NETLINK`, `setns`; agent has zero Linux capabilities; [NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)) |
-| **Credentials in request bodies** | Provider placeholder model only intercepts `Authorization` headers; credentials embedded in JSON bodies or query parameters require server-side injection |
-| **Response transformation** | Scanning, filtering, or transforming responses before they reach the agent (e.g., repo provisioner's scan-before-copy) |
-| **Multi-step atomic operations** | Operations that combine multiple steps as a single unit (clone + scan + copy) where partial completion would be worse than failure |
-
-### 8. Bind address and network exposure
-
-**Context.** The experiment found that API servers must bind to `0.0.0.0`
-because the L7 proxy connects from inside the container network namespace —
-servers bound to `127.0.0.1` are unreachable. On rootless Podman, the container
-bridge gateway IP (e.g., `10.88.0.1`) lives inside the container namespace and
-cannot be bound from the host (`EADDRNOTAVAIL`).
-
-**Decision.** Servers default to `--bind-address 127.0.0.1` (secure by
-default). The runner explicitly passes `--bind-address 0.0.0.0` when starting
-servers for sandboxed agents. This is a security trade-off: on shared hosts,
-other processes can probe the server ports. Bearer token authentication
-mitigates this but doesn't eliminate the attack surface.
-
-**Future.** [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633)
-proposes generalizing the `inference.local` supervisor proxy pattern for
-arbitrary host services. If implemented, the supervisor would proxy connections
-from inside the sandbox to `127.0.0.1` on the host, eliminating the need for
-`0.0.0.0` binding entirely. Servers would never be network-exposed. The runner
-should detect OpenShell support for this and stop passing
-`--bind-address 0.0.0.0` when available.
-
-**Network policy.** All URLs delivered into the sandbox must use
-`host.openshell.internal`, never raw IPs. The L7 proxy matches requests by
-hostname, and SSRF protection blocks private IP addresses. The `allowed_ips`
-field in network policies handles SSRF allowlisting separately using the host
-IP rendered into policy templates at runtime.
-
-### 9. Security hardening requirements
-
-Based on experiment findings and code review of the PoC servers:
-
-| Requirement | Rationale |
-|---|---|
-| **Timing-safe token comparison** | Naive string comparison leaks token length via timing side-channel. Use `crypto/subtle.ConstantTimeCompare` (Go) or `hmac.compare_digest` (Python). |
-| **Request body size limits** | Prevent DoS via oversized payloads. Recommend 1 MB default (`http.MaxBytesReader` in Go, `Content-Length` check in Python). |
-| **Rate limiting on unauthenticated endpoints** | `/healthz`, `/tools.json` are reachable without a bearer token. Rate limit to prevent abuse, especially when bound to `0.0.0.0`. |
-| **Credential scrubbing in error messages** | Error responses must not leak credentials embedded in URLs or environment variables. Servers must scrub before returning errors to the agent. |
-| **Bounded in-memory state** | Servers that track operation state (e.g., provisioner's job map) must bound the state or expire old entries. Unbounded growth is acceptable only for short-lived experiment runs. |
-
-### 10. Relationship to existing ADRs
-
-| ADR | Relationship |
-|---|---|
-| **0016** (Unidirectional control flow) | Preserved — API servers are provisioned top-down from the harness. Agents cannot request servers to be started. The runner manages the full lifecycle. |
-| **0017** (Credential isolation) | Implemented — this ADR specifies the concrete process contract for the host-side REST server model. Per-run bearer tokens via provider placeholders fulfill the "credentials never enter the sandbox" requirement. |
-| **0024** (Harness definitions) | Extended — this ADR specifies runtime behavior for the `api_servers` field, adds `providers` sub-field for composable policy profiles, and inserts API server lifecycle into the execution sequence (after pre-script, before sandbox creation). |
-| **0025** (Provider-based credential delivery) | Tier 3 (host-side REST server) is now fully specified. The decision framework in §7 defines when to use Tier 3 vs. Tier 2 (providers). |

From c828123a5bd69875d63ad581fab947ec5b61eecd Mon Sep 17 00:00:00 2001
From: Marta Anon <manon@redhat.com>
Date: Fri, 12 Jun 2026 10:16:00 +0200
Subject: [PATCH 3/3] =?UTF-8?q?docs:=20refine=20ADR=200046=20=E2=80=94=20s?=
 =?UTF-8?q?cope=20attribution,=20Go=20interface,=20process=20contract?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Distinguish what ADR 0017/0025 originally established (request bodies,
  response transformation) from the broader scope this ADR introduces
- Add implementation model Options subsection: language-agnostic process
  contract vs. Go interface
- Clarify the process contract is the runner's enforcement boundary;
  the Go interface is the recommended internal pattern for
  fullsend-maintained servers, not a runner requirement
- Scope experiment reference — experiment validated the flow, Go
  interface decision came from review discussion
- Update Consequences for Go interface pattern and process contract
  as enforcement boundary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
---
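
Notes (illustration only, not part of the commit): a minimal sketch of what a
custom `main()` written directly against the process contract might look like.
It assumes only the Go standard library. The helper names `validBearer` and
`requireToken`, the empty `/tools.json` body, and the 5-second drain window
(mirroring the grace period in the removed design spec) are hypothetical
choices for this sketch, not the interface the ADR refers to.

```go
package main

import (
	"context"
	"crypto/sha256"
	"crypto/subtle"
	"flag"
	"log"
	"net"
	"net/http"
	"os/signal"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// validBearer reports whether an Authorization header value carries the
// expected per-run token. Both values are hashed first so the constant-time
// comparison always runs over equal-length inputs.
func validBearer(header, want string) bool {
	if want == "" || !strings.HasPrefix(header, "Bearer ") {
		return false
	}
	got := strings.TrimPrefix(header, "Bearer ")
	g, w := sha256.Sum256([]byte(got)), sha256.Sum256([]byte(want))
	return subtle.ConstantTimeCompare(g[:], w[:]) == 1
}

// requireToken lets the two unauthenticated contract endpoints through and
// demands the bearer token on every other path.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		open := r.URL.Path == "/healthz" || r.URL.Path == "/tools.json"
		if !open && !validBearer(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	port := flag.Int("port", 9090, "TCP port to listen on")
	token := flag.String("token", "", "per-run bearer token")
	bind := flag.String("bind-address", "127.0.0.1", "address to bind")
	flag.Parse()
	if *token == "" {
		log.Fatal("--token is required")
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"status": "ok"}`))
	})
	mux.HandleFunc("/tools.json", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`[]`)) // placeholder: a real server lists its tools here
	})

	srv := &http.Server{
		Addr:    net.JoinHostPort(*bind, strconv.Itoa(*port)),
		Handler: requireToken(*token, mux),
	}

	// Serve in the background; the log package writes to stderr by default.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Block until SIGTERM, then drain in-flight requests before exiting.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()
	<-ctx.Done()
	drainCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(drainCtx); err != nil {
		log.Print(err)
	}
}
```

Started with `--port 9090 --token <uuid> --bind-address 0.0.0.0`, such a
server would answer `GET /healthz` without credentials and return 401 on any
other path that lacks the matching `Authorization: Bearer` header.
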
 docs/ADRs/0046-host-side-api-server-design.md | 66 +++++++++++++++----
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/docs/ADRs/0046-host-side-api-server-design.md b/docs/ADRs/0046-host-side-api-server-design.md
index ed6c31468..37c30c112 100644
--- a/docs/ADRs/0046-host-side-api-server-design.md
+++ b/docs/ADRs/0046-host-side-api-server-design.md
@@ -28,18 +28,20 @@ Accepted
 [ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not
 implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3
 of the credential delivery model — for cases where providers (Tier 2) cannot
-handle: long-running operations exceeding MCP timeouts, operations the sandbox
+handle the operation, originally scoped to credentials in request bodies and
+response transformation. Practice revealed additional cases beyond provider
+reach: long-running operations exceeding MCP timeouts, operations the sandbox
 deliberately blocks (container builds, see
 [NVIDIA/OpenShell#113](https://github.com/NVIDIA/OpenShell/issues/113)),
-credentials in request bodies, response transformation, and multi-step atomic
-operations.
+and multi-step atomic operations.
 
 The `host-side-api-server` experiment
 ([fullsend-ai/experiments#28](https://github.com/fullsend-ai/experiments/pull/28))
 validated the end-to-end flow with two servers (Go container builder, Python
 repo provisioner), testing lifecycle management, API discoverability, L7 policy
-tuning, per-run auth, and file transfer. This ADR records the design decisions
-informed by that experiment.
+tuning, per-run auth, and file transfer. This ADR records design decisions
+informed by that experiment and by subsequent review discussion on the
+implementation model.
 
 ## Options
 
@@ -76,22 +78,53 @@ OpenShell mount support that is not yet universally available
 **HTTP multipart via the API:** fully portable, but large files through the L7
 proxy add overhead.
 
+### Implementation model
+
+**Language-agnostic process contract:** each server is an independent binary
+in any language. The runner manages it via CLI flags (`--port`, `--token`,
+`--bind-address`) and expects `/healthz`, `/tools.json`, bearer auth, and
+clean `SIGTERM` shutdown. The experiment validated this with Go and Python
+servers. Maximally flexible, but every server re-implements boilerplate (CLI
+parsing, health endpoints, auth middleware, security hardening).
+
+**Go interface:** fullsend-maintained servers implement an internal Go
+interface, compiled to a single binary. Simplifies deployment (one binary,
+no runtime dependencies) and enables stub-based testing. The process
+contract remains the runner's enforcement boundary — the interface is an
+internal pattern, not a runner requirement. Provides a seam for adapting
+deployment topology across workflow platforms (sidecars, pods).
+
 ## Decision
 
-Adopt the host-side API server design with the following process contract,
-policy model, and security requirements. Full design details in the
-[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28).
+Adopt the host-side API server design with the following implementation model,
+policy model, and security requirements. The
+[`host-side-api-server` experiment](https://github.com/fullsend-ai/experiments/pull/28)
+validated the end-to-end flow; the implementation model was refined during
+review.
 
-**Process contract.** Every host-side API server must accept `--port`,
-`--token`, and `--bind-address` CLI flags; serve `GET /healthz`
+**Process contract.** Every API server executable accepts `--port`,
+`--token`, and `--bind-address` flags; serves `GET /healthz`
 (unauthenticated) and `GET /tools.json` (structured tool-use schema) for agent
-discovery; validate bearer tokens on all other endpoints; and shut down cleanly
-on `SIGTERM`. Servers must write logs to stderr; the runner collects and
+discovery; validates bearer tokens on all other endpoints; and shuts down
+cleanly on `SIGTERM`. Servers write logs to stderr; the runner collects and
 bundles logs from all API servers so they are available for inspection after
 the run completes. The runner starts servers after pre-script, health-checks
 before sandbox creation, and tears down after sandbox destruction. If a server
 crashes mid-run, the run fails.
 
+**Implementation model.** The runner enforces the process contract above — any
+binary that satisfies it is a valid API server. Internally, fullsend-maintained
+servers use a Go interface as the recommended pattern: each server provides
+HTTP handlers and a `/tools.json` schema, and `main()` wires the
+implementation to the process contract (CLI flags, health endpoints, auth
+middleware, signal handling). Harnesses reference a compiled Go binary, the
+workflow runner downloads it to the host, and `fullsend run` manages its
+lifecycle. The interface makes servers testable via stub implementations and
+provides the seam for adapting deployment topology across workflow platforms
+(e.g., sidecars or separate pods on Kubernetes/OpenShift). Implementors who
+need more control can write their own `main()` against the process contract
+directly.
+
 **Network policy via composable provider profiles.** Each API server ships
 atomic capability profiles — one per logical group of endpoints (e.g.,
 `builder-build`, `builder-push`, `builder-read`). Harnesses list which profiles
@@ -122,8 +155,9 @@ messages, bounded in-memory state.
 ## Consequences
 
 - The `api_servers` harness field ([ADR 0024](0024-harness-definitions.md)) will gain a `providers` sub-field and
-  defined runtime behavior — servers can be implemented in any language
-  following the uniform process contract.
+  defined runtime behavior. The initial implementation targets Go servers
+  behind an internal interface; the process contract keeps the door open for
+  other languages.
 - Implementing this design requires
   [#776](https://github.com/fullsend-ai/fullsend/issues/776) (provider-backed
   policy composition) as a prerequisite.
@@ -135,3 +169,7 @@ messages, bounded in-memory state.
 - API servers (Tier 3) are now clearly scoped to cases providers cannot
   handle: long-running operations, sandbox capability gaps, credentials in
   request bodies, response transformation, and multi-step atomic operations.
+- Fullsend-maintained servers follow a Go interface pattern (testable,
+  platform-portable). The process contract remains the enforcement boundary
+  — servers in other languages or with custom `main()` implementations are
+  valid as long as they satisfy it.