[ACP on Cloud] Epic: ACP agents (Claude Code / Codex / Gemini) in Docker + cloud #988

@simonrosenberg

Epic / tracker. Make ACP agents (Claude Code, Codex, Gemini) work when the agent-server runs outside the user's machine: first containerized (Docker), then cloud. Today ACP only works against a local agent-server because the subprocess silently inherits two things from the user's machine, the shell env and the CLI login files (~/.claude, ~/.codex/auth.json, ~/.gemini/), and both vanish in a container/pod. Details live in the sub-issues; this issue holds shared context + the ordered task list.

Key finding — transport already works

At the SDK layer ACP rides the same REST+WebSocket conversation transport as the regular agent, with the CLI as a co-located subprocess inside the agent-server (proven by examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py). The cloud proxy / exposed_urls / session_api_key are agent-agnostic. The real work is: carry the agent spec, inject credentials, persist state, plus onboarding UX — and a set of pre-cloud refactors that de-risk all of it.

Guiding principle

Stop treating ACP as a special case. ACP today runs parallel channels (secrets, request-building, event fan-out, persistence, model-state, type-detection) for things the regular agent does through one canonical path. Each of the strongest moves collapses an ACP-only channel onto the regular agent's canonical one, which is the channel that already crosses the process/cipher/network boundary. ACP has no production users, so there is no backward-compat to preserve: refactors are deletes, not shims.

Shared reference — credential matrix (verified against CLI sources/binaries)

| Provider | API key | Subscription | Inject into container? |
| --- | --- | --- | --- |
| Claude Code | ANTHROPIC_API_KEY | CLAUDE_CODE_OAUTH_TOKEN env (from claude setup-token) | ✅ env-var secret. ⚠️ _ENV_CONFLICT_MAP only guards the CLAUDE_CONFIG_DIR path, not the token; don't co-set ANTHROPIC_BASE_URL (silently breaks bearer auth); Max/Pro desktop token is in the macOS Keychain → cloud = env-token only |
| Codex | OPENAI_API_KEY | $CODEX_HOME/auth.json (interactive login, rewritten on refresh) | ⚠️ file materialisation into a writable CODEX_HOME (detection hardcodes ~/.codex/auth.json, acp_agent.py:234; fixed in #1020) |
| Gemini | GEMINI_API_KEY | personal OAuth file is AES-GCM bound to hostname+username | ❌ personal not deployable; only Vertex SA file |
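
A minimal sketch of how the env-var side of the matrix could be enforced at injection time. The helper name `build_acp_env`, the provider keys, and the choice to raise on the Claude conflict (rather than drop a variable) are assumptions for illustration, not the SDK's actual API; only the env-var names come from the matrix.

```python
from typing import Mapping

# Env vars each provider may receive, taken from the credential matrix.
# The provider keys ("claude", "codex", "gemini") are hypothetical labels.
ALLOWED_ENV = {
    "claude": ("ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_BASE_URL"),
    "codex": ("OPENAI_API_KEY",),
    "gemini": ("GEMINI_API_KEY",),
}


def build_acp_env(provider: str, secrets: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical helper: select the minimal env to inject for one provider."""
    if provider not in ALLOWED_ENV:
        raise ValueError(f"unknown provider: {provider}")

    # Inject only what this provider needs, skipping unset/empty values.
    env = {name: secrets[name] for name in ALLOWED_ENV[provider] if secrets.get(name)}

    # Per the matrix: co-setting ANTHROPIC_BASE_URL silently breaks bearer
    # auth for the subscription token, so fail loudly instead (assumed policy).
    if "CLAUDE_CODE_OAUTH_TOKEN" in env and "ANTHROPIC_BASE_URL" in env:
        raise ValueError(
            "CLAUDE_CODE_OAUTH_TOKEN must not be combined with ANTHROPIC_BASE_URL"
        )
    return env
```

File-based credentials (Codex auth.json, Vertex SA file) are out of scope for this sketch because they need materialisation into a writable directory rather than an env var.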

Shared reference — persistence reality

/workspace is a per-runtime PVC that survives pause/resume only: STOP / last-conversation-delete deletes it (runtime-api/k8s.py:488), a reaper kills it at a 14-day creation-time TTL (prod/staging), eval stops on idle, reclaimPolicy: Delete. HOME=/home/openhands is NOT on the PVC → the CLIs' ~/.claude, ~/.codex, ~/.gemini state is ephemeral. ⇒ the per-conversation CLI data root must live under the durable conversations tree (/workspace/conversations/{id.hex}/), and "resume beyond 14 days / across stop" needs a Retain volume. (Details: #1018.)
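
As a rough illustration of the conclusion above, the sketch below derives a per-conversation CLI data root under the durable conversations tree and points the CLIs at it through the two directory variables named in this issue (CLAUDE_CONFIG_DIR, CODEX_HOME). The function name `cli_data_env` and the `acp/<cli>` sub-layout are assumptions; Gemini is left out because this issue does not name an equivalent variable for it.

```python
import uuid
from pathlib import PurePosixPath

# Durable tree from the persistence notes: /workspace/conversations/{id.hex}/
CONVERSATIONS_ROOT = PurePosixPath("/workspace/conversations")


def cli_data_env(
    conversation_id: uuid.UUID,
    root: PurePosixPath = CONVERSATIONS_ROOT,
) -> dict[str, str]:
    """Hypothetical helper: env vars that relocate CLI state onto the PVC."""
    # The "acp/<cli>" layout below is an illustrative assumption.
    base = root / conversation_id.hex / "acp"
    return {
        "CLAUDE_CONFIG_DIR": str(base / "claude"),
        "CODEX_HOME": str(base / "codex"),
    }
```

Keeping the state under the conversation directory means it shares the conversation's lifetime on the PVC; it does not by itself survive STOP or the 14-day reaper, which still needs the Retain volume.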


Tasks — implementation order

Unlabelled = required (P0). P1/P2 = optional / can follow. Order is dependency-driven (no hard deadlines).

Phase 0 — Pre-cloud refactors (de-risk before building cloud)

Phase 1 — SDK foundation

Phase 2 — Persistence & isolation

Phase 3 — Docker (first deployable target)

Phase 4 — Cloud


Non-goals / accepted divergences (do NOT "fix" — alignment would fight the design)

  • Don't make ACP secret injection lazy/per-command like bash — the subprocess is a black-box CLI; creds must be present from spawn. Mitigate with masking ([ACP on Cloud] Refactor: mask injected secrets in ACP tool-call output #1023) + a minimal injected set.
  • Don't remap ACPToolCallEvent→Action/Observation wholesale — ACP is deliberately non-LLM-convertible and streams N updates.
  • Don't touch on_token streaming — already unified with stream=True LLMs via StreamingDeltaEvent (event_service.py:706).
  • Don't touch the disabledByAcp nav or the cloud model-picker gate — inherent ACP semantics (the CLI owns its own LLM/condenser/MCP) / symmetric with the regular switchProfile local-only throw.
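
The masking mitigation in the first bullet can be pictured with a small sketch: since credentials must be in the subprocess env from spawn, any injected value that shows up in tool-call output is replaced before it is emitted. The function name `mask_secrets` and the placeholder text are assumptions for illustration; the actual design is tracked in #1023.

```python
from typing import Iterable


def mask_secrets(
    text: str,
    secret_values: Iterable[str],
    placeholder: str = "<secret-hidden>",
) -> str:
    """Hypothetical helper: replace every injected secret value in text."""
    # Skip empty values (they would match everywhere) and mask the longest
    # values first so a secret containing a shorter one is hidden whole.
    values = sorted({v for v in secret_values if v}, key=len, reverse=True)
    for value in values:
        text = text.replace(value, placeholder)
    return text
```

Plain substring replacement only catches verbatim echoes; an encoded or truncated secret would pass through, which is one reason to also keep the injected set minimal.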

Related SDK PRs

#3343 (resume-session-id → #1018) · #3436 (acp_env→SecretStr — merged) · #3458 (MCP forwarding — tangential, not required for this epic).
