Epic / tracker. Make ACP agents (Claude Code, Codex, Gemini) work when the agent-server runs outside the user's machine — first containerized (Docker), then cloud. Today ACP only works against a local agent-server because the subprocess silently inherits the user's machine (shell env + CLI login files ~/.claude, ~/.codex/auth.json, ~/.gemini/), both of which vanish in a container/pod. Details live in the sub-issues; this issue holds shared context + the ordered task list.
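To make the inheritance problem concrete, here is a minimal sketch of building the CLI subprocess environment explicitly instead of relying on whatever the parent shell provides. `build_cli_env` and its `passthrough` allow-list are hypothetical names for illustration, not the SDK's API.

```python
import os


def build_cli_env(
    secrets: dict[str, str],
    passthrough: tuple[str, ...] = ("PATH",),
) -> dict[str, str]:
    """Build an explicit environment for an ACP CLI subprocess.

    Hypothetical helper: only allow-listed variables are copied from the
    parent process, and credentials come from an explicit mapping, so the
    result does not depend on the user's login shell.
    """
    # Copy only the allow-listed variables that actually exist in the parent.
    env = {name: os.environ[name] for name in passthrough if name in os.environ}
    # Explicit credentials win over anything copied from the parent.
    env.update(secrets)
    return env
```

The resulting mapping can be passed as `env=` to `subprocess.Popen` or `subprocess.run`, which replaces the inherited environment rather than extending it.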
Key finding — transport already works
At the SDK layer ACP rides the same REST+WebSocket conversation transport as the regular agent, with the CLI as a co-located subprocess inside the agent-server (proven by examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py). The cloud proxy / exposed_urls / session_api_key are agent-agnostic. The real work is: carry the agent spec, inject credentials, persist state, plus onboarding UX — and a set of pre-cloud refactors that de-risk all of it.
Guiding principle
Stop treating ACP as a special case. ACP today runs parallel channels (secrets, request-building, event fan-out, persistence, model-state, type-detection) for things the regular agent does through one canonical path. Each of the strongest moves collapses an ACP-only channel onto the regular agent's canonical one, which is the channel that already crosses the process/cipher/network boundary. ACP has no production users, so there is no backward-compatibility burden: refactors are deletes, not shims.
Shared reference — credential matrix (verified against CLI sources/binaries)
| Provider | API key | Subscription | Inject into container? |
|---|---|---|---|
| Claude Code | ANTHROPIC_API_KEY | CLAUDE_CODE_OAUTH_TOKEN env (from claude setup-token) | ✅ env-var secret. ⚠️ _ENV_CONFLICT_MAP only guards the CLAUDE_CONFIG_DIR path, not the token: don't co-set ANTHROPIC_BASE_URL (silently breaks bearer auth). Max/Pro desktop token is in the macOS Keychain → cloud = env-token only |
| Codex | OPENAI_API_KEY | $CODEX_HOME/auth.json (interactive login, rewritten on refresh) | ⚠️ file materialisation into a writable CODEX_HOME (detection hardcodes ~/.codex/auth.json, acp_agent.py:234; fixed in #1020) |
| Gemini | GEMINI_API_KEY | personal OAuth file is AES-GCM bound to hostname+username | ❌ personal not deployable; only Vertex SA file |
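Two of the caveats in the matrix are easy to express as checks. The sketch below is illustrative only: `check_claude_env` and `codex_auth_path` are hypothetical helpers, not functions from acp_agent.py, and the fallback to `~/.codex` assumes the CLI's default when CODEX_HOME is unset.

```python
from pathlib import Path


def check_claude_env(env: dict[str, str]) -> list[str]:
    """Return warnings for the Claude Code combination the matrix flags."""
    warnings = []
    # The matrix notes that co-setting these two silently breaks bearer auth.
    if "CLAUDE_CODE_OAUTH_TOKEN" in env and "ANTHROPIC_BASE_URL" in env:
        warnings.append(
            "CLAUDE_CODE_OAUTH_TOKEN is set together with ANTHROPIC_BASE_URL"
        )
    return warnings


def codex_auth_path(env: dict[str, str]) -> Path:
    """Locate auth.json via CODEX_HOME instead of hardcoding ~/.codex."""
    codex_home = env.get("CODEX_HOME")
    # Assumed default: fall back to ~/.codex when CODEX_HOME is unset or empty.
    base = Path(codex_home) if codex_home else Path.home() / ".codex"
    return base / "auth.json"
```

Passing the environment as an argument, rather than reading `os.environ` inside the helpers, keeps the checks usable against the environment that will be handed to the subprocess, not the agent-server's own.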
Shared reference — persistence reality
/workspace is a per-runtime PVC that survives pause/resume only: STOP / last-conversation-delete deletes it (runtime-api/k8s.py:488), a reaper kills it at a 14-day creation-time TTL (prod/staging), eval stops on idle, reclaimPolicy: Delete. HOME=/home/openhands is NOT on the PVC → the CLIs' ~/.claude, ~/.codex, ~/.gemini state is ephemeral. ⇒ the per-conversation CLI data root must live under the durable conversations tree (/workspace/conversations/{id.hex}/), and "resume beyond 14 days / across stop" needs a Retain volume. (Details: #1018.)
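As a rough sketch of the path layout this implies, the helpers below derive a per-conversation CLI data root under `/workspace/conversations/{id.hex}/` and point the CLIs at it through the CLAUDE_CONFIG_DIR and CODEX_HOME variables named in the credential matrix. The function names and the `claude` / `codex` subdirectory names are hypothetical; see #1018 for the actual design.

```python
import uuid
from pathlib import Path


def conversation_cli_root(
    conversation_id: uuid.UUID,
    workspace: Path = Path("/workspace"),
) -> Path:
    """Per-conversation directory under the durable conversations tree."""
    return workspace / "conversations" / conversation_id.hex


def cli_home_overrides(
    conversation_id: uuid.UUID,
    workspace: Path = Path("/workspace"),
) -> dict[str, str]:
    """Env overrides that move CLI state off the ephemeral HOME.

    The "claude" and "codex" subdirectory names are assumptions made
    for this example only.
    """
    root = conversation_cli_root(conversation_id, workspace)
    return {
        "CLAUDE_CONFIG_DIR": str(root / "claude"),
        "CODEX_HOME": str(root / "codex"),
    }
```

Keeping the root on the PVC only protects against pause/resume; surviving STOP or the 14-day reaper still depends on the volume's reclaim policy, as noted above.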
Tasks — implementation order
Unlabelled = required (P0). P1/P2 = optional / can follow. Order is dependency-driven (no hard deadlines).
Phase 0 — Pre-cloud refactors (de-risk before building cloud)
… state.secret_registry (the anchor; SDK-core; the backend builder collapse already landed on main)
P1: Decouple ACP tool-call streaming (live deltas) from persistence: O(n²)→O(1), streaming-ready (before real cloud load)
P1/P2: ACP↔regular-agent alignment cleanups (first-class acp_server/agent_kind detection; capability props; init spine)
Phase 1 — SDK foundation
… auth.json, Gemini Vertex SA) → persisted per-conv root, reads secret_registry, CODEX_HOME-aware detection
Phase 2 — Persistence & isolation
Phase 3 — Docker (first deployable target)
Phase 4 — Cloud
… agent_settings at creation)
… ACPAgent + inject per-provider credentials (encrypted store + cipher; bump the runtime image pin)
P1: Enable ACP model switching in cloud
Non-goals / accepted divergences (do NOT "fix": alignment would fight the design)
Don't remap ACPToolCallEvent→Action/Observation wholesale: ACP is deliberately non-LLM-convertible and streams N updates.
Don't touch on_token streaming: it is already unified with stream=True LLMs via StreamingDeltaEvent (event_service.py:706).
Don't touch the disabledByAcp nav or the cloud model-picker gate: these are inherent ACP semantics (the CLI owns its own LLM/condenser/MCP) and symmetric with the regular switchProfile local-only throw.
Related SDK PRs
#3343 (resume-session-id → #1018) · #3436 (acp_env→SecretStr, merged) · #3458 (MCP forwarding; tangential, not required for this epic).