feat(api-rs): OpenAI Responses-compatible /v1/responses ingress #707
Draft
tarrencev wants to merge 2 commits into
e07b4d2 to f801f21
Expose centaur's durable, thread-keyed session control plane behind an Anthropic Messages-compatible POST /v1/messages so any Anthropic SDK client or `claude -p` (via ANTHROPIC_BASE_URL) can drive a fully-configured centaur thread (tools, sandbox, durability).

- Thread continuity via X-Centaur-Thread-Key (absent -> api:<uuid>).
- Honors `model` and `system` (first-request-wins), threading them into the sandbox harness env (CLAUDE_MODEL, CENTAUR_EXTRA_SYSTEM_PROMPT).
- Streaming (SSE) and non-streaming responses via an AnthropicTranslator that maps harness output to Anthropic content blocks. Handles both harness event shapes: claude (Anthropic-shaped) and codex, including the deployed Rust harness-server's slash-method/params events (item/agentMessage/delta, item/reasoning/textDelta, item/completed).
- Drives the turn off the last user message so clients (e.g. Claude Code) that append a trailing system-role message in messages[] are handled.

Validated end-to-end against the c7e deployment with `claude -p`: basic replies, tool use (inner agent runs shell commands), and model selection (requested model lands as CLAUDE_MODEL in the spawned sandbox).
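The "last user message" rule in the commit message above could look roughly like the sketch below. The `Message` struct and the `last_user_text` helper are hypothetical names for illustration; the PR's actual request types are not shown on this page.

```rust
/// Hypothetical, simplified message shape (assumption: the real request
/// types in api-rs are richer and are not shown in this PR page).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Returns the text of the last `user` message, skipping any trailing
/// entries with other roles (for example a trailing `system` message).
pub fn last_user_text(messages: &[Message]) -> Option<&str> {
    messages
        .iter()
        .rev() // walk from the end so trailing non-user entries are skipped
        .find(|m| m.role == "user")
        .map(|m| m.content.as_str())
}
```

Scanning from the end rather than taking the final element is what lets a client append a trailing system-role entry without changing which message drives the turn.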
Expose Centaur's durable, thread-keyed session control plane behind an OpenAI Responses-compatible POST /v1/responses, the wire API the Codex CLI speaks, so `codex` (via a custom model_provider base_url) or any Responses client can drive a fully-configured Centaur thread.

Mirrors the Anthropic /v1/messages ingress: same SessionRuntime flow, thread continuity via X-Centaur-Thread-Key, model + instructions (system) threaded into the sandbox harness env, tools accepted but decorative (Centaur owns in-sandbox tools).

A ResponsesTranslator maps harness output to Responses streaming events (response.created -> output_item/content_part -> output_text.delta -> completed), handling both harness shapes (claude content blocks; codex dotted and slash-method events). The terminal usage carries total_tokens, which the Codex CLI requires. Drives the turn off the last user message in `input`.
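As a rough illustration of the event ordering this commit message describes, the sketch below lists event names for a single text response and builds a usage value with `total_tokens` filled in. The names use the `response.` prefix from the OpenAI Responses streaming API; `event_sequence` and `Usage::new` are hypothetical helpers, not the PR's `ResponsesTranslator`, and event payloads are omitted.

```rust
/// Token usage for the terminal event. Field names follow the OpenAI
/// Responses `usage` object; how the PR computes the values is not shown,
/// so summing input and output here is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub total_tokens: u64,
}

impl Usage {
    pub fn new(input_tokens: u64, output_tokens: u64) -> Self {
        Self {
            input_tokens,
            output_tokens,
            total_tokens: input_tokens + output_tokens,
        }
    }
}

/// Event names for one assistant text message, in emission order.
/// `delta_count` is the number of text chunks the harness produced.
pub fn event_sequence(delta_count: usize) -> Vec<&'static str> {
    let mut events = vec![
        "response.created",
        "response.output_item.added",
        "response.content_part.added",
    ];
    // One delta event per streamed text chunk.
    events.extend(std::iter::repeat("response.output_text.delta").take(delta_count));
    events.extend([
        "response.output_text.done",
        "response.output_item.done",
        "response.completed", // terminal event; carries `usage`
    ]);
    events
}
```

Under these assumptions, a response streamed as two chunks yields eight events, beginning with `response.created` and ending with `response.completed`.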
f801f21 to 9c5d7b3
Summary
Expose Centaur's durable, thread-keyed session control plane behind an OpenAI Responses-compatible `POST /v1/responses`, the wire API the Codex CLI speaks, so `codex` (via a custom `model_provider` `base_url`) or any Responses client can drive a fully-configured Centaur thread (tools, sandbox, durability).

The Codex analog of the Anthropic `/v1/messages` ingress. Stacked on the Anthropic ingress PR (it reuses the session-runtime helpers and `ExecuteSessionInput` fields that PR exposes). Review the `/v1/responses` commit here; rebase to `main` once the Anthropic PR merges.

What it does
- `POST /v1/responses` (OpenAI Responses shape); streaming SSE + non-streaming.
- `HarnessType::Codex`: the Codex wrapper speaks the same wire format the client expects and uses the deployment's configured Codex model backend.
- `session-id` header (equal to `prompt_cache_key`, stable across `codex resume`) → thread `api:codex:<session-id>`, so a resumed CLI session continues the same durable thread and warm sandbox. Absent → a fresh `api:<uuid>` thread.
- `ResponsesTranslator` maps harness output to Responses streaming events (`response.created` → `output_item.added` / `content_part.added` → `output_text.delta` → `output_text.done` / `output_item.done` → `response.completed`), reusing the same harness-output parsing as the Anthropic translator (claude content blocks; codex dotted and slash-method events). The terminal `usage` carries `total_tokens`, which the Codex CLI requires.
- `model`, `instructions` and `tools` are accepted and `model` is echoed, but not threaded into the harness. v1 drives the turn off the last `user` message in `input`.

Validation
End-to-end against a live deployment with the `codex` CLI (custom `model_provider` → the ingress):
- `codex exec "Reply PONG"` → `PONG`; `codex exec "run echo …"` → the inner agent runs the command and returns its output; `/v1/responses`.
- `cargo fmt --check` · `clippy -D warnings` · `cargo test` green (Responses translator unit cases for both harness shapes).

Follow-ups
`function_call` round-trips, real usage tokens, `x-api-key` auth, full-history replay.
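
The thread-key rule described under "What it does" can be sketched as follows. `thread_key` is a hypothetical helper, not the PR's code; the UUID source is injected as a closure so the sketch needs no external crate, and treating a blank header value as absent is an assumption.

```rust
/// Derives the durable thread key for a `/v1/responses` request.
///
/// `session_id` is the value of the `session-id` header, if the client
/// sent one. `fresh_uuid` supplies a new UUID string when it did not.
pub fn thread_key(session_id: Option<&str>, fresh_uuid: impl FnOnce() -> String) -> String {
    // Assumption: a blank header value is treated the same as a missing one.
    match session_id.map(str::trim).filter(|s| !s.is_empty()) {
        // Resumed CLI sessions reuse the same id, so they map to the same thread.
        Some(id) => format!("api:codex:{id}"),
        // No session id: start a fresh thread.
        None => format!("api:{}", fresh_uuid()),
    }
}
```

Because the key is a pure function of the header, a `codex resume` that sends the same `session-id` maps to the same `api:codex:<session-id>` thread each time, while requests without the header each get their own thread.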