feat(api-rs): OpenAI Responses-compatible /v1/responses ingress by tarrencev · Pull Request #707 · paradigmxyz/centaur

tarrencev · 2026-06-23T04:41:10Z

Summary

Expose Centaur's durable, thread-keyed session control plane behind an OpenAI Responses-compatible POST /v1/responses — the wire API the Codex CLI speaks — so codex (via a custom model_provider base_url) or any Responses client can drive a fully-configured Centaur thread (tools, sandbox, durability).

This is the Codex analog of the Anthropic /v1/messages ingress. It is stacked on the Anthropic ingress PR (it reuses the session-runtime helpers and ExecuteSessionInput fields that PR exposes), so review only the /v1/responses commit here; the branch will be rebased onto main once the Anthropic PR merges.

What it does

Endpoint: POST /v1/responses (OpenAI Responses shape); streaming SSE + non-streaming.
Defaults to HarnessType::Codex — the Codex wrapper speaks the same wire format the client expects and uses the deployment's configured Codex model backend.
Thread continuity: keyed on Codex's session id — the session-id header (equal to prompt_cache_key, stable across codex resume) → thread api:codex:<session-id>, so a resumed CLI session continues the same durable thread and warm sandbox. Absent → a fresh api:<uuid> thread.
A ResponsesTranslator maps harness output to Responses streaming events (response.created → output_item.added/content_part.added → output_text.delta → output_text.done/output_item.done → response.completed), reusing the same harness-output parsing as the Anthropic translator (claude content blocks; codex dotted and slash-method events). The terminal usage carries total_tokens, which the Codex CLI requires.
Centaur owns the harness, model, persona and tools, so the request's model, instructions and tools fields are accepted (and model is echoed back in the response) but are not threaded into the harness. v1 drives the turn off the last user message in input.
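
For reviewers, the thread-continuity rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `thread_key` and the injected `fresh_id` closure (standing in for UUID generation) are hypothetical, and treating a blank header value as absent is an assumption the description does not state.

```rust
/// Hypothetical helper illustrating the keying rule from the description:
/// a session id maps to `api:codex:<session-id>`, otherwise a fresh
/// `api:<uuid>` thread is used.
///
/// `fresh_id` stands in for UUID generation so the sketch needs no crates.
/// Assumption: a blank/whitespace-only header value is treated as absent.
fn thread_key(session_id: Option<&str>, fresh_id: impl FnOnce() -> String) -> String {
    match session_id.map(str::trim).filter(|s| !s.is_empty()) {
        // Resumed CLI sessions send the same id, so they land on the same thread.
        Some(id) => format!("api:codex:{id}"),
        // No usable session id: start a new thread under a fresh identifier.
        None => format!("api:{}", fresh_id()),
    }
}
```

Because the key is a pure function of the session id, two requests carrying the same session-id header resolve to the same durable thread, which is what lets a resumed session reuse the warm sandbox.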

Validation

End-to-end against a live deployment with the codex CLI (custom model_provider → the ingress):

basic reply — codex exec "Reply PONG" → PONG;
tool use — codex exec "run echo …" → the inner agent runs the command and returns its output;
streaming + non-streaming verified directly against /v1/responses.

cargo fmt --check · clippy -D warnings · cargo test green (Responses translator unit cases for both harness shapes).

Follow-ups

Honor a client-selected model/instructions via a Codex-harness mapping.
Reasoning items, client-side function_call round-trips, real usage tokens, x-api-key auth, full-history replay.

Expose centaur's durable, thread-keyed session control plane behind an Anthropic Messages-compatible POST /v1/messages so any Anthropic SDK client or `claude -p` (via ANTHROPIC_BASE_URL) can drive a fully-configured centaur thread (tools, sandbox, durability).

- Thread continuity via X-Centaur-Thread-Key (absent -> api:<uuid>).
- Honors `model` and `system` (first-request-wins), threading them into the sandbox harness env (CLAUDE_MODEL, CENTAUR_EXTRA_SYSTEM_PROMPT).
- Streaming (SSE) and non-streaming responses via an AnthropicTranslator that maps harness output to Anthropic content blocks. Handles both harness event shapes: claude (Anthropic-shaped) and codex, including the deployed Rust harness-server's slash-method/params events (item/agentMessage/delta, item/reasoning/textDelta, item/completed).
- Drives the turn off the last user message so clients (e.g. Claude Code) that append a trailing system-role message in messages[] are handled.

Validated end-to-end against the c7e deployment with `claude -p`: basic replies, tool use (inner agent runs shell commands), and model selection (requested model lands as CLAUDE_MODEL in the spawned sandbox).

Expose Centaur's durable, thread-keyed session control plane behind an OpenAI Responses-compatible POST /v1/responses, the wire API the Codex CLI speaks, so `codex` (via a custom model_provider base_url) or any Responses client can drive a fully-configured Centaur thread. Mirrors the Anthropic /v1/messages ingress: same SessionRuntime flow, thread continuity via X-Centaur-Thread-Key, model + instructions (system) threaded into the sandbox harness env, tools accepted but decorative (Centaur owns in-sandbox tools). A ResponsesTranslator maps harness output to Responses streaming events (response.created -> output_item/content_part -> output_text.delta -> completed), handling both harness shapes (claude content blocks; codex dotted and slash-method events). The terminal usage carries total_tokens, which the Codex CLI requires. Drives the turn off the last user message in `input`.
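
As a rough illustration of the event order the ResponsesTranslator is described as emitting for a single text reply, here is a minimal sketch. It is not the translator from this PR: `SseEvent`, `ev` and `text_turn_events` are hypothetical names, the payloads are reduced to a bare text field, and the terminal usage (including total_tokens) is not modeled. The event type strings use the full `response.`-prefixed names from the OpenAI Responses streaming API, which the description abbreviates.

```rust
/// Hypothetical, simplified event: the wire `type` plus any text it carries.
#[derive(Debug, PartialEq)]
struct SseEvent {
    kind: &'static str,
    text: String,
}

/// Small constructor to keep the sequence below readable.
fn ev(kind: &'static str, text: &str) -> SseEvent {
    SseEvent { kind, text: text.to_string() }
}

/// Build the ordered event list for one assistant text item, given the
/// text deltas parsed from harness output.
fn text_turn_events(deltas: &[&str]) -> Vec<SseEvent> {
    // Opening events: the response, then the output item, then its content part.
    let mut out = vec![
        ev("response.created", ""),
        ev("response.output_item.added", ""),
        ev("response.content_part.added", ""),
    ];
    // One delta event per chunk of text, in arrival order.
    for d in deltas {
        out.push(ev("response.output_text.delta", d));
    }
    // Closing events: the full text, then the item, then the response itself.
    let full: String = deltas.concat();
    out.push(ev("response.output_text.done", &full));
    out.push(ev("response.output_item.done", ""));
    out.push(ev("response.completed", ""));
    out
}
```

For example, `text_turn_events(&["PO", "NG"])` yields eight events, with two `response.output_text.delta` events in the middle and `response.output_text.done` carrying the concatenated text.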

tarrencev force-pushed the feat/openai-responses-ingress branch 2 times, most recently from e07b4d2 to f801f21 Compare June 23, 2026 17:43

tarrencev added 2 commits June 23, 2026 14:16

tarrencev force-pushed the feat/openai-responses-ingress branch from f801f21 to 9c5d7b3 Compare June 23, 2026 18:16

goksu mentioned this pull request Jun 23, 2026

fix: clean up session sandboxes #725

Open

tarrencev wants to merge 2 commits into paradigmxyz:main from tarrencev:feat/openai-responses-ingress
