Skip to content

feat(api-rs): OpenAI Responses-compatible /v1/responses ingress#707

Draft
tarrencev wants to merge 2 commits into
paradigmxyz:mainfrom
tarrencev:feat/openai-responses-ingress
Draft

feat(api-rs): OpenAI Responses-compatible /v1/responses ingress#707
tarrencev wants to merge 2 commits into
paradigmxyz:mainfrom
tarrencev:feat/openai-responses-ingress

Conversation

@tarrencev

@tarrencev tarrencev commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Expose Centaur's durable, thread-keyed session control plane behind an OpenAI Responses-compatible POST /v1/responses — the wire API the Codex CLI speaks — so codex (via a custom model_provider base_url) or any Responses client can drive a fully-configured Centaur thread (tools, sandbox, durability).

The Codex analog of the Anthropic /v1/messages ingress. Stacked on the Anthropic ingress PR (it reuses the session-runtime helpers and ExecuteSessionInput fields that PR exposes) — review the /v1/responses commit here; rebase to main once the Anthropic PR merges.

What it does

  • Endpoint: POST /v1/responses (OpenAI Responses shape); streaming SSE + non-streaming.
  • Defaults to HarnessType::Codex — the Codex wrapper speaks the same wire format the client expects and uses the deployment's configured Codex model backend.
  • Thread continuity: keyed on Codex's session id — the session-id header (equal to prompt_cache_key, stable across codex resume) → thread api:codex:<session-id>, so a resumed CLI session continues the same durable thread and warm sandbox. Absent → a fresh api:<uuid> thread.
  • A ResponsesTranslator maps harness output to Responses streaming events (response.createdoutput_item.added/content_part.addedoutput_text.deltaoutput_text.done/output_item.doneresponse.completed), reusing the same harness-output parsing as the Anthropic translator (claude content blocks; codex dotted and slash-method events). The terminal usage carries total_tokens, which the Codex CLI requires.
  • Centaur owns the harness/model/persona/tools, so request model, instructions and tools are accepted and model is echoed, but not threaded into the harness. v1 drives the turn off the last user message in input.

Validation

End-to-end against a live deployment with the codex CLI (custom model_provider → the ingress):

  • basic reply — codex exec "Reply PONG"PONG;
  • tool usecodex exec "run echo …" → the inner agent runs the command and returns its output;
  • streaming + non-streaming verified directly against /v1/responses.

cargo fmt --check · clippy -D warnings · cargo test green (Responses translator unit cases for both harness shapes).

Follow-ups

  • Honor a client-selected model/instructions via a Codex-harness mapping.
  • Reasoning items, client-side function_call round-trips, real usage tokens, x-api-key auth, full-history replay.

@tarrencev tarrencev force-pushed the feat/openai-responses-ingress branch 2 times, most recently from e07b4d2 to f801f21 Compare June 23, 2026 17:43
Expose centaur's durable, thread-keyed session control plane behind an
Anthropic Messages-compatible POST /v1/messages so any Anthropic SDK client
or `claude -p` (via ANTHROPIC_BASE_URL) can drive a fully-configured centaur
thread (tools, sandbox, durability).

- Thread continuity via X-Centaur-Thread-Key (absent -> api:<uuid>).
- Honors `model` and `system` (first-request-wins), threading them into the
  sandbox harness env (CLAUDE_MODEL, CENTAUR_EXTRA_SYSTEM_PROMPT).
- Streaming (SSE) and non-streaming responses via an AnthropicTranslator that
  maps harness output to Anthropic content blocks. Handles both harness event
  shapes: claude (Anthropic-shaped) and codex, including the deployed Rust
  harness-server's slash-method/params events (item/agentMessage/delta,
  item/reasoning/textDelta, item/completed).
- Drives the turn off the last user message so clients (e.g. Claude Code) that
  append a trailing system-role message in messages[] are handled.

Validated end-to-end against the c7e deployment with `claude -p`: basic
replies, tool use (inner agent runs shell commands), and model selection
(requested model lands as CLAUDE_MODEL in the spawned sandbox).
Expose Centaur's durable, thread-keyed session control plane behind an OpenAI
Responses-compatible POST /v1/responses, the wire API the Codex CLI speaks, so
`codex` (via a custom model_provider base_url) or any Responses client can drive
a fully-configured Centaur thread.

Mirrors the Anthropic /v1/messages ingress: same SessionRuntime flow, thread
continuity via X-Centaur-Thread-Key, model + instructions (system) threaded into
the sandbox harness env, tools accepted but decorative (Centaur owns in-sandbox
tools). A ResponsesTranslator maps harness output to Responses streaming events
(response.created -> output_item/content_part -> output_text.delta -> completed),
handling both harness shapes (claude content blocks; codex dotted and
slash-method events). The terminal usage carries total_tokens, which the Codex
CLI requires. Drives the turn off the last user message in `input`.
@tarrencev tarrencev force-pushed the feat/openai-responses-ingress branch from f801f21 to 9c5d7b3 Compare June 23, 2026 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant