Skip to content

v0.8.58: Codex/Responses client reliability — retry/backoff parity, land function_call_output + xhigh effort fixes #3019

@Hmbown

Description

@Hmbown

Why

The OpenAI Codex provider (openai-codex) is the only lane that bypasses CodeWhale's retry stack: a 429 or transient 5xx from the ChatGPT backend kills the turn immediately, while every Chat Completions provider transparently retries with backoff. This blocks #2984 (promote Codex out of preview) and makes Codex-parity benchmark runs flaky. Three correctness fixes for this path already exist as uncommitted work on branch v0.8.58-constitution and must be anchored: user-role ToolResult blocks were silently dropped (breaking all multi-turn tool calling on Codex), tool schemas went to the wire unsanitized, and effort "maximum" was not mapped to xhigh. Related: #2955 (Codex usage telemetry).

Current state

  • crates/tui/src/client/responses.rs:80-113handle_responses_stream builds the request and calls raw builder.json(&body).send() (lines 103-107), then bails on any non-2xx:
    anyhow::bail!("Responses API error (HTTP {status}): {raw}") — no retry, no Retry-After, no retry_status banner, no mark_request_failure circuit-breaker accounting.
  • crates/tui/src/client/chat.rs:268 — the chat stream path goes through self.send_with_retry(|| self.http_client.post(&url).json(&body)).
  • crates/tui/src/client.rs:1019-1083send_with_retry wraps with_retry (crates/tui/src/llm_client/mod.rs:1002), honors Retry-After via extract_retry_after (llm_client/mod.rs:1127), sanitizes error bodies with sanitize_http_error_body, drives the crate::retry_status banner, and calls wait_for_rate_limit / mark_request_success / mark_request_failure.
  • Dispatch: crates/tui/src/client.rs:1141-1142 and 1151-1152 route ApiProvider::OpenaiCodex to handle_responses_message / handle_responses_stream.
  • Endpoint: CODEX_RESPONSES_PATH = "/codex/responses" (responses.rs:24), appended to self.base_url. RequestPayloadMode (crates/tui/src/config.rs:377-381) has only a ChatCompletions variant today.
  • In-flight fixes (uncommitted in the local working tree; if you are on a fresh clone where codex_responses_reasoning_effort does not exist in responses.rs, recreate them per the Plan):
    1. convert_messages_to_responses_input (responses.rs:483): a ContentBlock::ToolResult arm in the "user" role branch (responses.rs:504-523) that flushes pending input_text items and emits {"type": "function_call_output", "call_id": ..., "output": ...} via parse_tool_use_id. Previously user-role tool results hit _ => {} and were dropped, so the model never saw tool outputs.
    2. tool_to_responses_function (responses.rs:597-607): clones tool.input_schema and runs schema_sanitize::sanitize_for_responses (crates/tui/src/tools/schema_sanitize.rs:85) before sending parameters.
    3. codex_responses_reasoning_effort (responses.rs:609-618): extracted helper mapping "xhigh" | "max" | "maximum"xhigh, "off"/"none"/... → omit reasoning, unknown → medium.
    4. Unit tests for all three in responses.rs #[cfg(test)] mod tests (responses.rs:656-754).

Plan

  1. Land the in-flight fixes (crates/tui/src/client/responses.rs): if the working tree already contains codex_responses_reasoning_effort, the ToolResult arm in the user-role branch, the sanitize_for_responses call, and the three tests listed above, commit them as-is. Otherwise implement them exactly as described in Current state items 1-4. Requires the schema-sanitize-strict-mode issue's sanitize_for_responses to exist.
  2. Add retry/backoff to handle_responses_stream (responses.rs:80-113): replace the one-shot builder...send() with self.send_with_retry(|| { ... }), building the full request (POST url, Content-Type, Accept: text/event-stream, OpenAI-Beta: responses=experimental, originator: codex_cli_rs, optional chatgpt-account-id from crate::oauth::codex_account_id(), .json(&body)) inside the closure so each retry attempt gets a fresh RequestBuilder. send_with_retry is pub(super) in crates/tui/src/client.rs:1019, so it is callable from responses.rs (same module tree) with no visibility change.
  3. Error mapping parity (responses.rs): keep a defensive non-success check after send_with_retry returns (mirroring chat.rs:271-286), but run the body through sanitize_http_error_body(Some(self.api_provider.display_name()), status.as_u16(), &raw) before bailing so terminal 4xx errors render like the chat path. Import what is needed from super.
  4. Tests — must be in-src unit tests (the #[cfg(test)] module of responses.rs or client.rs): crates/tui is a bin-only crate (crates/tui/Cargo.toml has only a [[bin]] target; mod client is private in crates/tui/src/main.rs:23), so a crates/tui/tests/ integration test CANNOT reach DeepSeekClient or handle_responses_stream. Use wiremock under #[tokio::test] (dev-dep, crates/tui/Cargo.toml:78; in-src wiremock pattern: crates/tui/src/tools/web_search.rs:2365-2375) to mount POST /codex/responses returning 429 with a Retry-After: 0 header once, then a 200 SSE body; assert handle_responses_stream succeeds after the retry, and that a 400 surfaces without retry. Client setup: DeepSeekClient::new (client.rs:582) with a Config whose provider is openai-codex and whose [providers.openai_codex] base_url points at the mock (deepseek_base_url honors the provider override, config.rs:2441-2460; http://127.0.0.1 passes validate_base_url_security, client.rs:358). Note: deepseek_api_key() for this provider calls oauth::get_credentials() (config.rs:2610-2612), which honors a CODEX_ACCESS_TOKEN env override (crates/tui/src/oauth.rs:262) — set a dummy token in the test using a mutex + save/restore env guard like the base-url tests do (client.rs:3583-3614, ALLOW_INSECURE_HTTP_ENV_LOCK pattern).
  5. Stretch (optional, do NOT block landing): add RequestPayloadMode::Responses to crates/tui/src/config.rs:377-381 and make the Responses path usable against standard api.openai.com /v1/responses instead of the hardwired CODEX_RESPONSES_PATH. If attempted, run python3 scripts/check-provider-registry.py. Skip entirely if it grows beyond ~100 lines.
  6. Reference this issue and OpenAI Codex provider: verify live round-trip and promote out of preview #2984 in the landing commit message.

Acceptance criteria

  • On openai-codex, a multi-turn tool-calling conversation serializes user-role ToolResult blocks as function_call_output input items (covered by responses_input_includes_user_role_tool_results).
  • Tool parameters sent to /codex/responses are sanitized (oneOf/anyOf/allOf/enum/not stripped at the root) without mutating the original Tool (covered by responses_function_tool_sanitizes_root_composition_schema).
  • reasoning_effort: "maximum" maps to xhigh; "off"/"none" omit the reasoning block (covered by codex_reasoning_effort_uses_responses_labels).
  • A 429 (with Retry-After) or 5xx from /codex/responses is retried with backoff and succeeds when the backend recovers, instead of failing the turn on the first response.
  • Retries surface in the retry_status banner and in logging::warn ("HTTP retry reason=...") exactly like chat-path retries; client failure/success accounting (mark_request_failure/mark_request_success) fires for the Responses path.
  • Non-retryable errors (400/401/403) fail fast with a sanitized error body that names the provider, matching chat-path formatting.
  • All new/changed tests pass; no behavior change for non-Codex providers.

Verification

cargo test -p codewhale-tui client::responses
cargo test -p codewhale-tui schema_sanitize
cargo test -p codewhale-tui --test tool_lifecycle_acceptance   # guard: existing mock-LLM flows still pass
cargo fmt --all -- --check
cargo clippy -p codewhale-tui --all-targets -- -D warnings
python3 scripts/check-provider-registry.py   # only if step 5 (stretch) touched config.rs provider data

Manual (requires Codex OAuth creds in ~/.codex/auth.json via codex login; skip if headless without creds):

  1. cargo run -p codewhale-tui → switch provider to openai-codex.
  2. Ask for a task that forces two sequential tool calls (e.g. "read Cargo.toml, then list crates/"). Both tool results must round-trip — before fix 1, turn 2 stalls or the backend 400s on a function_call without matching output.
  3. Retry path without creds: run the wiremock test from Plan step 4 (it sets a dummy CODEX_ACCESS_TOKEN itself; no real login needed) and confirm the log line HTTP retry reason=rate_limited.

Out of scope

Hints

  • send_with_retry takes FnMut() -> reqwest::RequestBuilder and re-invokes the closure per attempt — build ALL headers inside the closure; do not move a consumed builder in.
  • The bearer Authorization header is a default header on self.http_client (see comment at responses.rs:87-91) — do not add it again.
  • After send_with_retry returns Ok, the status is always 2xx (client.rs:1035-1036); the post-call status check is belt-and-braces parity with chat.rs:271, not load-bearing.
  • Retry classification lives in crates/tui/src/llm_client/mod.rs (with_retry:1002, extract_retry_after:1127, RetryConfig tests at :1155); nothing there should need changes.
  • Existing tests in responses.rs show the MessageRequest/Tool construction shape — copy those rather than building fixtures from scratch.
  • Env mutation in tests is process-global: take a static Mutex lock and restore prior values on drop, exactly as client.rs:3583-3614 does for ALLOW_INSECURE_HTTP_ENV — otherwise the new test will flake against parallel tests.
  • Composite tool-use ids are "{call_id}|{item_id}" (parse_tool_use_id, responses.rs:622); always emit call_id (left half) in function_call_output.

Metadata

Metadata

Assignees

No one assigned

    Labels

    agent-readyBody is self-sufficient per docs/AGENT_RUNNER.md; a remote agent may claim itbugSomething isn't workingenhancementNew feature or requestv0.8.58Targeting v0.8.58

    Projects

    Status
    In progress

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions