You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The OpenAI Codex provider (openai-codex) is the only lane that bypasses CodeWhale's retry stack: a 429 or transient 5xx from the ChatGPT backend kills the turn immediately, while every Chat Completions provider transparently retries with backoff. This blocks #2984 (promote Codex out of preview) and makes Codex-parity benchmark runs flaky. Three correctness fixes for this path already exist as uncommitted work on branch v0.8.58-constitution and must be anchored: user-role ToolResult blocks were silently dropped (breaking all multi-turn tool calling on Codex), tool schemas went to the wire unsanitized, and effort "maximum" was not mapped to xhigh. Related: #2955 (Codex usage telemetry).
Current state
crates/tui/src/client/responses.rs:80-113 — handle_responses_stream builds the request and calls raw builder.json(&body).send() (lines 103-107), then bails on any non-2xx: anyhow::bail!("Responses API error (HTTP {status}): {raw}") — no retry, no Retry-After, no retry_status banner, no mark_request_failure circuit-breaker accounting.
crates/tui/src/client/chat.rs:268 — the chat stream path goes through self.send_with_retry(|| self.http_client.post(&url).json(&body)).
crates/tui/src/client.rs:1019-1083 — send_with_retry wraps with_retry (crates/tui/src/llm_client/mod.rs:1002), honors Retry-After via extract_retry_after (llm_client/mod.rs:1127), sanitizes error bodies with sanitize_http_error_body, drives the crate::retry_status banner, and calls wait_for_rate_limit / mark_request_success / mark_request_failure.
Dispatch: crates/tui/src/client.rs:1141-1142 and 1151-1152 route ApiProvider::OpenaiCodex to handle_responses_message / handle_responses_stream.
Endpoint: CODEX_RESPONSES_PATH = "/codex/responses" (responses.rs:24), appended to self.base_url. RequestPayloadMode (crates/tui/src/config.rs:377-381) has only a ChatCompletions variant today.
In-flight fixes (uncommitted in the local working tree; if you are on a fresh clone where codex_responses_reasoning_effort does not exist in responses.rs, recreate them per the Plan):
convert_messages_to_responses_input (responses.rs:483): a ContentBlock::ToolResult arm in the "user" role branch (responses.rs:504-523) that flushes pending input_text items and emits {"type": "function_call_output", "call_id": ..., "output": ...} via parse_tool_use_id. Previously user-role tool results hit _ => {} and were dropped, so the model never saw tool outputs.
tool_to_responses_function (responses.rs:597-607): clones tool.input_schema and runs schema_sanitize::sanitize_for_responses (crates/tui/src/tools/schema_sanitize.rs:85) before sending parameters.
Unit tests for all three in responses.rs#[cfg(test)] mod tests (responses.rs:656-754).
Plan
Land the in-flight fixes (crates/tui/src/client/responses.rs): if the working tree already contains codex_responses_reasoning_effort, the ToolResult arm in the user-role branch, the sanitize_for_responses call, and the three tests listed above, commit them as-is. Otherwise implement them exactly as described in Current state items 1-4. Requires the schema-sanitize-strict-mode issue's sanitize_for_responses to exist.
Add retry/backoff to handle_responses_stream (responses.rs:80-113): replace the one-shot builder...send() with self.send_with_retry(|| { ... }), building the full request (POST url, Content-Type, Accept: text/event-stream, OpenAI-Beta: responses=experimental, originator: codex_cli_rs, optional chatgpt-account-id from crate::oauth::codex_account_id(), .json(&body)) inside the closure so each retry attempt gets a fresh RequestBuilder. send_with_retry is pub(super) in crates/tui/src/client.rs:1019, so it is callable from responses.rs (same module tree) with no visibility change.
Error mapping parity (responses.rs): keep a defensive non-success check after send_with_retry returns (mirroring chat.rs:271-286), but run the body through sanitize_http_error_body(Some(self.api_provider.display_name()), status.as_u16(), &raw) before bailing so terminal 4xx errors render like the chat path. Import what is needed from super.
Tests — must be in-src unit tests (the #[cfg(test)] module of responses.rs or client.rs): crates/tui is a bin-only crate (crates/tui/Cargo.toml has only a [[bin]] target; mod client is private in crates/tui/src/main.rs:23), so a crates/tui/tests/ integration test CANNOT reach DeepSeekClient or handle_responses_stream. Use wiremock under #[tokio::test] (dev-dep, crates/tui/Cargo.toml:78; in-src wiremock pattern: crates/tui/src/tools/web_search.rs:2365-2375) to mount POST /codex/responses returning 429 with a Retry-After: 0 header once, then a 200 SSE body; assert handle_responses_stream succeeds after the retry, and that a 400 surfaces without retry. Client setup: DeepSeekClient::new (client.rs:582) with a Config whose provider is openai-codex and whose [providers.openai_codex] base_url points at the mock (deepseek_base_url honors the provider override, config.rs:2441-2460; http://127.0.0.1 passes validate_base_url_security, client.rs:358). Note: deepseek_api_key() for this provider calls oauth::get_credentials() (config.rs:2610-2612), which honors a CODEX_ACCESS_TOKEN env override (crates/tui/src/oauth.rs:262) — set a dummy token in the test using a mutex + save/restore env guard like the base-url tests do (client.rs:3583-3614, ALLOW_INSECURE_HTTP_ENV_LOCK pattern).
Stretch (optional, do NOT block landing): add RequestPayloadMode::Responses to crates/tui/src/config.rs:377-381 and make the Responses path usable against standard api.openai.com/v1/responses instead of the hardwired CODEX_RESPONSES_PATH. If attempted, run python3 scripts/check-provider-registry.py. Skip entirely if it grows beyond ~100 lines.
On openai-codex, a multi-turn tool-calling conversation serializes user-role ToolResult blocks as function_call_output input items (covered by responses_input_includes_user_role_tool_results).
Tool parameters sent to /codex/responses are sanitized (oneOf/anyOf/allOf/enum/not stripped at the root) without mutating the original Tool (covered by responses_function_tool_sanitizes_root_composition_schema).
reasoning_effort: "maximum" maps to xhigh; "off"/"none" omit the reasoning block (covered by codex_reasoning_effort_uses_responses_labels).
A 429 (with Retry-After) or 5xx from /codex/responses is retried with backoff and succeeds when the backend recovers, instead of failing the turn on the first response.
Retries surface in the retry_status banner and in logging::warn ("HTTP retry reason=...") exactly like chat-path retries; client failure/success accounting (mark_request_failure/mark_request_success) fires for the Responses path.
Non-retryable errors (400/401/403) fail fast with a sanitized error body that names the provider, matching chat-path formatting.
All new/changed tests pass; no behavior change for non-Codex providers.
Verification
cargo test -p codewhale-tui client::responses
cargo test -p codewhale-tui schema_sanitize
cargo test -p codewhale-tui --test tool_lifecycle_acceptance # guard: existing mock-LLM flows still pass
cargo fmt --all -- --check
cargo clippy -p codewhale-tui --all-targets -- -D warnings
python3 scripts/check-provider-registry.py # only if step 5 (stretch) touched config.rs provider data
Manual (requires Codex OAuth creds in ~/.codex/auth.json via codex login; skip if headless without creds):
cargo run -p codewhale-tui → switch provider to openai-codex.
Ask for a task that forces two sequential tool calls (e.g. "read Cargo.toml, then list crates/"). Both tool results must round-trip — before fix 1, turn 2 stalls or the backend 400s on a function_call without matching output.
Retry path without creds: run the wiremock test from Plan step 4 (it sets a dummy CODEX_ACCESS_TOKEN itself; no real login needed) and confirm the log line HTTP retry reason=rate_limited.
Streaming mid-stream reconnect/resume (retry covers request establishment only, same as the chat path).
The stretch RequestPayloadMode::Responses is explicitly optional; do not let it block the retry work.
Hints
send_with_retry takes FnMut() -> reqwest::RequestBuilder and re-invokes the closure per attempt — build ALL headers inside the closure; do not move a consumed builder in.
The bearer Authorization header is a default header on self.http_client (see comment at responses.rs:87-91) — do not add it again.
After send_with_retry returns Ok, the status is always 2xx (client.rs:1035-1036); the post-call status check is belt-and-braces parity with chat.rs:271, not load-bearing.
Retry classification lives in crates/tui/src/llm_client/mod.rs (with_retry:1002, extract_retry_after:1127, RetryConfig tests at :1155); nothing there should need changes.
Existing tests in responses.rs show the MessageRequest/Tool construction shape — copy those rather than building fixtures from scratch.
Env mutation in tests is process-global: take a static Mutex lock and restore prior values on drop, exactly as client.rs:3583-3614 does for ALLOW_INSECURE_HTTP_ENV — otherwise the new test will flake against parallel tests.
Composite tool-use ids are "{call_id}|{item_id}" (parse_tool_use_id, responses.rs:622); always emit call_id (left half) in function_call_output.
Why
The OpenAI Codex provider (
openai-codex) is the only lane that bypasses CodeWhale's retry stack: a 429 or transient 5xx from the ChatGPT backend kills the turn immediately, while every Chat Completions provider transparently retries with backoff. This blocks #2984 (promote Codex out of preview) and makes Codex-parity benchmark runs flaky. Three correctness fixes for this path already exist as uncommitted work on branchv0.8.58-constitutionand must be anchored: user-roleToolResultblocks were silently dropped (breaking all multi-turn tool calling on Codex), tool schemas went to the wire unsanitized, and effort"maximum"was not mapped toxhigh. Related: #2955 (Codex usage telemetry).Current state
crates/tui/src/client/responses.rs:80-113—handle_responses_streambuilds the request and calls rawbuilder.json(&body).send()(lines 103-107), then bails on any non-2xx:anyhow::bail!("Responses API error (HTTP {status}): {raw}")— no retry, noRetry-After, noretry_statusbanner, nomark_request_failurecircuit-breaker accounting.crates/tui/src/client/chat.rs:268— the chat stream path goes throughself.send_with_retry(|| self.http_client.post(&url).json(&body)).crates/tui/src/client.rs:1019-1083—send_with_retrywrapswith_retry(crates/tui/src/llm_client/mod.rs:1002), honorsRetry-Afterviaextract_retry_after(llm_client/mod.rs:1127), sanitizes error bodies withsanitize_http_error_body, drives thecrate::retry_statusbanner, and callswait_for_rate_limit/mark_request_success/mark_request_failure.crates/tui/src/client.rs:1141-1142and1151-1152routeApiProvider::OpenaiCodextohandle_responses_message/handle_responses_stream.CODEX_RESPONSES_PATH = "/codex/responses"(responses.rs:24), appended toself.base_url.RequestPayloadMode(crates/tui/src/config.rs:377-381) has only aChatCompletionsvariant today.codex_responses_reasoning_effortdoes not exist inresponses.rs, recreate them per the Plan):convert_messages_to_responses_input(responses.rs:483): aContentBlock::ToolResultarm in the"user"role branch (responses.rs:504-523) that flushes pendinginput_textitems and emits{"type": "function_call_output", "call_id": ..., "output": ...}viaparse_tool_use_id. Previously user-role tool results hit_ => {}and were dropped, so the model never saw tool outputs.tool_to_responses_function(responses.rs:597-607): clonestool.input_schemaand runsschema_sanitize::sanitize_for_responses(crates/tui/src/tools/schema_sanitize.rs:85) before sendingparameters.codex_responses_reasoning_effort(responses.rs:609-618): extracted helper mapping"xhigh" | "max" | "maximum"→xhigh,"off"/"none"/...→ omit reasoning, unknown →medium.responses.rs#[cfg(test)] mod tests(responses.rs:656-754).Plan
crates/tui/src/client/responses.rs): if the working tree already containscodex_responses_reasoning_effort, theToolResultarm in the user-role branch, thesanitize_for_responsescall, and the three tests listed above, commit them as-is. Otherwise implement them exactly as described in Current state items 1-4. Requires theschema-sanitize-strict-modeissue'ssanitize_for_responsesto exist.handle_responses_stream(responses.rs:80-113): replace the one-shotbuilder...send()withself.send_with_retry(|| { ... }), building the full request (POSTurl,Content-Type,Accept: text/event-stream,OpenAI-Beta: responses=experimental,originator: codex_cli_rs, optionalchatgpt-account-idfromcrate::oauth::codex_account_id(),.json(&body)) inside the closure so each retry attempt gets a freshRequestBuilder.send_with_retryispub(super)incrates/tui/src/client.rs:1019, so it is callable fromresponses.rs(same module tree) with no visibility change.responses.rs): keep a defensive non-success check aftersend_with_retryreturns (mirroringchat.rs:271-286), but run the body throughsanitize_http_error_body(Some(self.api_provider.display_name()), status.as_u16(), &raw)before bailing so terminal 4xx errors render like the chat path. Import what is needed fromsuper.#[cfg(test)]module ofresponses.rsorclient.rs):crates/tuiis a bin-only crate (crates/tui/Cargo.tomlhas only a[[bin]]target;mod clientis private incrates/tui/src/main.rs:23), so acrates/tui/tests/integration test CANNOT reachDeepSeekClientorhandle_responses_stream. Usewiremockunder#[tokio::test](dev-dep,crates/tui/Cargo.toml:78; in-src wiremock pattern:crates/tui/src/tools/web_search.rs:2365-2375) to mountPOST /codex/responsesreturning 429 with aRetry-After: 0header once, then a 200 SSE body; asserthandle_responses_streamsucceeds after the retry, and that a 400 surfaces without retry. Client setup:DeepSeekClient::new(client.rs:582) with aConfigwhose provider isopenai-codexand whose[providers.openai_codex] base_urlpoints at the mock (deepseek_base_urlhonors the provider override,config.rs:2441-2460;http://127.0.0.1passesvalidate_base_url_security,client.rs:358). Note:deepseek_api_key()for this provider callsoauth::get_credentials()(config.rs:2610-2612), which honors aCODEX_ACCESS_TOKENenv override (crates/tui/src/oauth.rs:262) — set a dummy token in the test using a mutex + save/restore env guard like the base-url tests do (client.rs:3583-3614,ALLOW_INSECURE_HTTP_ENV_LOCKpattern).RequestPayloadMode::Responsestocrates/tui/src/config.rs:377-381and make the Responses path usable against standardapi.openai.com/v1/responsesinstead of the hardwiredCODEX_RESPONSES_PATH. If attempted, runpython3 scripts/check-provider-registry.py. Skip entirely if it grows beyond ~100 lines.Acceptance criteria
openai-codex, a multi-turn tool-calling conversation serializes user-roleToolResultblocks asfunction_call_outputinput items (covered byresponses_input_includes_user_role_tool_results)./codex/responsesare sanitized (oneOf/anyOf/allOf/enum/notstripped at the root) without mutating the originalTool(covered byresponses_function_tool_sanitizes_root_composition_schema).reasoning_effort: "maximum"maps toxhigh;"off"/"none"omit thereasoningblock (covered bycodex_reasoning_effort_uses_responses_labels).Retry-After) or 5xx from/codex/responsesis retried with backoff and succeeds when the backend recovers, instead of failing the turn on the first response.retry_statusbanner and inlogging::warn("HTTP retry reason=...") exactly like chat-path retries; client failure/success accounting (mark_request_failure/mark_request_success) fires for the Responses path.Verification
Manual (requires Codex OAuth creds in
~/.codex/auth.jsonviacodex login; skip if headless without creds):cargo run -p codewhale-tui→ switch provider toopenai-codex.function_callwithout matching output.CODEX_ACCESS_TOKENitself; no real login needed) and confirm the log lineHTTP retry reason=rate_limited.Out of scope
parse_responses_usagestays as-is).RequestPayloadMode::Responsesis explicitly optional; do not let it block the retry work.Hints
send_with_retrytakesFnMut() -> reqwest::RequestBuilderand re-invokes the closure per attempt — build ALL headers inside the closure; do not move a consumed builder in.Authorizationheader is a default header onself.http_client(see comment atresponses.rs:87-91) — do not add it again.send_with_retryreturnsOk, the status is always 2xx (client.rs:1035-1036); the post-call status check is belt-and-braces parity withchat.rs:271, not load-bearing.crates/tui/src/llm_client/mod.rs(with_retry:1002,extract_retry_after:1127,RetryConfigtests at:1155); nothing there should need changes.responses.rsshow theMessageRequest/Toolconstruction shape — copy those rather than building fixtures from scratch.static Mutexlock and restore prior values on drop, exactly asclient.rs:3583-3614does forALLOW_INSECURE_HTTP_ENV— otherwise the new test will flake against parallel tests."{call_id}|{item_id}"(parse_tool_use_id,responses.rs:622); always emitcall_id(left half) infunction_call_output.