docs(rust): sync async-openai how-to with the loud-failure example#661
Merged
Merged
Conversation
When the cycles-client-rust example (PR #37) went through codex code review, codex caught silent-under-billing bugs in patterns like `response.usage.map(...).unwrap_or(0)` — we fixed those in the example but never backported the fixes to the cycles-docs how-to. PR #659 merged the doc with the buggy patterns intact. As a result, the shipped doc (on main) and the shipped example (in cycles-client-rust) were teaching opposite things: - Doc: `.unwrap_or(0)` on missing usage → silent commit zero - Example: `.ok_or(...)?` on missing usage → error, auto-release Plus an internal contradiction inside the doc itself: the "Common gotchas" section said "Treat None as estimate locally rather than no spend" while the code blocks above unwrap to zero. Description vs code disagreed. This PR brings the doc back in sync with the example and fixes the internal contradictions. Changes in `how-to/integrating-cycles-with-async-openai.md`: - `.max_tokens(...)` → `.max_completion_tokens(...)` in all four code blocks (basic, ALLOW_WITH_CAPS, streaming, error-aware). OpenAI deprecated `max_tokens` for chat completions in favor of `max_completion_tokens`; async-openai 0.38 supports both but the doc should teach the current name. - Missing `response.usage`: `.unwrap_or(0)` → `.ok_or(...)?` in non-streaming examples (basic, ALLOW_WITH_CAPS, error-aware, token-to-USD). Streaming kept the fallback pattern because re-issuing a consumed stream is expensive — that's the one legitimate place a tokenizer estimate beats erroring out. - Missing `response.choices[0].message.content`: `.unwrap_or_default()` → `.ok_or(...)?` everywhere. Committing on an empty reply is the silent under-billing pattern that codex caught in the example. - `(cap as u32)` → `u32::try_from(cap)?` with explicit zero-check. The old pattern would silently wrap on a negative `caps.max_tokens` and would send `max_completion_tokens=0` (rejected by OpenAI, but only after we've already paid the request cost) on a zero cap. - `usage.total_tokens as i64` → `i64::from(usage.total_tokens)` — no `as` cast risk, more idiomatic. - `microcents as i64` → `i64::try_from(microcents)?` — same. - Streaming example's `estimate_tokens_with_tiktoken(...)` was a fictional function reference (never defined or imported). Replaced with a real inline tokenizer stub showing how to plug in `tiktoken-rs`, with a clear comment that the alternative is releasing the guard and erroring. - Added a "Loud-failure stance" callout under "What you get" that spells out the design choice explicitly and points readers at the shipped `examples/async_openai_completion.rs` for the reference implementation. - Rewrote gotchas #2 and #3 to match the new code rather than contradict it: gotcha #2 now distinguishes non-streaming (loud failure) from streaming (tokenizer fallback); gotcha #3 says "fail loud and release" instead of "commit zero". Added gotcha #6 covering `as` casts on values from external sources. The async-openai 0.38 API and the runcycles 0.2.4 API are unchanged from the prior version-sync PR (#660); this PR only changes how the example code handles edge cases.
Apply/skip tally: 6 applied, 0 pushed back.
Applied:
- `ChatCompletionStreamOptions` fields are `Option<bool>` in async-openai
0.38.x, not raw `bool`. Updated to
`{ include_usage: Some(true), include_obfuscation: None }` with a
comment explaining the `None` default.
- Cargo.toml block was missing `thiserror = "2"` even though the
error-aware example uses `#[derive(thiserror::Error)]`. Added.
- Streaming zero-cap path used `guard.release(...).await?` — if release
itself errored, the caller would see the release error instead of
the original zero-cap error. Switched to
`let _ = guard.release(...).await;` so the typed zero-cap error wins.
- Streaming end-of-stream had an "empty content, but has usage" hole —
the example would commit on a stream that produced no text. Added an
empty-`full_text` release-and-bail check before the usage check, so
the loud-failure stance applies consistently.
- "Other Rust LLM clients" section still showed the old
`response.usage.map(|u| u.total_tokens as i64)` pattern in the
adaptation guidance for non-OpenAI providers. Updated to the
`ok_or(...)?` + `i64::from(...)` shape that matches the rest of the
doc.
- The `estimate_tokens()` function was referenced in the streaming
example but only existed as a comment. Made it a real, copy-pasteable
helper (uses `tiktoken-rs::o200k_base`) and moved the fallback into
an explicit "Optional: tokenizer fallback for missing-usage chunks"
subsection, framed as an alternative to the loud-failure default.
Also caught during the same pass:
- The tiktoken-rs stub itself used `as i64` casts on `usize` values —
exactly the pattern the new gotcha #6 warns against. Switched to
`i64::try_from(...)?` and changed the function signature to return
`Result<i64, Box<dyn Error>>` so the conversion error has somewhere
to go.
Codex verified the runcycles + async-openai 0.38.2 API surface, the
tiktoken-rs API names, and confirmed the shipped
cycles-client-rust/examples/async_openai_completion.rs on main matches
the doc's loud-failure stance.
4 tasks
amavashev
added a commit
that referenced
this pull request
May 17, 2026
Two findings from the fresh codex pass against main after PR #661 merged plus one follow-up after round 2: - `tiktoken-rs` version pin in the streaming tokenizer-fallback snippet was `"0.6"` while crates.io is at 0.11.0. The cited API (`o200k_base`, `encode_with_special_tokens`) is unchanged across that range, but the pin should match current. Bumped to `"0.11"`. - The error-aware `run_completion` example skipped cap handling entirely — it called `.max_completion_tokens(800u32)` without consulting `guard.caps()`. The intro's loud-failure stance claimed all examples on the page error out on non-positive `caps.max_tokens`, which was false for that example. Added the same `u32::try_from(cap)` + zero-check + release-and-error block used in the ALLOW_WITH_CAPS and streaming examples, with the error variants typed as `CompletionError::Cycles(CyclesError::Validation(...))` to flow through the function's typed error contract. - Codex round 2 then caught that the BASIC example also doesn't read caps — it takes `_ctx` to keep the minimum-viable composition compact. Rather than add cap handling and dilute that example's purpose, narrowed the intro claim: all four examples fail loud on missing `usage` and missing `content`, and the three cap-reading examples additionally fail on non-positive caps. The basic example explicitly ignores caps; production code should follow the capped pattern. Codex round 3 returned SHIP. Sources verified: runcycles 0.2.4 source, the shipped async_openai_completion.rs example on main, async-openai 0.38.2 docs.rs, tiktoken-rs 0.11.0 docs.rs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
When the cycles-client-rust example (PR #37, merged) went through codex code review, codex caught silent-under-billing bugs in patterns like `response.usage.map(...).unwrap_or(0)`. Those fixes landed in the example. They didn't land in the cycles-docs how-to — PR #659 merged the doc with the original buggy patterns intact.
The result: the shipped doc on `main` and the shipped example in cycles-client-rust were teaching opposite things on the same edge cases.
Plus an internal contradiction in the doc itself: gotcha #2 told readers "Treat `None` as estimate locally rather than no spend" while the code blocks unwrapped to zero — which is no spend. Description and code disagreed.
What's in this PR
Brings the doc back in sync with the example and fixes the internal contradictions.
Code changes across all four code blocks (basic, ALLOW_WITH_CAPS, streaming, error-aware)
Tokenizer fallback (streaming only)
The fictional `estimate_tokens_with_tiktoken(...)` call became a real, copy-pasteable helper in a new "Optional: tokenizer fallback for missing-usage chunks" subsection. Uses `tiktoken_rs::o200k_base()` with `i64::try_from(...)` conversions (matches gotcha #6).
Prose changes
Review process
Verified against runcycles 0.2.4 source, async-openai 0.38.2 docs.rs, tiktoken-rs latest docs.rs, and the shipped `cycles-client-rust/examples/async_openai_completion.rs` on main.
Test plan