Skip to content

docs(rust): sync async-openai how-to with the loud-failure example#661

Merged
amavashev merged 2 commits into
mainfrom
docs/rust-async-openai-loud-failure-consistency
May 16, 2026
Merged

docs(rust): sync async-openai how-to with the loud-failure example#661
amavashev merged 2 commits into
mainfrom
docs/rust-async-openai-loud-failure-consistency

Conversation

@amavashev
Copy link
Copy Markdown
Contributor

Why

When the cycles-client-rust example (PR #37, merged) went through codex code review, codex caught silent-under-billing bugs in patterns like `response.usage.map(...).unwrap_or(0)`. Those fixes landed in the example. They didn't land in the cycles-docs how-to — PR #659 merged the doc with the original buggy patterns intact.

The result: the shipped doc on `main` and the shipped example in cycles-client-rust were teaching opposite things on the same edge cases.

Edge case Doc (on main, pre-PR) Example (on main, post-PR-#37)
Missing `response.usage` `.unwrap_or(0)` → commit zero `.ok_or(...)?` → release + error
Missing `message.content` `.unwrap_or_default()` → empty string success `.ok_or(...)?` → release + error
Non-positive `caps.max_tokens` `(cap as u32)` wraps; `max_completion_tokens=0` sent `u32::try_from(...)?` + zero-check + error
OpenAI field `.max_tokens(...)` (deprecated upstream) `.max_completion_tokens(...)`
Token conversion `as i64` `i64::from(...)` / `i64::try_from(...)`

Plus an internal contradiction in the doc itself: gotcha #2 told readers "Treat `None` as estimate locally rather than no spend" while the code blocks unwrapped to zero — which is no spend. Description and code disagreed.

What's in this PR

Brings the doc back in sync with the example and fixes the internal contradictions.

Code changes across all four code blocks (basic, ALLOW_WITH_CAPS, streaming, error-aware)

  • `.max_tokens(...)` → `.max_completion_tokens(...)`
  • Missing `response.usage`: `.unwrap_or(0)` → `.ok_or(...)?` in non-streaming examples. Streaming keeps the option of a tokenizer fallback (legitimate exception — re-issuing a consumed stream is expensive), but the default code path now releases + errors; the tokenizer fallback is shown in a dedicated subsection.
  • Missing `message.content`: `.unwrap_or_default()` → `.ok_or(...)?` everywhere
  • `(cap as u32)` → `u32::try_from(cap)?` with explicit zero-check
  • `usage.total_tokens as i64` → `i64::from(usage.total_tokens)`
  • `microcents as i64` → `i64::try_from(microcents)?`
  • `ChatCompletionStreamOptions`: fixed to use `Option` field shape that async-openai 0.38.x actually defines (`include_usage: Some(true), include_obfuscation: None`)
  • Streaming end-of-stream: added an empty-`full_text` release-and-bail check before the usage check, so the loud-failure stance applies to the "no content but has usage" edge case too
  • Streaming zero-cap: `guard.release(...).await?` → `let _ = guard.release(...).await;` so the typed zero-cap error wins over a release-side error

Tokenizer fallback (streaming only)

The fictional `estimate_tokens_with_tiktoken(...)` call became a real, copy-pasteable helper in a new "Optional: tokenizer fallback for missing-usage chunks" subsection. Uses `tiktoken_rs::o200k_base()` with `i64::try_from(...)` conversions (matches gotcha #6).

Prose changes

Review process

  • Codex round 1 → REVISE-MINOR (6 findings, all applied):
    • `ChatCompletionStreamOptions` field shape against actual async-openai 0.38.2 surface
    • Missing `thiserror` in Cargo.toml block
    • Streaming `release(...).await?` vs `let _ = ... .await` swallow order
    • End-of-stream empty-content hole
    • Stale `response.usage.map(...).unwrap_or(0)` prose still in "Other Rust LLM clients"
    • Fictional `estimate_tokens(...)` call — made real
    • Also caught: tiktoken stub itself used `as i64` casts (the same pattern gotcha Add comprehensive comparison guide for Cycles vs alternatives #6 warns against); switched to `i64::try_from(...)`
  • Codex round 2 → SHIP

Verified against runcycles 0.2.4 source, async-openai 0.38.2 docs.rs, tiktoken-rs latest docs.rs, and the shipped `cycles-client-rust/examples/async_openai_completion.rs` on main.

Test plan

amavashev added 2 commits May 16, 2026 12:41
When the cycles-client-rust example (PR #37) went through codex code
review, codex caught silent-under-billing bugs in patterns like
`response.usage.map(...).unwrap_or(0)` — we fixed those in the example
but never backported the fixes to the cycles-docs how-to. PR #659
merged the doc with the buggy patterns intact. As a result, the
shipped doc (on main) and the shipped example (in cycles-client-rust)
were teaching opposite things:

  - Doc:     `.unwrap_or(0)` on missing usage → silent commit zero
  - Example: `.ok_or(...)?` on missing usage → error, auto-release

Plus an internal contradiction inside the doc itself: the "Common
gotchas" section said "Treat None as estimate locally rather than no
spend" while the code blocks above unwrap to zero. Description vs
code disagreed.

This PR brings the doc back in sync with the example and fixes the
internal contradictions.

Changes in `how-to/integrating-cycles-with-async-openai.md`:

- `.max_tokens(...)` → `.max_completion_tokens(...)` in all four code
  blocks (basic, ALLOW_WITH_CAPS, streaming, error-aware). OpenAI
  deprecated `max_tokens` for chat completions in favor of
  `max_completion_tokens`; async-openai 0.38 supports both but the
  doc should teach the current name.

- Missing `response.usage`: `.unwrap_or(0)` → `.ok_or(...)?` in
  non-streaming examples (basic, ALLOW_WITH_CAPS, error-aware,
  token-to-USD). Streaming kept the fallback pattern because re-issuing
  a consumed stream is expensive — that's the one legitimate place a
  tokenizer estimate beats erroring out.

- Missing `response.choices[0].message.content`: `.unwrap_or_default()`
  → `.ok_or(...)?` everywhere. Committing on an empty reply is the
  silent under-billing pattern that codex caught in the example.

- `(cap as u32)` → `u32::try_from(cap)?` with explicit zero-check.
  The old pattern would silently wrap on a negative `caps.max_tokens`
  and would send `max_completion_tokens=0` (rejected by OpenAI, but
  only after we've already paid the request cost) on a zero cap.

- `usage.total_tokens as i64` → `i64::from(usage.total_tokens)` — no
  `as` cast risk, more idiomatic.

- `microcents as i64` → `i64::try_from(microcents)?` — same.

- Streaming example's `estimate_tokens_with_tiktoken(...)` was a
  fictional function reference (never defined or imported). Replaced
  with a real inline tokenizer stub showing how to plug in `tiktoken-rs`,
  with a clear comment that the alternative is releasing the guard
  and erroring.

- Added a "Loud-failure stance" callout under "What you get" that
  spells out the design choice explicitly and points readers at the
  shipped `examples/async_openai_completion.rs` for the reference
  implementation.

- Rewrote gotchas #2 and #3 to match the new code rather than
  contradict it: gotcha #2 now distinguishes non-streaming (loud
  failure) from streaming (tokenizer fallback); gotcha #3 says "fail
  loud and release" instead of "commit zero". Added gotcha #6 covering
  `as` casts on values from external sources.

The async-openai 0.38 API and the runcycles 0.2.4 API are unchanged
from the prior version-sync PR (#660); this PR only changes how the
example code handles edge cases.
Apply/skip tally: 6 applied, 0 pushed back.

Applied:

- `ChatCompletionStreamOptions` fields are `Option<bool>` in async-openai
  0.38.x, not raw `bool`. Updated to
  `{ include_usage: Some(true), include_obfuscation: None }` with a
  comment explaining the `None` default.

- Cargo.toml block was missing `thiserror = "2"` even though the
  error-aware example uses `#[derive(thiserror::Error)]`. Added.

- Streaming zero-cap path used `guard.release(...).await?` — if release
  itself errored, the caller would see the release error instead of
  the original zero-cap error. Switched to
  `let _ = guard.release(...).await;` so the typed zero-cap error wins.

- Streaming end-of-stream had an "empty content, but has usage" hole —
  the example would commit on a stream that produced no text. Added an
  empty-`full_text` release-and-bail check before the usage check, so
  the loud-failure stance applies consistently.

- "Other Rust LLM clients" section still showed the old
  `response.usage.map(|u| u.total_tokens as i64)` pattern in the
  adaptation guidance for non-OpenAI providers. Updated to the
  `ok_or(...)?` + `i64::from(...)` shape that matches the rest of the
  doc.

- The `estimate_tokens()` function was referenced in the streaming
  example but only existed as a comment. Made it a real, copy-pasteable
  helper (uses `tiktoken-rs::o200k_base`) and moved the fallback into
  an explicit "Optional: tokenizer fallback for missing-usage chunks"
  subsection, framed as an alternative to the loud-failure default.

Also caught during the same pass:

- The tiktoken-rs stub itself used `as i64` casts on `usize` values —
  exactly the pattern the new gotcha #6 warns against. Switched to
  `i64::try_from(...)?` and changed the function signature to return
  `Result<i64, Box<dyn Error>>` so the conversion error has somewhere
  to go.

Codex verified the runcycles + async-openai 0.38.2 API surface, the
tiktoken-rs API names, and confirmed the shipped
cycles-client-rust/examples/async_openai_completion.rs on main matches
the doc's loud-failure stance.
@amavashev amavashev merged commit 779c022 into main May 16, 2026
5 checks passed
@amavashev amavashev deleted the docs/rust-async-openai-loud-failure-consistency branch May 16, 2026 23:49
@amavashev amavashev restored the docs/rust-async-openai-loud-failure-consistency branch May 16, 2026 23:55
amavashev added a commit that referenced this pull request May 17, 2026
Two findings from the fresh codex pass against main after PR #661 merged
plus one follow-up after round 2:

- `tiktoken-rs` version pin in the streaming tokenizer-fallback snippet
  was `"0.6"` while crates.io is at 0.11.0. The cited API (`o200k_base`,
  `encode_with_special_tokens`) is unchanged across that range, but
  the pin should match current. Bumped to `"0.11"`.

- The error-aware `run_completion` example skipped cap handling
  entirely — it called `.max_completion_tokens(800u32)` without
  consulting `guard.caps()`. The intro's loud-failure stance claimed
  all examples on the page error out on non-positive `caps.max_tokens`,
  which was false for that example. Added the same `u32::try_from(cap)`
  + zero-check + release-and-error block used in the ALLOW_WITH_CAPS
  and streaming examples, with the error variants typed as
  `CompletionError::Cycles(CyclesError::Validation(...))` to flow
  through the function's typed error contract.

- Codex round 2 then caught that the BASIC example also doesn't read
  caps — it takes `_ctx` to keep the minimum-viable composition
  compact. Rather than add cap handling and dilute that example's
  purpose, narrowed the intro claim: all four examples fail loud on
  missing `usage` and missing `content`, and the three cap-reading
  examples additionally fail on non-positive caps. The basic example
  explicitly ignores caps; production code should follow the capped
  pattern.

Codex round 3 returned SHIP. Sources verified: runcycles 0.2.4 source,
the shipped async_openai_completion.rs example on main, async-openai
0.38.2 docs.rs, tiktoken-rs 0.11.0 docs.rs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant