examples: add async_openai_completion + point README at it #37
Merged
The existing /examples set (basic_usage, error_handling, guard_usage,
streaming_usage, with_cycles_usage) all use an inline `call_llm` stub.
That demonstrates the runcycles API surface cleanly, but it leaves
evaluators without a working starting point for a real LLM call: they
have to invent the async-openai wiring themselves, including token
extraction from response.usage and cap-to-max_tokens application.
This was flagged as the most likely cause of the 13:1 clone-to-install
ratio on cycles-client-rust: lots of evaluators clone, look at the
examples, and bounce because nothing shows the real-LLM composition.
This PR fills that gap:
- Adds `examples/async_openai_completion.rs`: a 60-line `with_cycles`
example that wires async-openai 0.30.x against runcycles. Reserves
`Amount::tokens(1_500)`, applies `caps.max_tokens` from
ALLOW_WITH_CAPS, fires the chat completion, extracts
`response.usage.total_tokens`, commits the actual.
- Adds `async-openai = "0.30"` to dev-dependencies (dev-only: does not
affect downstream consumers, but pulls into CI builds).
- Updates the README's Quick Start section with a callout pointing at
the new example for users who want a real LLM call rather than the
`call_llm` placeholder.
Verified locally:
cargo check --example async_openai_completion --all-features ✓
cargo build --example async_openai_completion --all-features ✓
cargo clippy --example async_openai_completion --all-features -- -D warnings ✓
cargo fmt --check ✓
Companion docs PR in cycles-docs (PR #659) covers the same composition
in detail (streaming via ReservationGuard, error-aware patterns,
token-to-microcents conversion, gotchas).
Out of scope (not in this PR):
- `examples/axum_middleware.rs`: would close the parallel gap for the
Rust web-framework audience. Axum middleware needs the tower::Service
trait surface, which is enough complexity that it warrants its own PR
with its own iteration cycle.
- A streaming `async_openai_streaming.rs` example: the streaming
pattern uses ReservationGuard (not with_cycles, since token totals
arrive only after the stream ends). Worth a separate example for the
same reason.
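The reserve, run, then commit-or-release flow described in this commit
can be sketched with a toy in-memory ledger. This is only an
illustration under stated assumptions, not the runcycles API: `Ledger`
and `with_reservation` are hypothetical names, and the real
`with_cycles` talks to a Cycles server rather than mutating local
fields.

```rust
/// Toy in-memory budget. Hypothetical stand-in, not a runcycles type.
#[derive(Debug)]
struct Ledger {
    remaining: i64,
    reserved: i64,
}

impl Ledger {
    fn new(budget: i64) -> Self {
        Ledger { remaining: budget, reserved: 0 }
    }

    /// Reserve `estimate`, run `work`, then commit the actual amount that
    /// `work` reports. If `work` fails, the reservation is released and
    /// nothing is charged.
    fn with_reservation<T, F>(&mut self, estimate: i64, work: F) -> Result<T, String>
    where
        F: FnOnce() -> Result<(T, i64), String>,
    {
        if estimate > self.remaining {
            return Err("budget exceeded".to_string());
        }
        // Hold the estimate while the work is in flight.
        self.remaining -= estimate;
        self.reserved += estimate;

        let outcome = work();

        // The reservation closes on both paths.
        self.reserved -= estimate;
        self.remaining += estimate;

        match outcome {
            Ok((value, actual)) => {
                // Commit: charge what was actually used, not the estimate.
                self.remaining -= actual;
                Ok(value)
            }
            // Release: the error propagates and the budget is untouched.
            Err(e) => Err(e),
        }
    }
}
```

In this sketch the closure returns both its result and the actual token
count, which mirrors the idea of committing `response.usage.total_tokens`
instead of the reserved estimate.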
Codex flagged design/idiom issues that cargo check / clippy / fmt
can't see — exactly the value of layered review on top of compile
checks. All compile-pass, all human-readable now.
Apply/skip tally: 8 applied, 0 pushed back.
Applied:
- **Silent under-billing fixed.** The original `response.usage.map(...).unwrap_or(0)`
would commit `Amount::tokens(0)` when a provider omitted `usage` —
silent under-billing on a successful-looking response, which is
exactly wrong for a teaching example about budget governance. Now
`response.usage.ok_or(...)?` so the closure errors and `with_cycles`
auto-releases the reservation. Production code that needs a fallback
must opt into one explicitly.
- **Empty/missing content now errors.** `response.choices.first().and_then(|c| c.message.content.clone()).unwrap_or_default()`
silently returned an empty string. Now uses `.ok_or(...)?` so a
response with no choices or no content fails loud and the reservation
releases.
- **Zero/negative cap now errors.** `cap.max(0)` would have sent
`max_tokens=0` to OpenAI (which OpenAI rejects, after we've already
spent the request budget). Now `u32::try_from(cap)` errors on
negatives, and a zero cap errors explicitly. Both before any
network call to OpenAI.
- **`.max_tokens()` → `.max_completion_tokens()`.** OpenAI deprecated
`max_tokens` for chat completions in favor of `max_completion_tokens`.
async-openai 0.30.1 supports both; the example uses the current name.
- **`u.total_tokens as i64` → `i64::from(u.total_tokens)`.** More
idiomatic, no `as` cast.
- **Module doc comment accuracy:**
- Removed the inaccurate "CYCLES_BASE_URL env var override" claim —
the code hardcodes the URL.
- Added a "Loud-failure stance" section explicitly explaining the
new error-on-edge-cases design.
- Softened the absolute "streaming uses ReservationGuard instead of
with_cycles" framing — the guard is the right primitive when
chunks are forwarded while the reservation remains open.
- Removed editorial "the example most users actually want" tone.
- **README callout under-stated requirements.** Now explicitly
mentions: reachable Cycles server, tenant API key, and a TOKENS-
denominated budget at the scope being reserved against — not just
`OPENAI_API_KEY`.
- **README claim softened.** Dropped "so the budget reflects actual
spend" framing (which was directionally true only when usage is
present) to "threading the response's usage.total_tokens back into
the commit" — accurate regardless of failure path.
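The loud-failure fixes listed above can be sketched roughly as follows.
The response types here are hypothetical stand-ins (the real ones come
from async-openai), the cap is assumed to arrive as an `i64`, and
`String` is used as a placeholder error type; the example in this PR
may differ in all three respects.

```rust
use std::convert::TryFrom;

// Hypothetical stand-ins for the provider response types.
struct Usage {
    total_tokens: u32,
}
struct Message {
    content: Option<String>,
}
struct Choice {
    message: Message,
}
struct Response {
    choices: Vec<Choice>,
    usage: Option<Usage>,
}

/// Convert a signed cap into a completion-token limit, rejecting
/// negatives and zero before any request would be sent.
fn completion_cap(raw_cap: i64) -> Result<u32, String> {
    let cap = u32::try_from(raw_cap)
        .map_err(|e| format!("cap {raw_cap} does not fit in u32: {e}"))?;
    if cap == 0 {
        return Err("cap is zero; refusing to send a zero token limit".to_string());
    }
    Ok(cap)
}

/// Extract the reply text and the token total, failing if either is
/// missing instead of defaulting to "" or 0.
fn reply_and_tokens(resp: &Response) -> Result<(String, i64), String> {
    let content = resp
        .choices
        .first()
        .and_then(|c| c.message.content.clone())
        .ok_or_else(|| "response had no choices or no content".to_string())?;
    let usage = resp
        .usage
        .as_ref()
        .ok_or_else(|| "response did not include usage".to_string())?;
    // Lossless widening from u32; no `as` cast needed.
    Ok((content, i64::from(usage.total_tokens)))
}
```

Because both helpers return `Result`, a caller using `?` inside the
reservation closure stops at the first missing field, which is what lets
the reservation release instead of committing zero tokens.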
Verified locally:
cargo check --example async_openai_completion --all-features ✓
cargo clippy --example async_openai_completion --all-features -- -D warnings ✓
cargo fmt --check ✓
CI's `cargo audit --deny warnings` step failed on this PR because the
0.30.x line of async-openai pulled in two unmaintained transitive deps
that the org-wide audit job treats as errors:
- backoff 0.4.0 (RUSTSEC-2025-0012, unmaintained)
- instant 0.1.13 (RUSTSEC-2024-0384, unmaintained, via backoff)
async-openai 0.31+ replaced its retry stack with `tower`, dropping the
backoff dependency entirely. Bumping to 0.38 (current latest) clears
both advisories without needing audit-ignore configuration.
API changes between 0.30 and 0.38 are minor for the example's usage:
- `Client` is now gated behind the per-API features (the crate split
its surface in 0.31). Enabled `chat-completion` for the example.
- Chat-completion types moved from `async_openai::types::` to
`async_openai::types::chat::`. Updated the import.
- `default-features = false, features = ["chat-completion", "rustls"]`
keeps the dev-dep set minimal — no unused-feature surface, and
rustls matches the rest of the runcycles crate's TLS story.
The example still uses the same `with_cycles` flow, the same
`response.usage.total_tokens` extraction, and the same loud-failure
patterns from the codex-round-1 fixes. Only the imports and Cargo.toml
changed.
Verified locally:
cargo check --example async_openai_completion --all-features ✓
cargo clippy --example async_openai_completion --all-features -- -D warnings ✓
cargo fmt --check ✓
cargo tree -i backoff → "did not match any packages" ✓
cargo tree -i instant → "did not match any packages" ✓
Note: cycles-docs PR #659 currently pins async-openai 0.30.x in its
how-to doc; that doc needs a parallel bump to 0.38 + types::chat:: in
a separate commit on that PR.
Why
The existing five examples (`basic_usage`, `error_handling`, `guard_usage`, `streaming_usage`, `with_cycles_usage`) all use an inline `call_llm` stub. That keeps the runcycles API surface clean to read, but it leaves evaluators without a working starting point for a real LLM call. They have to invent the async-openai wiring themselves — token extraction from `response.usage`, `caps.max_tokens` application, etc.
Background analysis on the docs side flagged this as the most likely cause of the 13:1 clone-to-install ratio on this repo (907 GitHub clones, 67 crates.io installs as of 2026-05-16, vs ~1.2:1 for the TypeScript client). The hypothesis is that lots of evaluators clone, look at `examples/`, see only stub-based code, and bounce before `cargo add runcycles`.
This PR closes that gap on the repo side. (Docs side: cycles-docs PR #659, which lands today, adds `/configuration/rust-client-configuration-reference.md` + `/how-to/integrating-cycles-with-async-openai.md` covering streaming, error-aware ReservationGuard patterns, and token-to-microcents conversion.)
What's in this PR
New example:
Cargo.toml:
README.md:
Verified locally
```
cargo check --example async_openai_completion --all-features ✓
cargo build --example async_openai_completion --all-features ✓
cargo clippy --example async_openai_completion --all-features -- -D warnings ✓
cargo fmt --check ✓
```
CI will re-run these on the standard `runcycles/.github/.github/workflows/ci-rust.yml@v1` matrix (stable + 1.88 with `--all-features`).
Out of scope
Two follow-up examples that would close the parallel gaps but each deserves its own PR + iteration:
- `examples/axum_middleware.rs`: Axum web-framework integration. Axum middleware uses `tower::Service` and the version surface there is intricate enough that I want to iterate it separately.
- `examples/async_openai_streaming.rs`: Streaming chat completions. The streaming flow uses `ReservationGuard` rather than `with_cycles` (token totals arrive only after the stream ends), plus the `stream_options.include_usage = true` gotcha. Worth a separate file so the with-cycles example stays minimal.
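To illustrate why the streaming case is shaped differently, here is a minimal sketch of draining a stream where the token total only shows up on a final chunk. It uses a plain iterator instead of an async stream, and `Chunk` and `drain_stream` are hypothetical names, not async-openai or runcycles types; the assumption is that usage is reported only when the request opted into it.

```rust
/// Hypothetical stream chunk: text deltas arrive first, and the usage
/// total (if requested) arrives on a trailing chunk with no text.
struct Chunk {
    delta: Option<String>,
    usage_total: Option<u32>,
}

/// Accumulate the text and return it with the token total. Fails if the
/// stream ends without ever reporting usage, so the caller cannot commit
/// a guessed amount by accident.
fn drain_stream(chunks: impl IntoIterator<Item = Chunk>) -> Result<(String, u32), String> {
    let mut text = String::new();
    let mut total = None;
    for chunk in chunks {
        if let Some(delta) = chunk.delta {
            // In a real handler this is where the delta would be forwarded.
            text.push_str(&delta);
        }
        if let Some(t) = chunk.usage_total {
            total = Some(t);
        }
    }
    let total = total.ok_or_else(|| "stream ended without usage".to_string())?;
    Ok((text, total))
}
```

The reservation has to stay open for the whole loop because the amount to commit is unknown until the last chunk, which is the situation a guard-style primitive is meant for.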
Test plan