examples: add async_openai_completion + point README at it #37

Merged
amavashev merged 3 commits into main from examples/async-openai-completion
May 16, 2026

@amavashev amavashev commented May 16, 2026

Why

The existing five examples (`basic_usage`, `error_handling`, `guard_usage`, `streaming_usage`, `with_cycles_usage`) all use an inline `call_llm` stub. That keeps the runcycles API surface clean to read, but it leaves evaluators without a working starting point for a real LLM call. They have to invent the async-openai wiring themselves — token extraction from `response.usage`, `caps.max_tokens` application, etc.

Background analysis on the docs side flagged this as the most likely cause of the roughly 13:1 clone-to-install ratio on this repo (907 GitHub clones vs. 67 crates.io installs as of 2026-05-16, compared with ~1.2:1 for the TypeScript client). The hypothesis is that many evaluators clone, look at `examples/`, see only stub-based code, and bounce before running `cargo add runcycles`.

This PR closes that gap on the repo side. (Docs side: cycles-docs PR #659, which lands today, adds `/configuration/rust-client-configuration-reference.md` + `/how-to/integrating-cycles-with-async-openai.md` covering streaming, error-aware ReservationGuard patterns, and token-to-microcents conversion.)

What's in this PR

New example:

  • `examples/async_openai_completion.rs`: a 60-line `with_cycles` example. It reserves `Amount::tokens(1_500)`, narrows `max_tokens` from `caps.max_tokens` when the decision is ALLOW_WITH_CAPS, fires `openai.chat().create()`, extracts `response.usage.total_tokens`, and commits the actual usage.
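
The reserve, call, commit sequence described above can be illustrated with a toy in-memory budget. This is a minimal sketch under stated assumptions: `Budget` and `with_reservation` are hypothetical names invented for illustration, not the runcycles `with_cycles` API or the contents of the example file, and the closure stands in for the real chat-completion call.

```rust
/// Toy in-memory budget. A hypothetical stand-in for a Cycles server,
/// used only to show the reserve -> work -> commit/release shape.
#[derive(Debug)]
struct Budget {
    remaining: i64,
}

impl Budget {
    /// Reserve `estimate` tokens, run `work`, then settle.
    /// `work` returns the value plus the tokens it actually used.
    fn with_reservation<T>(
        &mut self,
        estimate: i64,
        work: impl FnOnce() -> Result<(T, i64), String>,
    ) -> Result<T, String> {
        if estimate > self.remaining {
            // Denied up front: the work closure never runs.
            return Err(format!(
                "budget exceeded: need {estimate}, have {}",
                self.remaining
            ));
        }
        // Reserve: hold the estimate while the call is in flight.
        self.remaining -= estimate;
        match work() {
            Ok((value, actual)) => {
                // Commit: keep only the actual usage, return the rest.
                self.remaining += estimate - actual;
                Ok(value)
            }
            Err(e) => {
                // Release: a failed call gives the whole reservation back.
                self.remaining += estimate;
                Err(e)
            }
        }
    }
}
```

In this sketch, reserving 1_500 against a budget of 10_000 and reporting 420 tokens of actual usage leaves 9_580; if the closure returns an error, the budget is unchanged.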

Cargo.toml:

  • Adds `async-openai = "0.30"` to `[dev-dependencies]`. Dev-only: it does not affect downstream consumers' dependency graph, but it is pulled into CI builds.

README.md:

  • Adds a callout in the Quick Start section pointing at the new example. Keeps the existing minimal `call_llm("Hello")` snippet (good for skimming the API surface) but signposts the real-LLM example directly below.

Verified locally

```
cargo check --example async_openai_completion --all-features ✓
cargo build --example async_openai_completion --all-features ✓
cargo clippy --example async_openai_completion --all-features -- -D warnings ✓
cargo fmt --check ✓
```

CI will re-run these on the standard `runcycles/.github/.github/workflows/ci-rust.yml@v1` matrix (stable + 1.88 with `--all-features`).

Out of scope

Two follow-up examples would close parallel gaps, but each deserves its own PR and iteration:

  1. `examples/axum_middleware.rs`: Axum web-framework integration. Axum middleware uses `tower::Service`, and the version surface there is intricate enough that I want to iterate on it separately.

  2. `examples/async_openai_streaming.rs` — Streaming chat completions. The streaming flow uses `ReservationGuard` rather than `with_cycles` (token totals arrive only after the stream ends), plus the `stream_options.include_usage = true` gotcha. Worth a separate file so the with-cycles example stays minimal.
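
The streaming point above (token totals only arrive once the stream ends, and only if usage reporting was requested) can be sketched without any network code. This is a rough illustration, not the planned example: `Chunk` and `drain_stream` are hypothetical names, and `Chunk` is a simplified stand-in rather than the async-openai stream type.

```rust
/// Hypothetical, simplified stream chunk. Content chunks carry a text
/// delta; the trailing usage chunk (sent only when the request asked
/// for usage) carries the token total and no delta.
struct Chunk {
    delta: Option<String>,
    total_tokens: Option<u32>,
}

/// Collect the streamed text and the token total to commit.
/// Fails if the stream finished without ever reporting usage.
fn drain_stream(
    chunks: impl IntoIterator<Item = Chunk>,
) -> Result<(String, i64), String> {
    let mut text = String::new();
    let mut total_tokens = None;
    for chunk in chunks {
        if let Some(delta) = chunk.delta {
            text.push_str(&delta);
        }
        if let Some(t) = chunk.total_tokens {
            // Usage is only known at the end of the stream.
            total_tokens = Some(t);
        }
    }
    let total = total_tokens
        .ok_or("stream ended without usage; was include_usage set?")?;
    Ok((text, i64::from(total)))
}
```

Because the amount to commit is unknown until the last chunk, the reservation has to stay open for the whole stream, which is why a guard-style primitive fits better here than a single closure.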

Test plan

  • CI green
  • `cargo run --example async_openai_completion` works end-to-end with a real `OPENAI_API_KEY` and a running Cycles server
  • README rendering shows the new callout in a sensible position

amavashev added 3 commits May 16, 2026 11:32
The existing /examples set (basic_usage, error_handling, guard_usage,
streaming_usage, with_cycles_usage) all use an inline `call_llm` stub.
That demonstrates the runcycles API surface cleanly, but it leaves
evaluators without a working starting point for a real LLM call —
they have to invent the async-openai wiring themselves, including
token extraction from response.usage and cap-to-max_tokens
application.

This was flagged as the most likely cause of the 13:1 clone-to-install
ratio on cycles-client-rust: lots of evaluators clone, look at the
examples, and bounce because nothing shows the real-LLM composition.

This PR fills that gap:

- Adds `examples/async_openai_completion.rs` — a 60-line `with_cycles`
  example that wires async-openai 0.30.x against runcycles. Reserves
  `Amount::tokens(1_500)`, applies `caps.max_tokens` from ALLOW_WITH_CAPS,
  fires the chat completion, extracts `response.usage.total_tokens`,
  commits the actual.
- Adds `async-openai = "0.30"` to dev-dependencies (dev-only — does
  not affect downstream consumers, but pulls into CI builds).
- Updates the README's Quick Start section with a callout pointing at
  the new example for users who want a real LLM call rather than the
  `call_llm` placeholder.

Verified locally:
  cargo check --example async_openai_completion --all-features ✓
  cargo build --example async_openai_completion --all-features ✓
  cargo clippy --example async_openai_completion --all-features -- -D warnings ✓
  cargo fmt --check ✓

Companion docs PR in cycles-docs (PR #659) covers the same composition
in detail (streaming via ReservationGuard, error-aware patterns,
token-to-microcents conversion, gotchas).

Out of scope (not in this PR):

- `examples/axum_middleware.rs` — would close the parallel gap for the
  Rust web-framework audience. Axum middleware needs the tower::Service
  trait surface, which is enough complexity that it warrants its own
  PR with its own iteration cycle.
- A streaming `async_openai_streaming.rs` example — the streaming
  pattern uses ReservationGuard (not with_cycles, since token totals
  arrive only after the stream ends). Worth a separate example for
  the same reason.

Codex flagged design/idiom issues that cargo check / clippy / fmt
can't see, which is exactly the value of layered review on top of
compile checks. All fixes now compile cleanly and read clearly.

Apply/skip tally: 8 applied, 0 pushed back.

Applied:

- **Silent under-billing fixed.** The original `response.usage.map(...).unwrap_or(0)`
  would commit `Amount::tokens(0)` when a provider omitted `usage` —
  silent under-billing on a successful-looking response, which is
  exactly wrong for a teaching example about budget governance. Now
  `response.usage.ok_or(...)?` so the closure errors and `with_cycles`
  auto-releases the reservation. Production code that needs a fallback
  must opt into one explicitly.

- **Empty/missing content now errors.** `response.choices.first().and_then(|c| c.message.content.clone()).unwrap_or_default()`
  silently returned an empty string. Now uses `.ok_or(...)?` so a
  response with no choices or no content fails loud and the reservation
  releases.

- **Zero/negative cap now errors.** `cap.max(0)` would have sent
  `max_tokens=0` to OpenAI (which OpenAI rejects, after we've already
  spent the request budget). Now `u32::try_from(cap)` errors on
  negatives, and a zero cap errors explicitly. Both checks run before
  any network call to OpenAI.

- **`.max_tokens()` → `.max_completion_tokens()`.** OpenAI deprecated
  `max_tokens` for chat completions in favor of `max_completion_tokens`.
  async-openai 0.30.1 supports both; the example uses the current name.

- **`u.total_tokens as i64` → `i64::from(u.total_tokens)`.** More
  idiomatic, no `as` cast.

- **Module doc comment accuracy:**
  - Removed the inaccurate "CYCLES_BASE_URL env var override" claim —
    the code hardcodes the URL.
  - Added a "Loud-failure stance" section explicitly explaining the
    new error-on-edge-cases design.
  - Softened the absolute "streaming uses ReservationGuard instead of
    with_cycles" framing — the guard is the right primitive when
    chunks are forwarded while the reservation remains open.
  - Removed editorial "the example most users actually want" tone.

- **README callout under-stated requirements.** Now explicitly
  mentions: reachable Cycles server, tenant API key, and a TOKENS-
  denominated budget at the scope being reserved against — not just
  `OPENAI_API_KEY`.

- **README claim softened.** Dropped "so the budget reflects actual
  spend" framing (which was directionally true only when usage is
  present) to "threading the response's usage.total_tokens back into
  the commit" — accurate regardless of failure path.
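
The first two fixes in the list above (missing `usage` and missing content) follow the same pattern: turn the `Option` into an error with `ok_or(...)?` instead of defaulting. A minimal sketch, assuming simplified response types: `Response`, `Choice`, `Message`, `Usage`, and `extract_reply` are hypothetical stand-ins, not the async-openai types or the example's actual code.

```rust
// Hypothetical, trimmed-down response shapes for illustration only.
struct Usage {
    total_tokens: u32,
}
struct Message {
    content: Option<String>,
}
struct Choice {
    message: Message,
}
struct Response {
    usage: Option<Usage>,
    choices: Vec<Choice>,
}

/// Return the reply text and the token count to commit.
/// Any missing piece is an error, never a silent default.
fn extract_reply(response: &Response) -> Result<(String, i64), String> {
    // Before: `.map(...).unwrap_or(0)` would have committed 0 tokens.
    let usage = response
        .usage
        .as_ref()
        .ok_or("response carried no usage block")?;
    // Before: `.unwrap_or_default()` would have returned "".
    let content = response
        .choices
        .first()
        .and_then(|c| c.message.content.clone())
        .ok_or("response carried no choices or no content")?;
    // Lossless widening from u32, no `as` cast needed.
    Ok((content, i64::from(usage.total_tokens)))
}
```

Returning `Err` from inside the reservation closure is what lets the wrapper release the reservation rather than commit a misleading amount.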

Verified locally:
  cargo check  --example async_openai_completion --all-features  ✓
  cargo clippy --example async_openai_completion --all-features -- -D warnings  ✓
  cargo fmt --check  ✓
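
The cap handling from the "Zero/negative cap now errors" item can be sketched in isolation. This is an illustration under assumptions: the cap is taken to be an `i64` (the commit only implies a signed integer, since `u32::try_from` can fail on negatives), and `completion_token_limit` is a hypothetical helper name, not a function from the example or from runcycles.

```rust
/// Convert a budget cap into a provider token limit, or fail before
/// any network call is made. The `i64` input type is an assumption.
fn completion_token_limit(cap: i64) -> Result<u32, String> {
    // Rejects negatives (and values too large for u32) instead of
    // clamping them to 0 the way `cap.max(0)` did.
    let limit = u32::try_from(cap)
        .map_err(|_| format!("cap {cap} is not a valid token limit"))?;
    if limit == 0 {
        // A zero limit would be rejected by the provider anyway;
        // fail here so no request is sent.
        return Err("cap is zero; refusing to call the provider".to_string());
    }
    Ok(limit)
}
```

A positive cap such as 256 passes through unchanged, while 0 and negative values return an error that the caller can propagate with `?`.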
CI's `cargo audit --deny warnings` step failed on this PR because the
0.30.x line of async-openai pulled in two unmaintained transitive deps
that the org-wide audit job treats as errors:

  - backoff 0.4.0 (RUSTSEC-2025-0012, unmaintained)
  - instant 0.1.13 (RUSTSEC-2024-0384, unmaintained, via backoff)

async-openai 0.31+ replaced its retry stack with `tower`, dropping the
backoff dependency entirely. Bumping to 0.38 (current latest) clears
both advisories without needing audit-ignore configuration.

API changes between 0.30 and 0.38 are minor for the example's usage:

  - `Client` is now gated behind the per-API features (the crate split
    its surface in 0.31). Enabled `chat-completion` for the example.
  - Chat-completion types moved from `async_openai::types::` to
    `async_openai::types::chat::`. Updated the import.
  - `default-features = false, features = ["chat-completion", "rustls"]`
    keeps the dev-dep set minimal — no unused-feature surface, and
    rustls matches the rest of the runcycles crate's TLS story.

The example still uses the same `with_cycles` flow, the same
`response.usage.total_tokens` extraction, and the same loud-failure
patterns from the codex-round-1 fixes. Only the imports and Cargo.toml
changed.

Verified locally:
  cargo check  --example async_openai_completion --all-features  ✓
  cargo clippy --example async_openai_completion --all-features -- -D warnings  ✓
  cargo fmt --check  ✓
  cargo tree -i backoff  → "did not match any packages"  ✓
  cargo tree -i instant  → "did not match any packages"  ✓

Note: cycles-docs PR #659 currently pins async-openai 0.30.x in its
how-to doc; that doc needs a parallel bump to 0.38 + types::chat:: in
a separate commit on that PR.
@amavashev amavashev merged commit ca15f8e into main May 16, 2026
8 checks passed
@amavashev amavashev deleted the examples/async-openai-completion branch May 16, 2026 16:03