Skip to content

feat(provider): support OpenAI reasoning models (o1/o3/gpt-5) on the compat path#30

Merged
AccursedGalaxy merged 2 commits into
mainfrom
issue-29-reasoning-models
May 22, 2026
Merged

feat(provider): support OpenAI reasoning models (o1/o3/gpt-5) on the compat path#30
AccursedGalaxy merged 2 commits into
mainfrom
issue-29-reasoning-models

Conversation

@AccursedGalaxy
Copy link
Copy Markdown
Owner

Closes #29.

Problem

The shared OpenAI-compat transport sent max_tokens and temperature unconditionally. OpenAI's reasoning models (o-series, gpt-5) reject both — they require max_completion_tokens and accept only the default temperature. A user who configured one on provider: openai got a raw SDK BadRequestError, not graceful handling.

Design

Decision-tree was grilled before any code; the load-bearing call is the predicate is an optimization, the retry is the correctness guarantee.

  • _build_completion_kwargs — the one place the parameter split lives. Reasoning → max_completion_tokens, omit temperature; everything else → unchanged max_tokens + temperature.
  • _is_reasoning_model — an OpenAI-only name hint (^o\d / gpt-5, excluding gpt-5-chat-latest). It only avoids a wasted round-trip; it is deliberately narrow and never the source of truth.
  • _create_with_param_fallback — one-shot retry on the structured 400 (code == "unsupported_parameter"). Keys off the error, not the provider, so it self-corrects for any compat backend (Groq/Gemini/DeepSeek) and survives OpenAI's naming churn. BadRequestError intercept is ordered before the generic APIStatusError handler (it's a subclass). A failing retry collapses to ProviderError — no raw SDK leak.
  • _extract_text — empty completion now raises ProviderError instead of a TypeError that escaped callers (which only catch ProviderError). A length-truncated empty response — a reasoning model spending its whole budget on hidden reasoning — gets an actionable message pointing at llm.max_tokens.

_complete_openai_compat is decomposed into these five helpers, each under the 70-line limit.

Acceptance criteria

  • ✅ Reasoning model produces a completion, or a clear ProviderError (never a raw SDK crash).
  • temperature dropped for reasoning models (omitted, not coerced).
  • ✅ Standard chat models unchanged — regression-guarded (gpt-4o, gpt-5-chat-latest).
  • ✅ Tests cover the parameter-mapping decision, retry, provider-agnostic retry, retry exhaustion, non-param 400, and budget starvation.

Tests

8 new behavior tests through the public complete() seam (so they survive the decomposition); the one existing empty-content test updated TypeErrorProviderError. Full suite: 152 passing, ruff clean on changed files, all functions ≤70 lines.

Deferred (no evidence yet, per TigerStyle)

reasoning_effort/budget config knob, o1-mini system→developer remapping, new ErrorKinds.

🤖 Generated with Claude Code

…d fallback

- Add _is_reasoning_model() to detect o-series and gpt-5 (excluding chat endpoint)
- Introduce _build_completion_kwargs() to route max_tokens/temperature vs max_completion_tokens
- Extract _create_with_param_fallback() to retry with reasoning params on unsupported_parameter 400
- Add _extract_text() with actionable error for token-starved completions (length finish with no text)
- Update README to document reasoning model support and token budget implications
- Add comprehensive test suite covering reasoning detection, param mapping, retry logic, and budget starvation
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Warning

Rate limit exceeded

@AccursedGalaxy has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 28 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63d4032f-79aa-40de-9fa8-d7bfd15a8e55

📥 Commits

Reviewing files that changed from the base of the PR and between d28a0a8 and ae6a5a5.

📒 Files selected for processing (3)
  • README.md
  • noidea/provider.py
  • tests/test_provider.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issue-29-reasoning-models

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

- Assert messages is non-empty list in _create_with_param_fallback
- Expand BadRequestError handling to accept param name for robustness
  against SDK changes, with comment explaining retry cost-benefit
- Assert content is string or None in _extract_text before processing
@AccursedGalaxy AccursedGalaxy merged commit 1894e68 into main May 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support OpenAI reasoning models (o1/o3/gpt-5) on the OpenAI-compat path

1 participant