feat(provider): support OpenAI reasoning models (o1/o3/gpt-5) on the compat path #30
…d fallback
- Add _is_reasoning_model() to detect o-series and gpt-5 (excluding chat endpoint)
- Introduce _build_completion_kwargs() to route max_tokens/temperature vs max_completion_tokens
- Extract _create_with_param_fallback() to retry with reasoning params on unsupported_parameter 400
- Add _extract_text() with actionable error for token-starved completions (length finish with no text)
- Update README to document reasoning model support and token budget implications
- Add comprehensive test suite covering reasoning detection, param mapping, retry logic, and budget starvation
- Assert messages is non-empty list in _create_with_param_fallback
- Expand BadRequestError handling to accept param name for robustness against SDK changes, with comment explaining retry cost-benefit
- Assert content is string or None in _extract_text before processing
Closes #29.
Problem
The shared OpenAI-compat transport sent max_tokens and temperature unconditionally. OpenAI's reasoning models (o-series, gpt-5) reject both: they require max_completion_tokens and accept only the default temperature. A user who configured one on provider: openai got a raw SDK BadRequestError, not graceful handling.
Design
The decision tree was worked through before any code was written. The load-bearing call: the predicate is an optimization; the retry is the correctness guarantee.
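A minimal sketch of the predicate and the parameter split, for illustration only. The function names drop the leading underscore, the signatures are hypothetical, and the regex is an assumption: it reads the gpt-5 hint as a name prefix, which the PR does not confirm.

```python
import re

# Assumed pattern: o-series names such as "o1" or "o3-mini", plus names starting with "gpt-5".
_REASONING_NAME = re.compile(r"^(o\d|gpt-5)")
# Chat-tuned gpt-5 variant that keeps the classic parameters.
_CHAT_EXCEPTIONS = frozenset({"gpt-5-chat-latest"})


def is_reasoning_model(model: str) -> bool:
    """Name hint only: it saves a round-trip and is never the source of truth."""
    return model not in _CHAT_EXCEPTIONS and bool(_REASONING_NAME.match(model))


def build_completion_kwargs(model, messages, max_tokens, temperature):
    """Return the keyword arguments for one chat-completions call (hypothetical signature)."""
    kwargs = {"model": model, "messages": messages}
    if is_reasoning_model(model):
        # Reasoning models: the budget goes in max_completion_tokens; temperature is omitted.
        kwargs["max_completion_tokens"] = max_tokens
    else:
        # Everything else keeps the classic pair unchanged.
        kwargs["max_tokens"] = max_tokens
        kwargs["temperature"] = temperature
    return kwargs
```

If the hint misses a model, the only cost in this design is one rejected request, because the retry on the structured 400 is what guarantees correctness.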
- _build_completion_kwargs: the one place the parameter split lives. Reasoning → max_completion_tokens, omit temperature; everything else → unchanged max_tokens + temperature.
- _is_reasoning_model: an OpenAI-only name hint (^o\d / gpt-5, excluding gpt-5-chat-latest). It only avoids a wasted round-trip; it is deliberately narrow and never the source of truth.
- _create_with_param_fallback: one-shot retry on the structured 400 (code == "unsupported_parameter"). Keys off the error, not the provider, so it self-corrects for any compat backend (Groq/Gemini/DeepSeek) and survives OpenAI's naming churn.
- BadRequestError intercept is ordered before the generic APIStatusError handler (it's a subclass). A failing retry collapses to ProviderError, so no raw SDK error leaks.
- _extract_text: empty completion now raises ProviderError instead of a TypeError that escaped callers (which only catch ProviderError). A length-truncated empty response (a reasoning model spending its whole budget on hidden reasoning) gets an actionable message pointing at llm.max_tokens.
_complete_openai_compat is decomposed into these five helpers, each under the 70-line limit.
Acceptance criteria
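To make the retry and empty-completion behavior from the Design notes concrete, here is a rough sketch under stated assumptions. FakeBadRequest stands in for the SDK's BadRequestError, ProviderError is a local stand-in for the project's class, and the signatures and error wording are hypothetical. The first attempt deliberately skips the name predicate to show that the retry alone recovers.

```python
class ProviderError(Exception):
    """Stand-in for the project's ProviderError."""


class FakeBadRequest(Exception):
    """Stand-in for the SDK's 400 error; only the `code` attribute matters here."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def create_with_param_fallback(create, model, messages, max_tokens, temperature):
    """Call `create` once; on unsupported_parameter, retry once with reasoning params."""
    try:
        return create(model=model, messages=messages,
                      max_tokens=max_tokens, temperature=temperature)
    except FakeBadRequest as err:
        if err.code != "unsupported_parameter":
            raise ProviderError(f"bad request: {err.code}") from err
    try:
        # One-shot retry: move the budget to max_completion_tokens, drop temperature.
        return create(model=model, messages=messages, max_completion_tokens=max_tokens)
    except FakeBadRequest as err:
        # A failing retry collapses to ProviderError, so callers never see the SDK error.
        raise ProviderError(f"retry failed: {err.code}") from err


def extract_text(content, finish_reason):
    """Return the completion text, or raise ProviderError for an empty completion."""
    if content:
        return content
    if finish_reason == "length":
        raise ProviderError(
            "empty completion: the token budget ran out before any visible text "
            "was produced; consider raising llm.max_tokens"
        )
    raise ProviderError("empty completion")


def fake_reasoning_backend(**kwargs):
    """Pretend backend that rejects the classic parameters, as a reasoning model would."""
    if "max_tokens" in kwargs or "temperature" in kwargs:
        raise FakeBadRequest("unsupported_parameter")
    return {"content": "ok", "finish_reason": "stop", "sent": sorted(kwargs)}
```

Calling create_with_param_fallback(fake_reasoning_backend, "o3-mini", messages, 64, 0.2) fails once on the classic parameters and succeeds on the retry. Because the branch keys off the error code rather than the model name, the same path would apply to any compat backend that returns that code.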
- ProviderError (never a raw SDK crash).
- temperature dropped for reasoning models (omitted, not coerced).
- gpt-4o, gpt-5-chat-latest).
Tests
8 new behavior tests through the public complete() seam (so they survive the decomposition); the one existing empty-content test now expects ProviderError instead of TypeError. Full suite: 152 passing, ruff clean on changed files, all functions ≤70 lines.
Deferred (no evidence yet, per TigerStyle)
reasoning_effort/budget config knob, o1-mini system→developer remapping, new ErrorKinds.
🤖 Generated with Claude Code