feat(ai): support thinking/reasoning models in OpenAI-compatible strategy#177
Open
ImIvanGil wants to merge 1 commit into
…tegy
OpenAI-compatible "thinking" models — Kimi K2.5, K2.6, kimi-*-thinking
variants, GPT o1 family — emit a different stream shape and reject the
default temperature. Without this patch, every request to them fails
silently in the panel with the generic "No response received from the
API." message.
Two changes, both in OpenAiApiStrategy.qml:
1. **Dynamic temperature** in getBody():
Thinking models require `temperature: 1` and reject anything else
with HTTP 400 `invalid_request_error` ("only 1 is allowed for this
model"). The current hardcoded `temperature: 0.7` causes every
thinking-model request to fail before streaming even starts.
Fixed by regex-detecting thinking model IDs:
/k2\.(5|6)|thinking|^o1(-|$)/
Other models continue to use 0.7 unchanged.
2. **reasoning_content support** in parseStreamChunk() and parseResponse():
Thinking models emit `delta.reasoning_content` (and `message.reasoning_content`
in non-stream) BEFORE the final `delta.content`. The existing parser
only checks `delta.content`, so all reasoning chunks are ignored and
the response buffer ends up empty.
With this patch, reasoning_content is treated as content and surfaced
to the user — they see the model's chain-of-thought streaming in,
then the final answer concatenated at the end. Same flow as
ChatGPT/Claude thinking UIs.
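
The model detection in change 1 can be sketched as a small standalone function. This is a minimal illustration, not the code from the patch: `pickTemperature` and `THINKING_MODEL_RE` are hypothetical names, and only the regex and the two temperature values come from the description above.

```javascript
// Regex quoted in the commit message; the constant name is an assumption.
const THINKING_MODEL_RE = /k2\.(5|6)|thinking|^o1(-|$)/;

// Hypothetical helper: returns the temperature to place in the request body.
function pickTemperature(modelId) {
  // Thinking models only accept 1; everything else keeps the previous 0.7.
  return THINKING_MODEL_RE.test(modelId) ? 1 : 0.7;
}

console.log(pickTemperature("kimi-k2.6"));            // 1
console.log(pickTemperature("kimi-k2-thinking"));     // 1
console.log(pickTemperature("o1-mini"));              // 1
console.log(pickTemperature("kimi-k2-0905-preview")); // 0.7
```

One property worth noting: `^o1` is anchored to the start of the ID, so a provider-prefixed ID such as `openai/o1-mini` would not match and would keep 0.7.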
This is purely additive — non-thinking models behave identically to
before. Tested with Kimi K2.6 (long thinking) and K2 (0905-preview,
non-thinking) — both work; non-thinking is unchanged.
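
To illustrate why the change is additive, here is a rough sketch of the patched chunk handling run against a thinking-style and a plain stream. It assumes OpenAI-style SSE lines (`data: {json}`, terminated by `data: [DONE]`); `collect` and the sample payloads are made up for the example, and the real `parseStreamChunk()` in `OpenAiApiStrategy.qml` may differ in its details.

```javascript
// Sketch only: the line format and return shape are assumptions based on
// the description above, not a copy of the strategy's actual code.
function parseStreamChunk(line) {
  const idle = { content: "", done: false, error: null };
  if (!line.startsWith("data:")) return idle; // non-SSE lines are ignored
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return { content: "", done: true, error: null };

  let json;
  try {
    json = JSON.parse(payload);
  } catch (e) {
    return { content: "", done: false, error: "Invalid JSON chunk" };
  }

  const delta = json.choices && json.choices[0] && json.choices[0].delta;
  if (delta && delta.content)
    return { content: delta.content, done: false, error: null };
  // New branch: surface reasoning chunks as ordinary content.
  if (delta && delta.reasoning_content)
    return { content: delta.reasoning_content, done: false, error: null };
  return idle;
}

// Hypothetical helper that mimics appending chunks to the response buffer.
function collect(lines) {
  let buffer = "";
  for (const line of lines) {
    const chunk = parseStreamChunk(line);
    if (chunk.done) break;
    buffer += chunk.content;
  }
  return buffer;
}

const thinkingStream = [
  'data: {"choices":[{"delta":{"reasoning_content":"User says hola. "}}]}',
  'data: {"choices":[{"delta":{"content":"Hola"}}]}',
  "data: [DONE]",
];
const plainStream = [
  'data: {"choices":[{"delta":{"content":"Hola"}}]}',
  "data: [DONE]",
];

console.log(collect(thinkingStream)); // reasoning followed by the answer
console.log(collect(plainStream));    // answer only, same as before the patch
```

A stream without any `reasoning_content` never reaches the new branch, so its output is the same with or without the patch.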
Note: relies on PR Axenide#176 to register custom OpenAI-compatible providers
(Kimi/Moonshot, OpenRouter, etc.) via Config.ai.extraModels. Without
that PR, only providers with built-in fetch (Gemini/OpenAI/etc.) benefit
from the temperature fix here.
Problem

OpenAI-compatible thinking/reasoning models (Kimi K2.5, K2.6, kimi-*-thinking variants, the GPT o1 family) fail silently in the AI panel with the generic message "No response received from the API.", regardless of prompt.

There are two distinct upstream causes, both in `OpenAiApiStrategy.qml`:

Cause 1: hardcoded `temperature: 0.7`

`getBody()` hardcodes `temperature: 0.7`. Thinking models reject any temperature other than `1` with HTTP 400:

    { "error": { "message": "invalid temperature: only 1 is allowed for this model", "type": "invalid_request_error" } }

The error is a single line of JSON, so the SSE-parsing `SplitParser` ignores it (no `data:` prefix), and `curl` exits 0 with the buffer never populated. The panel then falls into its "no streaming data received" branch and shows the generic placeholder.

Cause 2: parser ignores `reasoning_content`

Thinking models stream their chain-of-thought as `delta.reasoning_content` before emitting the actual answer as `delta.content`. The existing parser only checks `delta.content`, so all reasoning chunks are dropped. By the time the actual `delta.content` chunks arrive (sometimes hundreds of reasoning chunks later), various downstream timing issues kick in: the buffer is still empty, the parser misses the small content tail, and so on. Either way, the user sees nothing.

Reproduced empirically against `api.moonshot.ai/v1`: a "hola" prompt to `kimi-k2.6` produced 196 `reasoning_content` chunks, then 2 `content` chunks ("Hola"). The 196-to-2 ratio is typical.

What this PR does
Two minimal additive changes in `OpenAiApiStrategy.qml`:

1. Dynamic temperature

Regex covers:
- `k2.5`, `k2.6` (Kimi K2.5/K2.6 vision-and-thinking)
- `*thinking*` (kimi-thinking-preview, kimi-k2-thinking, kimi-k2-thinking-turbo, …)
- `o1`, `o1-preview`, `o1-mini`, etc. (OpenAI reasoning family)

2. `reasoning_content` accumulation

`parseStreamChunk()`:

     if (delta && delta.content) return { content: delta.content, done: false, error: null };
    +if (delta && delta.reasoning_content)
    +    return { content: delta.reasoning_content, done: false, error: null };

`parseResponse()` (non-stream):

Surfaces the model's chain-of-thought as it streams in, then the final answer concatenated at the end. This is the same flow as the ChatGPT-o1 and Claude thinking UIs. Non-thinking models are unchanged.
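
The non-streaming counterpart can be sketched along the same lines. The snippet below is an assumption-laden illustration rather than the patch itself: it guesses that `parseResponse()` receives the raw response text and returns the same `{ content, done, error }` shape as the streaming parser, and it simply places `message.reasoning_content` ahead of `message.content`.

```javascript
// Sketch only: the signature, return shape, and error strings are assumptions.
function parseResponse(responseText) {
  let json;
  try {
    json = JSON.parse(responseText);
  } catch (e) {
    return { content: "", done: true, error: "Invalid JSON response" };
  }

  // API-level errors, such as the HTTP 400 temperature rejection.
  if (json.error)
    return { content: "", done: true, error: json.error.message || "Unknown API error" };

  const message = json.choices && json.choices[0] && json.choices[0].message;
  if (!message)
    return { content: "", done: true, error: "No message in response" };

  // Reasoning first, then the final answer, mirroring the streaming order.
  const reasoning = message.reasoning_content || "";
  const answer = message.content || "";
  return { content: reasoning + answer, done: true, error: null };
}

const sample = JSON.stringify({
  choices: [{ message: { reasoning_content: "User says hola. ", content: "Hola" } }],
});
console.log(parseResponse(sample).content);
```

For a response with no `reasoning_content` field the reasoning part is an empty string, so the result is just `message.content`.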
Tested with
Visual note
Reasoning and final answer arrive in the same chat bubble, concatenated. Could be polished in a follow-up that renders them in separate styled sections (greyed-out reasoning block + answer below), but that's a UX call worth its own PR.
Diff stats
Related
Relies on #176 (feat(ai): wire Config.ai.extraModels for user-defined OpenAI-compat providers) to register custom providers via `Config.ai.extraModels`; otherwise users can't even select a thinking model to test against. Without #176, the temperature fix still helps any built-in provider that ever adds thinking variants.