
feat(ai): support thinking/reasoning models in OpenAI-compatible strategy #177

Open
ImIvanGil wants to merge 1 commit into Axenide:main from ImIvanGil:feat/ai-thinking-models-support


@ImIvanGil

Problem

OpenAI-compatible thinking/reasoning models (Kimi K2.5, K2.6, the kimi-*-thinking variants, and the OpenAI o1 family) fail silently in the AI panel with the generic message "No response received from the API.", regardless of the prompt.

There are two distinct upstream causes, both in OpenAiApiStrategy.qml:

Cause 1: hardcoded temperature: 0.7

getBody() hardcodes temperature: 0.7. Thinking models reject any temperature other than 1 with HTTP 400:

```json
{
    "error": {
        "message": "invalid temperature: only 1 is allowed for this model",
        "type": "invalid_request_error"
    }
}
```

The error body is a single line of JSON with no data: prefix, so the SSE-parsing SplitParser ignores it. curl still exits 0 (by default, an HTTP 400 response is not a curl failure), so the buffer is never populated and the panel falls into its "no streaming data received" branch, showing the generic placeholder.
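To make the failure mode concrete, here is a minimal JavaScript model of a line handler that, like the SSE parsing described above, only reacts to data: lines. `collectSseChunks` is a hypothetical helper, not the panel's actual code, and it assumes each stdout line is delivered separately, as a line-splitting parser would.

```javascript
// Hypothetical model of the panel's SSE handling: only "data:" lines are
// parsed; every other line is set aside and never reaches the chunk list.
function collectSseChunks(lines) {
  const chunks = [];
  const ignored = [];
  for (const line of lines) {
    if (!line.startsWith("data:")) {
      ignored.push(line); // a bare JSON error body ends up here
      continue;
    }
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    chunks.push(JSON.parse(payload));
  }
  return { chunks, ignored };
}

// The HTTP 400 body from above, serialized to a single line.
const errorBody = JSON.stringify({
  error: {
    message: "invalid temperature: only 1 is allowed for this model",
    type: "invalid_request_error",
  },
});

const result = collectSseChunks([errorBody]);
// result.chunks is empty, so the panel sees "no streaming data received".
```

Because the error line never becomes a chunk and curl reports success, nothing in this model distinguishes a rejected request from an empty stream.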

Cause 2: parser ignores reasoning_content

Thinking models stream their chain-of-thought as delta.reasoning_content before emitting the actual answer as delta.content. The existing parser only checks delta.content:

```js
if (delta && delta.content)
    return { content: delta.content, done: false, error: null };
```

All reasoning chunks are dropped. The delta.content chunks that carry the actual answer can arrive hundreds of reasoning chunks later, and by then downstream timing issues take over: the buffer is still empty, the parser misses the short content tail, and so on. Either way, the user sees nothing.

Reproduced against api.moonshot.ai/v1: a "hola" prompt to kimi-k2.6 produced 196 reasoning_content chunks followed by 2 content chunks ("Hola"). The 196-to-2 ratio is typical.
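The sketch below tallies chunk types in a small, made-up stream (a few reasoning deltas followed by a short answer, not the real Moonshot output) and counts how many deltas a content-only check like the one quoted above would discard. `tallyDeltas` and `droppedByContentOnly` are illustrative names, not functions from the codebase.

```javascript
// Hypothetical stream: parsed choices[0].delta objects in arrival order.
const deltas = [
  { reasoning_content: "The user said " },
  { reasoning_content: "\"hola\", so " },
  { reasoning_content: "reply in Spanish." },
  { content: "Ho" },
  { content: "la" },
];

// Count how many deltas carry reasoning text vs. answer text.
function tallyDeltas(ds) {
  let reasoning = 0;
  let content = 0;
  for (const d of ds) {
    if (d.reasoning_content) reasoning++;
    if (d.content) content++;
  }
  return { reasoning, content };
}

// Mirrors the pre-PR check: any delta without delta.content is discarded.
function droppedByContentOnly(ds) {
  return ds.filter((d) => !(d && d.content)).length;
}

const tally = tallyDeltas(deltas); // { reasoning: 3, content: 2 }
```

Applied to the reported kimi-k2.6 run, the same check would discard 196 of 198 deltas, leaving only the two-chunk "Hola" tail.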

What this PR does

Two minimal additive changes in OpenAiApiStrategy.qml:

1. Dynamic temperature

```diff
+let temp = 0.7;
+if (model.model && /k2\.(5|6)|thinking|^o1(-|$)/.test(model.model)) {
+    temp = 1;
+}
 let body = {
     model: model.model,
     messages: _formatMessages(messages),
-    temperature: 0.7
+    temperature: temp
 };
```

Regex covers:

  • k2.5, k2.6 (Kimi K2.5/K2.6 vision-and-thinking)
  • *thinking* (kimi-thinking-preview, kimi-k2-thinking, kimi-k2-thinking-turbo, …)
  • o1, o1-preview, o1-mini, etc. (OpenAI reasoning family)
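The same check can be pulled out into a standalone function to see which model IDs it matches. `pickTemperature` and `THINKING_MODEL_RE` are hypothetical names for this sketch; the regex and the 0.7/1 values are the ones from the diff.

```javascript
// Regex from the PR: Kimi K2.5/K2.6, any ID containing "thinking",
// and the o1 family (exactly "o1", or "o1-" followed by a suffix).
const THINKING_MODEL_RE = /k2\.(5|6)|thinking|^o1(-|$)/;

// Hypothetical standalone version of the temperature choice in getBody().
function pickTemperature(modelId) {
  return modelId && THINKING_MODEL_RE.test(modelId) ? 1 : 0.7;
}

const samples = [
  "kimi-k2.6", "kimi-k2-thinking-turbo", "o1-mini", "o1",
  "kimi-k2-0905-preview", "moonshot-v1-8k", "gpt-4o",
];
const temps = Object.fromEntries(samples.map((id) => [id, pickTemperature(id)]));
```

The first four IDs get 1 and the last three keep 0.7. IDs outside these patterns, such as o3-mini, also keep 0.7 under this regex, and the match is case-sensitive because the regex has no i flag.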

2. reasoning_content accumulation

parseStreamChunk():

```diff
 if (delta && delta.content)
     return { content: delta.content, done: false, error: null };

+if (delta && delta.reasoning_content)
+    return { content: delta.reasoning_content, done: false, error: null };
```
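A minimal sketch of the patched streaming path, assuming parseStreamChunk() receives one SSE payload string (the text after "data: ") and returns the { content, done, error } shape shown in the diff. The JSON error handling and the sample payloads are assumptions for illustration, not the repository's code.

```javascript
// Hypothetical sketch of the patched parseStreamChunk().
function parseStreamChunk(payload) {
  if (payload === "[DONE]") return { content: "", done: true, error: null };
  let json;
  try {
    json = JSON.parse(payload);
  } catch (e) {
    return { content: "", done: false, error: "invalid JSON chunk" };
  }
  const delta = json.choices && json.choices[0] && json.choices[0].delta;
  // Answer text is checked first, as in the PR.
  if (delta && delta.content)
    return { content: delta.content, done: false, error: null };
  // New branch: chain-of-thought text is surfaced as ordinary content.
  if (delta && delta.reasoning_content)
    return { content: delta.reasoning_content, done: false, error: null };
  return { content: "", done: false, error: null };
}

// Accumulate a short, made-up stream into one buffer.
const payloads = [
  '{"choices":[{"delta":{"reasoning_content":"Greeting; "}}]}',
  '{"choices":[{"delta":{"reasoning_content":"answer briefly. "}}]}',
  '{"choices":[{"delta":{"content":"Hola"}}]}',
  "[DONE]",
];
let buffer = "";
for (const p of payloads) {
  const r = parseStreamChunk(p);
  if (r.done) break;
  buffer += r.content;
}
// buffer === "Greeting; answer briefly. Hola"
```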

parseResponse() (non-stream):

```diff
+let outContent = msg.content || "";
+if (msg.reasoning_content && !outContent) {
+    outContent = msg.reasoning_content;
+}
-return { content: msg.content };
+return { content: outContent };
```
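For the non-streaming path, the fallback can be sketched as follows, assuming `body` is the parsed JSON of the whole reply and the message sits at choices[0].message (an assumption about the response shape, consistent with the OpenAI-style API).

```javascript
// Hypothetical sketch of the patched parseResponse() for non-stream replies.
function parseResponse(body) {
  const msg = body.choices[0].message;
  let outContent = msg.content || "";
  // Fall back to the reasoning text only when there is no answer text.
  if (msg.reasoning_content && !outContent) {
    outContent = msg.reasoning_content;
  }
  return { content: outContent };
}

const withAnswer = parseResponse({
  choices: [{ message: { content: "Hola", reasoning_content: "Greeting." } }],
});
const reasoningOnly = parseResponse({
  choices: [{ message: { content: null, reasoning_content: "Greeting." } }],
});
// withAnswer.content === "Hola"; reasoningOnly.content === "Greeting."
```

Unlike the streaming path, nothing is concatenated here: when both fields are present, only the answer is returned.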

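In streaming mode, this surfaces the model's chain-of-thought as it streams in, followed by the final answer in the same message, similar to the ChatGPT o1 and Claude thinking UIs. In non-streaming mode, reasoning_content is used only as a fallback when content is empty. Non-thinking models are unchanged.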

Tested with

  • Kimi K2.6 (heavy thinking, 196+ reasoning chunks per short prompt) — works.
  • Kimi K2 (0905 preview) (no thinking, direct content stream) — unchanged, works.
  • Moonshot v1 family (no thinking) — unchanged, works.

Visual note

The reasoning and the final answer arrive in the same chat bubble, concatenated. This could be polished in a follow-up that renders them in separate styled sections (a greyed-out reasoning block with the answer below), but that is a UX decision that deserves its own PR.
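If that follow-up happens, one option is for the parser to tag each piece with its source instead of returning both as plain content. The sketch below is a hypothetical design, not part of this PR; `classifyDelta` and `splitSections` are illustrative names.

```javascript
// Hypothetical follow-up: tag each streamed delta instead of merging it,
// so the UI can render reasoning and answer in separate sections.
function classifyDelta(delta) {
  if (delta && delta.content) return { kind: "answer", text: delta.content };
  if (delta && delta.reasoning_content)
    return { kind: "reasoning", text: delta.reasoning_content };
  return null;
}

function splitSections(deltas) {
  const sections = { reasoning: "", answer: "" };
  for (const d of deltas) {
    const piece = classifyDelta(d);
    if (piece) sections[piece.kind] += piece.text;
  }
  return sections;
}

const sections = splitSections([
  { reasoning_content: "Short greeting. " },
  { reasoning_content: "Reply in Spanish." },
  { content: "Hola" },
]);
// sections.reasoning === "Short greeting. Reply in Spanish."
// sections.answer === "Hola"
```

Keeping the two buffers separate leaves the styling decision entirely to the UI layer.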

Diff stats

modules/services/ai/strategies/OpenAiApiStrategy.qml | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

Related

…tegy

OpenAI-compatible "thinking" models — Kimi K2.5, K2.6, kimi-*-thinking
variants, GPT o1 family — emit a different stream shape and reject the
default temperature. Without this patch, every request to them fails
silently in the panel with the generic "No response received from the
API." message.

Two changes, both in OpenAiApiStrategy.qml:

1. **Dynamic temperature** in getBody():

   Thinking models require `temperature: 1` and reject anything else
   with HTTP 400 `invalid_request_error` ("only 1 is allowed for this
   model"). The current hardcoded `temperature: 0.7` causes every
   thinking-model request to fail before streaming even starts.

   Fixed by regex-detecting thinking model IDs:
       /k2\.(5|6)|thinking|^o1(-|$)/

   Other models continue to use 0.7 unchanged.

2. **reasoning_content support** in parseStreamChunk() and parseResponse():

   Thinking models emit `delta.reasoning_content` (and `message.reasoning_content`
   in non-stream) BEFORE the final `delta.content`. The existing parser
   only checks `delta.content`, so all reasoning chunks are ignored and
   the response buffer ends up empty.

   With this patch, reasoning_content is treated as content and surfaced
   to the user — they see the model's chain-of-thought streaming in,
   then the final answer concatenated at the end. Same flow as
   ChatGPT/Claude thinking UIs.

This is purely additive: non-thinking models behave exactly as before.
Tested with Kimi K2.6 (long thinking), which now works, and K2
(0905-preview, non-thinking), which is unchanged and still works.

Note: relies on PR Axenide#176 to register custom OpenAI-compatible providers
(Kimi/Moonshot, OpenRouter, etc.) via Config.ai.extraModels. Without
that PR, only providers with built-in fetch (Gemini/OpenAI/etc.) benefit
from the temperature fix here.