
fix(openai-client): salvage reasoning_content when message.content is empty #78

Open

rafaelreis-r wants to merge 3 commits into viperrcrypto:main from rafaelreis-r:pr/fallback-reasoning-content

Conversation

@rafaelreis-r
Contributor

Summary

llama-server and other OpenAI-compatible servers fronting reasoning models (Qwen3, GLM-4, etc.) split responses into message.content and message.reasoning_content. When a model hits max_tokens before emitting the closing </think> token, content is empty and the actual answer is in reasoning_content.
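
For illustration, the response shape involved looks roughly like this (values invented, field names as above):

```typescript
// Invented example payload: what an OpenAI-compatible reasoning server
// returns when max_tokens runs out before the closing </think>.
const exampleResponse = {
  choices: [
    {
      finish_reason: "length", // budget exhausted mid-reasoning
      message: {
        content: "", // empty: triggers the failure described below
        reasoning_content:
          '<think>The user wants one category per transaction… ' +
          '[{"id": 1, "category": "groceries"}, {"id": 2, "category": "transport"}]',
      },
    },
  ],
};
```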

This currently surfaces as "No text content in AI response" failures for self-hosted users behind llama-swap, even though the model produced the requested JSON inside its reasoning trace.

When that happens, scan reasoning_content for the last JSON array/object and use it as the response text. Callers (categorizeBatch, enrichBatchSemanticTags) already extract the last [...]/{...} block from the response, so fishing it out a step earlier is the smallest unblocking change.
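
As a rough sketch of the shape of the fix (the helper names here are illustrative, not the actual code in this PR):

```typescript
// Sketch only: salvageText is a hypothetical name, and the extractor is
// injected so the snippet stands alone; a brace-balanced extractLastJson
// is sketched further down in the commit notes.
interface ChatMessage {
  content?: string | null;
  reasoning_content?: string | null;
}

function salvageText(
  message: ChatMessage,
  extractLastJson: (text: string) => string | null,
): string {
  // Normal path: content populated; behavior unchanged.
  if (message.content && message.content.trim() !== "") {
    return message.content;
  }
  // Fallback: the model hit max_tokens mid-<think>, so look for the
  // last JSON array/object inside the reasoning trace.
  const salvaged = message.reasoning_content
    ? extractLastJson(message.reasoning_content)
    : null;
  if (salvaged === null) {
    throw new Error("No text content in AI response");
  }
  return salvaged;
}
```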

Test plan

  • Standard OpenAI / non-thinking response (content populated): unchanged
  • llama-server response with empty content and JSON in reasoning_content: text is salvaged

🤖 Generated with Claude Code

… empty

llama-server (and other OpenAI-compatible servers fronting reasoning
models — Qwen3, GLM-4, gemma3, etc.) split the response into
`message.content` and `message.reasoning_content`. When the model
hits `max_tokens` before emitting the closing `</think>` token, the
client sees empty `content` and the actual answer sits in
`reasoning_content` instead.

When that happens, scan reasoning_content for the last JSON array or
object and use it as the response text. This matches the parsing that
callers (`categorizeBatch`, `enrichBatchSemanticTags`) already do on
the response — they look for the last `[...]`/`{...}` block — so
fishing it out a step earlier is the smallest unblocking change.

Effect: self-hosted setups behind llama-swap with thinking-prone models
no longer fail every batch with "No text content in AI response".
…g fallback

The first regex was too greedy and matched non-JSON brackets like
"[link]" that appear in markdown placeholders inside reasoning text.
Replace it with a brace-balanced scan that finds all candidate
substrings and tries JSON.parse on each in descending length order,
returning the first one that parses.
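
A minimal sketch of that scan, assuming the helper is named `extractLastJson` (string contents are not special-cased, which is fine for a sketch):

```typescript
// Collect every bracket/brace-balanced substring, then try JSON.parse on
// each candidate from longest to shortest, returning the first valid one.
function extractLastJson(text: string): string | null {
  const pairs = [
    ["[", "]"],
    ["{", "}"],
  ] as const;
  const candidates: string[] = [];
  for (const [open, close] of pairs) {
    for (let i = 0; i < text.length; i++) {
      if (text[i] !== open) continue;
      let depth = 0;
      for (let j = i; j < text.length; j++) {
        if (text[j] === open) depth++;
        else if (text[j] === close && --depth === 0) {
          candidates.push(text.slice(i, j + 1));
          break;
        }
      }
    }
  }
  // Longest first: a complete top-level array beats anything nested inside it.
  candidates.sort((a, b) => b.length - a.length);
  for (const candidate of candidates) {
    try {
      JSON.parse(candidate);
      return candidate;
    } catch {
      // Not valid JSON (e.g. a markdown "[link]" placeholder); keep scanning.
    }
  }
  return null;
}
```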
Self-hosted OpenAI-compat servers (llama-server, vLLM, ollama) often
emit a <think>…</think> reasoning block before the answer. The
caller's max_tokens is sized for the answer alone, so reasoning
consumes the entire budget and content stays empty.

When OPENAI_BASE_URL is set (proxy in use), auto-bump max_tokens to
≥ 8192. Tunable via OPENAI_MIN_MAX_TOKENS env var. No behavior change
when calling the real OpenAI API (no OPENAI_BASE_URL env).
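
A sketch of that guard, assuming the env var names above (the function name and call site are hypothetical):

```typescript
// Sketch: floor the requested max_tokens when a proxy/base URL is in use.
const DEFAULT_MIN_MAX_TOKENS = 8192;

function effectiveMaxTokens(requested: number): number {
  // No OPENAI_BASE_URL means the real OpenAI API: leave the request alone.
  if (!process.env.OPENAI_BASE_URL) return requested;
  // OPENAI_MIN_MAX_TOKENS overrides the 8192 floor; unset/invalid falls back.
  const min =
    Number(process.env.OPENAI_MIN_MAX_TOKENS) || DEFAULT_MIN_MAX_TOKENS;
  // Reasoning models spend budget on <think>…</think> before the answer,
  // so the floor leaves room for both.
  return Math.max(requested, min);
}
```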