Describe the bug
When the inference back end hits the reasoning budget, response generation is aborted.
To Reproduce
Steps to reproduce the behavior:
- Configure the model with a small reasoning budget (such as --reasoning-budget 64)
- Submit a prompt
- Watch the reasoning occur
- The response is aborted and the prompt is returned to the input field.
Expected behavior
Response generation continues after the reasoning phase, whether reasoning ends with a natural EOS or a forced EOS (budget exhausted).
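Here is a minimal client-side sketch of the expected behavior. It assumes the front end consumes an OpenAI-compatible streaming endpoint (`/v1/chat/completions` with `stream: true`) and that llama.cpp streams reasoning tokens in a `delta.reasoning_content` field. Sidekick's actual client code is not shown in this report, and `accumulate_sse` and `StreamResult` are hypothetical names. The point is that switching from `reasoning_content` deltas to `content` deltas does not signal the end of the response. Only `[DONE]` or a `finish_reason` does.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamResult:
    reasoning: str = ""
    content: str = ""
    finish_reason: Optional[str] = None


def accumulate_sse(lines: Iterable[str]) -> StreamResult:
    """Fold OpenAI-style streamed chat chunks into reasoning and answer text.

    Assumption: reasoning arrives in choices[0].delta.reasoning_content and the
    answer in choices[0].delta.content. The end of reasoning is NOT treated as
    the end of the response; only "[DONE]" stops reading.
    """
    result = StreamResult()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines, SSE comments, event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk
        choice = choices[0]
        delta = choice.get("delta") or {}
        # JSON null becomes None, so fall back to "" before concatenating
        result.reasoning += delta.get("reasoning_content") or ""
        result.content += delta.get("content") or ""
        if choice.get("finish_reason"):
            result.finish_reason = choice["finish_reason"]
    return result
```

With this approach, a stream that emits some reasoning, then answer text, then `finish_reason: "stop"` yields both parts instead of being cut off at the reasoning boundary.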
Screenshots
I mean... just look with your eyes.
Desktop:
- OS:
Darwin localhost 24.6.0 Darwin Kernel Version 24.6.0: Tue Apr 21 20:13:48 PDT 2026; root:xnu-11417.140.69.710.16~1/RELEASE_ARM64_T8112 arm64 arm Darwin
- Browser:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.5 Safari/605.1.15
- Version: Sidekick 1.0.0-rc.18 (38)
Additional context
- Network inference:
llama-swap -> llama.cpp main -> Gemma-4-26B-A4B Q6_K (bf16 mmproj)
- llama-swap returns HTTP 200
- llama.cpp returns HTTP 200