BUG: Response is aborted when reasoning budget is exceeded #89

@majick

Describe the bug
If the inference backend hits the reasoning budget, response generation is aborted.

To Reproduce
Steps to reproduce the behavior:

  1. Configure the model with a small reasoning budget (such as --reasoning-budget 64)
  2. Submit a prompt
  3. Watch the reasoning occur
  4. The response is aborted and the prompt is returned to the input field.

Expected behavior
The response should continue after reasoning ends, whether reasoning ends with a natural EOS or a forced EOS.
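
To make the expected behavior concrete, here is a minimal client-side sketch. It is not Sidekick's actual code. It assumes an OpenAI-style streaming delta in which reasoning tokens arrive under a `reasoning_content` key and answer tokens under `content`; the function name `accumulate` and the delta shape are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def accumulate(deltas: Iterable[dict]) -> Tuple[str, str, Optional[str]]:
    """Collect reasoning and answer text from a stream of delta dicts.

    Assumed (hypothetical) delta shape:
      {"reasoning_content": str, "content": str, "finish_reason": str}
    where any key may be missing or None.
    """
    reasoning_parts = []
    answer_parts = []
    finish_reason = None
    for delta in deltas:
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
        # Only a finish_reason ends the response. The end of reasoning,
        # natural or forced by the budget, must not abort the stream.
        if delta.get("finish_reason"):
            finish_reason = delta["finish_reason"]
            break
    return "".join(reasoning_parts), "".join(answer_parts), finish_reason


# Simulated stream (made up): reasoning is cut short, then the answer follows.
stream = [
    {"reasoning_content": "Let me think"},
    {"reasoning_content": " about"},  # budget reached here, reasoning stops
    {"content": "Hello"},
    {"content": ", world."},
    {"content": None, "finish_reason": "stop"},
]
reasoning, answer, reason = accumulate(stream)
print(answer)
```

The point of the sketch is that the switch from reasoning tokens to answer tokens is not treated as a terminal event; the loop only stops when the stream itself reports that it has finished.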

Screenshots
I mean... just look with your eyes.

Desktop (please complete the following information):

  • OS: Darwin localhost 24.6.0 Darwin Kernel Version 24.6.0: Tue Apr 21 20:13:48 PDT 2026; root:xnu-11417.140.69.710.16~1/RELEASE_ARM64_T8112 arm64 arm Darwin
  • Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.5 Safari/605.1.15
  • Version: Sidekick 1.0.0-rc.18 (38)

Additional context

  • Network inference: llama-swap -> llama.cpp main -> Gemma-4-26B-A4B Q6_K (bf16 mmproj)
  • llama-swap returns 200
  • llama.cpp returns 200
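
Because both hops return 200, the status code alone does not show whether answer tokens were streamed after the reasoning stopped. One way to check is to capture the raw stream from the backend and count the chunk types. The sketch below assumes an OpenAI-compatible server-sent-events capture (`data:` lines ending with a `[DONE]` sentinel) and assumes reasoning text appears under `reasoning_content`; the helper names and the sample capture are made up for illustration and are not real server output.

```python
import json
from typing import Iterator


def iter_sse_deltas(raw: str) -> Iterator[dict]:
    """Yield the first choice's delta from each `data:` line of an SSE capture."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            delta = dict(choices[0].get("delta") or {})
            delta["finish_reason"] = choices[0].get("finish_reason")
            yield delta


def summarize(raw: str) -> dict:
    """Count reasoning and answer chunks and record the final finish_reason."""
    summary = {"reasoning_chunks": 0, "content_chunks": 0, "finish_reason": None}
    for delta in iter_sse_deltas(raw):
        if delta.get("reasoning_content"):
            summary["reasoning_chunks"] += 1
        if delta.get("content"):
            summary["content_chunks"] += 1
        if delta.get("finish_reason"):
            summary["finish_reason"] = delta["finish_reason"]
    return summary


# Made-up capture for illustration only.
sample = "\n".join([
    'data: {"choices":[{"delta":{"reasoning_content":"Hmm"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hi"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
])
print(summarize(sample))
```

If a real capture shows a nonzero `content_chunks` count after the reasoning chunks, the backend did produce an answer and the abort is likely happening on the client side; if it shows zero, the problem may sit further upstream.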

Labels: bug (Something isn't working)