fix: prevent tool name/arg concatenation for Ollama-compatible endpoints #3468
Closed
dmater01 wants to merge 1 commit into NousResearch:main from
… tool calls

Ollama's OpenAI-compatible endpoint (localhost:11434/v1) sends every tool call in a parallel batch at the same streaming index (typically 0), rather than incrementing the index per the OpenAI streaming spec. This caused the _call_chat_completions accumulator to merge all names and arguments into a single slot, producing malformed tool names like 'write_filewrite_filewrite_file' and concatenated JSON arguments, followed by HTTP 400 errors on the next turn.

Root cause: the accumulator keyed deltas by tc_delta.index. When all parallel tool calls arrive at index 0, they all land in tool_calls_acc[0].

Fix: add two per-stream tracking dicts (_last_id_at_idx, _active_slot_by_idx). Ollama always assigns a distinct id to each tool call in the batch (call_function_xxx_1, _2, _3, …). When a new non-empty id appears at an already-active index, the accumulator allocates a fresh slot and redirects subsequent argument chunks there, keeping each tool call's name and args separate.

Providers that correctly increment index (OpenAI, Anthropic, OpenRouter) are unaffected: a new id at a *new* index never triggers the redirect path.

Verified with MiniMax M2.7 via Ollama: full 13-test scorecard passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
When using any model served via Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1), Hermes produces malformed tool call requests that result in HTTP 400 errors and this log pattern:
Root Cause
Ollama sends all parallel tool calls at the same streaming index (typically 0) instead of incrementing per the OpenAI streaming spec. The existing accumulator in _call_chat_completions keys deltas by tc_delta.index, so every tool call lands in tool_calls_acc[0], causing names and arguments to concatenate.
Example: a prompt asking to write 3 files causes Ollama to emit 3 tool calls all at index=0. The accumulator merges them into name='write_filewrite_filewrite_file' with three JSON objects concatenated in arguments.
Confirmed via live inspection of Ollama's streaming output:
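
The captured stream itself is not included in this text. As a stand-in, the sketch below simulates the failure mode with made-up data. It is an illustration, not the actual code in run_agent.py: the ToolCallDelta class, the accumulate_by_index function, and the call ids are hypothetical, and the assumption is that the original accumulator appends name and argument fragments to whatever slot tc_delta.index selects.

```python
from dataclasses import dataclass


@dataclass
class ToolCallDelta:
    """Hypothetical stand-in for one streamed tool-call delta."""
    index: int
    id: str = ""
    name: str = ""
    arguments: str = ""


def accumulate_by_index(deltas):
    """Naive accumulator (assumed behavior): one slot per streaming index."""
    acc = {}
    for d in deltas:
        slot = acc.setdefault(d.index, {"id": "", "name": "", "arguments": ""})
        if d.id:
            slot["id"] = d.id
        # Fragments are appended, so distinct calls sharing an index merge.
        slot["name"] += d.name
        slot["arguments"] += d.arguments
    return acc


# Simulated Ollama-style batch: three distinct calls, all at index 0.
ollama_style = [
    ToolCallDelta(0, "call_1", "write_file", '{"path": "a.txt"}'),
    ToolCallDelta(0, "call_2", "write_file", '{"path": "b.txt"}'),
    ToolCallDelta(0, "call_3", "write_file", '{"path": "c.txt"}'),
]

merged = accumulate_by_index(ollama_style)
print(len(merged))        # 1
print(merged[0]["name"])  # write_filewrite_filewrite_file
```

The single merged slot carries a tool name that matches no registered tool and an arguments string that is three JSON objects back to back, which is consistent with the HTTP 400 on the following turn.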
Fix
Add two per-stream tracking dicts (_last_id_at_idx, _active_slot_by_idx) inside _call_chat_completions. Ollama always assigns a distinct id to each tool call in the batch (call_function_xxx_1, _2, _3, …). When a new non-empty id appears at an already-active index, the accumulator allocates a fresh slot and routes subsequent argument chunks there.
Changes are confined to run_agent.py, inside the streaming accumulator loop. No new dependencies.
Compatibility
Providers that correctly increment index (OpenAI, Anthropic, OpenRouter) are unaffected: a new id at a new index never triggers the reallocation path.
Testing
test_streaming.py, test_agent_loop_tool_calling.py, test_run_agent.py)
write_file calls produce slots=3 (one per file) instead of 1 concatenated slot
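
The routing rule described under Fix, and the slots=3 outcome, can be reproduced in isolation with a small sketch. This is not the patch itself: ToolCallDelta, accumulate_with_id_redirect, the slot-numbering scheme, and the sample ids are assumptions made for illustration. Only the two dict names mirror the PR (without the leading underscore).

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCallDelta:
    """Hypothetical stand-in for one streamed tool-call delta."""
    index: int
    id: str = ""
    name: str = ""
    arguments: str = ""


def accumulate_with_id_redirect(deltas):
    """Sketch: open a fresh slot when a new id shows up at an active index."""
    acc = {}                 # slot number -> accumulated tool call
    last_id_at_idx = {}      # streaming index -> last non-empty id seen there
    active_slot_by_idx = {}  # streaming index -> slot currently receiving chunks
    for d in deltas:
        idx = d.index
        prev_id = last_id_at_idx.get(idx)
        if idx not in active_slot_by_idx:
            # First delta at this index: reuse the index as the slot number,
            # unless an earlier redirect already claimed it.
            active_slot_by_idx[idx] = idx if idx not in acc else max(acc) + 1
        elif d.id and prev_id and d.id != prev_id:
            # New non-empty id at an already-active index: a new tool call.
            active_slot_by_idx[idx] = max(acc) + 1
        if d.id:
            last_id_at_idx[idx] = d.id
        entry = acc.setdefault(
            active_slot_by_idx[idx], {"id": "", "name": "", "arguments": ""}
        )
        if d.id:
            entry["id"] = d.id
        entry["name"] += d.name
        entry["arguments"] += d.arguments
    return acc


# Ollama-style: every call at index 0; the first call's args arrive in two chunks.
ollama_style = [
    ToolCallDelta(0, "call_1", "write_file", '{"path": '),
    ToolCallDelta(0, "", "", '"a.txt"}'),
    ToolCallDelta(0, "call_2", "write_file", '{"path": "b.txt"}'),
    ToolCallDelta(0, "call_3", "write_file", '{"path": "c.txt"}'),
]
# Spec-style: the index increments per call, so the redirect branch never runs.
spec_style = [
    ToolCallDelta(0, "call_a", "read_file", '{"path": '),
    ToolCallDelta(0, "", "", '"a.txt"}'),
    ToolCallDelta(1, "call_b", "write_file", '{"path": "b.txt"}'),
]

fixed = accumulate_with_id_redirect(ollama_style)
compliant = accumulate_with_id_redirect(spec_style)
print(len(fixed), [c["name"] for c in fixed.values()])
print(len(compliant), [json.loads(c["arguments"]) for c in compliant.values()])
```

In this sketch a continuation chunk with an empty id stays in the current slot, so chunked arguments for one call are still joined correctly. Only a different non-empty id at the same index opens a new slot.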