fix: prevent tool name/arg concatenation for Ollama-compatible endpoints #3468

Closed
dmater01 wants to merge 1 commit into NousResearch:main from
dmater01:fix/ollama-parallel-tool-call-concatenation
@dmater01
Contributor

Problem

When using any model served via Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1), Hermes produces malformed tool call requests that result in HTTP 400 errors and this log pattern:

⚠️  Unknown tool 'terminalterminal' — sending error to model for self-correction
⚠️  Unknown tool 'write_filewrite_filewrite_file' — sending error to model for self-correction
⚠️  API call failed: HTTP 400: invalid tool call arguments

Root Cause

Ollama sends every tool call in a parallel batch at the same streaming index (typically 0) instead of incrementing the index for each call, as the OpenAI streaming spec expects. The existing accumulator in _call_chat_completions keys deltas by tc_delta.index, so every tool call lands in tool_calls_acc[0], and the names and arguments are concatenated.

Example: a prompt asking to write 3 files causes Ollama to emit 3 tool calls all at index=0. The accumulator merges them into name='write_filewrite_filewrite_file' with three JSON objects concatenated in arguments.

Confirmed via live inspection of Ollama's streaming output:

chunk: index=0, id='call_function_xxx_1', name='write_file', args='{"path":"alpha.txt",...}'
chunk: index=0, id='call_function_xxx_2', name='write_file', args='{"path":"beta.txt",...}'
chunk: index=0, id='call_function_xxx_3', name='write_file', args='{"path":"gamma.txt",...}'
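
The failure mode can be reproduced without a live server. The sketch below is not the code from run_agent.py; it assumes an accumulator that keys slots only by index and appends name and argument fragments, which matches the behavior described above. `ToolCallDelta` and `accumulate_by_index` are hypothetical names, and the ids and arguments are simplified placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallDelta:
    """Hypothetical stand-in for one streamed tool-call delta."""
    index: int
    id: Optional[str] = None
    name: Optional[str] = None
    arguments: Optional[str] = None


def accumulate_by_index(deltas):
    """Naive accumulation keyed only by delta.index (assumed pre-fix behavior)."""
    acc = {}
    for d in deltas:
        slot = acc.setdefault(d.index, {"id": "", "name": "", "arguments": ""})
        if d.id:
            slot["id"] = d.id
        if d.name:
            slot["name"] += d.name            # fragments are appended
        if d.arguments:
            slot["arguments"] += d.arguments  # so separate calls run together
    return acc


# Three parallel calls, all reported at index 0 as in the Ollama stream above.
ollama_style = [
    ToolCallDelta(0, "call_1", "write_file", '{"path":"alpha.txt"}'),
    ToolCallDelta(0, "call_2", "write_file", '{"path":"beta.txt"}'),
    ToolCallDelta(0, "call_3", "write_file", '{"path":"gamma.txt"}'),
]

merged = accumulate_by_index(ollama_style)
print(len(merged))        # 1
print(merged[0]["name"])  # write_filewrite_filewrite_file
```

The single merged slot carries an unknown tool name and an arguments string made of three back-to-back JSON objects, which is not valid JSON.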

Fix

Add two per-stream tracking dicts (_last_id_at_idx, _active_slot_by_idx) inside _call_chat_completions. Ollama always assigns a distinct id to each tool call in the batch (call_function_xxx_1, _2, _3, …). When a new non-empty id appears at an already-active index, the accumulator allocates a fresh slot and routes subsequent argument chunks there.

Changes are confined to run_agent.py, inside the streaming accumulator loop. No new dependencies.
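
A minimal sketch of the slot-allocation idea, for readers who want to see the routing logic in isolation. This is an illustration under stated assumptions, not the diff itself: deltas are modeled as plain dicts rather than SDK objects, the function name `accumulate_tool_calls` is hypothetical, and the rule for choosing a fresh slot (one past the highest slot in use) is an assumption. Only the two tracking dict names and tool_calls_acc come from the description above.

```python
def accumulate_tool_calls(deltas):
    """Group streamed tool-call deltas into one slot per call.

    Each delta is a dict with "index" and optional "id", "name" and
    "arguments" keys (a hypothetical stand-in for the SDK delta objects).
    """
    tool_calls_acc = {}      # slot -> accumulated call
    last_id_at_idx = {}      # stream index -> last non-empty id seen there
    active_slot_by_idx = {}  # stream index -> slot currently receiving chunks

    for delta in deltas:
        idx = delta["index"]
        call_id = delta.get("id") or ""

        if idx not in active_slot_by_idx:
            # First delta at this index: use the index as the slot unless an
            # earlier reallocation already occupies it.
            slot = idx if idx not in tool_calls_acc else max(tool_calls_acc) + 1
            active_slot_by_idx[idx] = slot
        elif call_id and last_id_at_idx.get(idx, call_id) != call_id:
            # A different non-empty id at an active index starts a new call.
            active_slot_by_idx[idx] = max(tool_calls_acc) + 1
        if call_id:
            last_id_at_idx[idx] = call_id

        entry = tool_calls_acc.setdefault(
            active_slot_by_idx[idx], {"id": "", "name": "", "arguments": ""}
        )
        if call_id:
            entry["id"] = call_id
        entry["name"] += delta.get("name") or ""
        entry["arguments"] += delta.get("arguments") or ""

    return [tool_calls_acc[slot] for slot in sorted(tool_calls_acc)]


# Ollama-style batch: every call arrives at index 0 with a distinct id.
ollama_calls = accumulate_tool_calls([
    {"index": 0, "id": "call_1", "name": "write_file", "arguments": '{"path":"alpha.txt"}'},
    {"index": 0, "id": "call_2", "name": "write_file", "arguments": '{"path":"beta.txt"}'},
    {"index": 0, "id": "call_3", "name": "write_file", "arguments": '{"path":"gamma.txt"}'},
])

# Spec-style stream: indices increment and argument chunks carry no id.
spec_calls = accumulate_tool_calls([
    {"index": 0, "id": "call_a", "name": "terminal", "arguments": ""},
    {"index": 0, "arguments": '{"command":'},
    {"index": 0, "arguments": '"ls"}'},
    {"index": 1, "id": "call_b", "name": "write_file", "arguments": '{"path":"a.txt"}'},
])

print(len(ollama_calls), len(spec_calls))  # 3 2
```

Chunks that carry no id keep flowing into the slot that is active for their index, so arguments streamed in pieces are still joined correctly.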

Compatibility

Providers that correctly increment index (OpenAI, Anthropic, OpenRouter) are unaffected — a new id at a new index never triggers the reallocation path.

Testing

  • Verified with MiniMax M2.7 via Ollama: full 13-test scorecard passes
  • Existing test suite: 238 passed, 0 regressions (test_streaming.py, test_agent_loop_tool_calling.py, test_run_agent.py)
  • End-to-end simulation with live Ollama stream confirms 3 write_file calls produce slots=3 (one per file) instead of 1 concatenated slot

… tool calls

Ollama's OpenAI-compatible endpoint (localhost:11434/v1) sends every tool call
in a parallel batch at the same streaming index (typically 0), rather than
incrementing the index per the OpenAI streaming spec.  This caused the
_call_chat_completions accumulator to merge all names and arguments into a
single slot, producing malformed tool names like 'write_filewrite_filewrite_file'
and concatenated JSON arguments, followed by HTTP 400 errors on the next turn.

Root cause: the accumulator keyed deltas by tc_delta.index.  When all parallel
tool calls arrive at index 0, they all land in tool_calls_acc[0].

Fix: add two per-stream tracking dicts (_last_id_at_idx, _active_slot_by_idx).
Ollama always assigns a distinct id to each tool call in the batch
(call_function_xxx_1, _2, _3, …).  When a new non-empty id appears at an
already-active index, the accumulator allocates a fresh slot and redirects
subsequent argument chunks there, keeping each tool call's name and args
separate.

Providers that correctly increment index (OpenAI, Anthropic, OpenRouter) are
unaffected — a new id at a *new* index never triggers the redirect path.

Verified with MiniMax M2.7 via Ollama: full 13-test scorecard passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@teknium1
Contributor

Merged via #3582 — cherry-picked onto current main with authorship preserved, added two regression tests (reused-index and streamed-args cases). Thanks @dmater01!

@teknium1 closed this Mar 28, 2026