fix: return finish_reason=tool_calls when tool calls detected#14
Conversation
OpenAI-compatible clients check finish_reason to decide whether to enter the tool execution loop. Previously mlx-vlm always returned "stop" even when process_tool_calls found calls in the model output. Now both streaming and non-streaming /chat/completions responses return finish_reason="tool_calls" when tool calls are present, and "stop" otherwise. Adds 3 tests covering: stop without tools, tool_calls with tools, stop with tools but no calls made. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the OpenAI-compatible /chat/completions endpoint to return finish_reason="tool_calls" (instead of "stop") when tool calls are detected, aligning both streaming and non-streaming responses with expected tool-calling semantics.
Changes:
- Set streaming final chunk
finish_reasonto"tool_calls"whenprocess_tool_callsfinds calls. - Set non-streaming response
finish_reasonto"tool_calls"when tool calls are present. - Add unit tests covering non-streaming
finish_reasonbehavior for tool/no-tool scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
mlx_vlm/server.py |
Computes finish_reason based on detected tool calls for both streaming and non-streaming chat completions. |
mlx_vlm/tests/test_server.py |
Adds tests asserting finish_reason is "stop" vs "tool_calls" for non-streaming /chat/completions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1,4 +1,5 @@ | |||
| import argparse | |||
| import asyncio | |||
There was a problem hiding this comment.
asyncio is imported but never used in this module. Please remove the unused import to avoid lint failures and keep imports minimal.
| import asyncio |
| else: | ||
| tool_calls = {} | ||
| tool_calls["calls"] = [] | ||
|
|
||
| # Signal stream end | ||
| # Signal stream end with correct finish_reason | ||
| stream_finish = "tool_calls" if tool_calls.get("calls") else "stop" | ||
| choices = [ | ||
| ChatStreamChoice( | ||
| finish_reason="stop", | ||
| finish_reason=stream_finish, | ||
| delta=ChatMessage( |
There was a problem hiding this comment.
The streaming path now sets finish_reason based on whether tool calls were detected, but the new/updated behavior isn’t covered by tests in this file (current tests only exercise non-streaming /chat/completions). Please add a streaming test case (e.g., stream=True) that asserts the final SSE chunk includes finish_reason: tool_calls when process_tool_calls returns calls, and stop when it doesn’t.
Remove unused asyncio import and add streaming test for finish_reason=tool_calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused asyncio import and add streaming test for finish_reason=tool_calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary\nReturn finish_reason="tool_calls" instead of "stop" when process_tool_calls finds calls. Both streaming and non-streaming /chat/completions. 3 tests.