feat: OpenAI Responses API compliance with structured tool calling#952
Closed
eloe wants to merge 2 commits into
Closed
feat: OpenAI Responses API compliance with structured tool calling#952eloe wants to merge 2 commits into
eloe wants to merge 2 commits into
Conversation
Add full OpenAI Responses API (/v1/responses) compliance including: - Structured function_call output items (parsed from model text) - function_call_output input items for multi-turn tool use - previous_response_id with LRU response store (256 entries) - instructions field with developer-to-system role normalization - "text" type alias accepted alongside "input_text" - tools/tool_choice passthrough to chat template and response echo - Streaming SSE with sequence_number and [DONE] sentinel - incomplete_details for length-truncated responses - parallel_tool_calls, metadata field support New files: - responses_models.py: Self-contained Pydantic models for Responses API - responses_store.py: Thread-safe LRU store for response replay - tests/test_responses_api.py: 31 tests (models, store, endpoint, streaming) Reference: OpenAI Responses API spec and waybarrios/vllm-mlx#214 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the model generates <tool_call>...</tool_call> markup during streaming, detect the tag and suppress those tokens from being sent as response.output_text.delta events. This prevents raw tool call XML from being displayed to users (e.g., in Telegram via OpenClaw). The tool call is still parsed and emitted as structured function_call events after generation completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merged
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full OpenAI Responses API (
/v1/responses) compliance for agentic coding workflows.function_calloutput items parsed from model text via existing tool parsersfunction_call_outputinput items for multi-turn tool use loopsprevious_response_idwith LRU response store (256 entries) for conversation replayinstructionsfield with developer→system role normalization"text"type alias accepted alongside"input_text"sequence_numberand[DONE]sentinelincomplete_detailsfor length-truncated responsesparallel_tool_calls,metadatafield supportFiles
mlx_vlm/responses_models.py— Self-contained Pydantic models for Responses APImlx_vlm/responses_store.py— Thread-safe LRU store for response replaymlx_vlm/server.py— Rewritten/v1/responsesendpoint with input conversion + tool call output buildingmlx_vlm/tests/test_responses_api.py— 31 tests (models, store, endpoint, streaming)Motivation
Agent frameworks like OpenClaw and Hermes use the Responses API for tool-calling loops on local models. The existing
/v1/responsesendpoint returned raw text without structured tool call items, breaking agentic workflows.Testing
Reference