feat: OpenAI Responses API compliance with structured tool calling by eloe · Pull Request #11 · eloe/mlx-vlm

eloe · 2026-04-06T16:16:16Z

Summary\nFull OpenAI Responses API (`/v1/responses`) compliance with structured tool calling, `previous_response_id`, streaming events with sequence numbers, tool call suppression in streaming, and 31 tests.\n\n## Files\n- `responses_models.py` — Pydantic models\n- `responses_store.py` — LRU store\n- `server.py` — Rewritten endpoint\n- `tests/test_responses_api.py` — 31 tests

Add full OpenAI Responses API (/v1/responses) compliance including: - Structured function_call output items (parsed from model text) - function_call_output input items for multi-turn tool use - previous_response_id with LRU response store (256 entries) - instructions field with developer-to-system role normalization - "text" type alias accepted alongside "input_text" - tools/tool_choice passthrough to chat template and response echo - Streaming SSE with sequence_number and [DONE] sentinel - incomplete_details for length-truncated responses - parallel_tool_calls, metadata field support New files: - responses_models.py: Self-contained Pydantic models for Responses API - responses_store.py: Thread-safe LRU store for response replay - tests/test_responses_api.py: 31 tests (models, store, endpoint, streaming) Reference: OpenAI Responses API spec and waybarrios/vllm-mlx#214 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When the model generates <tool_call>...</tool_call> markup during streaming, detect the tag and suppress those tokens from being sent as response.output_text.delta events. This prevents raw tool call XML from being displayed to users (e.g., in Telegram via OpenClaw). The tool call is still parsed and emitted as structured function_call events after generation completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR rewrites the /responses (/v1/responses) endpoint to align more closely with the OpenAI Responses API, including structured tool calling, previous_response_id replay, and SSE streaming event shapes.

Changes:

Added Pydantic request/response + streaming event models for the Responses API.
Introduced an in-memory LRU ResponseStore to support previous_response_id conversation replay.
Reimplemented /responses endpoint with tool-call parsing, streaming SSE events with sequence numbers, and added a new Responses API test suite.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`mlx_vlm/responses_models.py`	Adds Responses API request/response + SSE event models.
`mlx_vlm/responses_store.py`	Implements an LRU response store for `previous_response_id` replay.
`mlx_vlm/server.py`	Rewrites `/responses` endpoint (streaming + non-streaming) using new models/store and tool parsing.
`mlx_vlm/tests/test_responses_api.py`	Adds unit + functional + streaming tests for Responses API behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-06T16:33:55Z

+            ResponseOutputMessageContentList,
+        ]
+    ] = Field(None, description="Content of the message.")
+    tool_calls: List = []


Several Pydantic models use mutable list literals as defaults (e.g., ChatMessage.tool_calls). This can lead to shared state across instances and surprising mutations. Use Field(default_factory=list) (or default_factory for other containers) for these fields.

Suggested change

tool_calls: List = []

tool_calls: List = Field(default_factory=list)

Copilot · 2026-04-06T16:33:56Z

+class ContentPartOutputText(BaseModel):
+    """A text content part in an output message."""
+
+    type: Literal["output_text"] = "output_text"
+    text: str = ""
+    annotations: List[str] = []
+
+
+class ResponseMessageItem(BaseModel):
+    """An assistant message output item."""
+
+    id: str = Field(default_factory=lambda: f"msg_{uuid.uuid4().hex[:24]}")
+    type: Literal["message"] = "message"
+    role: Literal["assistant"] = "assistant"
+    status: Literal["in_progress", "completed"] = "completed"
+    content: List[ContentPartOutputText] = []
+


Mutable list literals are used as defaults for annotations/content fields. Even if most instances won’t mutate them, this is unsafe and can create cross-instance leakage. Prefer Field(default_factory=list) for annotations and content.

Copilot · 2026-04-06T16:33:56Z

+                    for part in content:
+                        if (
+                            isinstance(part, dict)
+                            and part.get("type") == "output_text"
+                        ):
+                            items.append(
+                                {
+                                    "role": "assistant",
+                                    "content": [
+                                        {
+                                            "type": "output_text",
+                                            "text": part.get("text", ""),
+                                        }
+                                    ],
+                                }
+                            )


replay_input() emits a separate assistant message for each output_text part in a stored message. This can change conversation structure when a single assistant message contains multiple content parts. Consider preserving the original message and including all output_text parts in one assistant message item.

Suggested change

for part in content:

if (

isinstance(part, dict)

and part.get("type") == "output_text"

):

items.append(

{

"role": "assistant",

"content": [

{

"type": "output_text",

"text": part.get("text", ""),

}

],

}

)

message_content = []

for part in content:

if (

isinstance(part, dict)

and part.get("type") == "output_text"

):

message_content.append(

{

"type": "output_text",

"text": part.get("text", ""),

}

)

if message_content:

items.append(

{

"role": "assistant",

"content": message_content,

}

)

Copilot · 2026-04-06T16:33:56Z

+                        # Suppress tool call tokens from being streamed as text
+                        if not in_tool_call and tool_call_start_tag in full_text:
+                            in_tool_call = True
+                        if in_tool_call:
+                            continue
+


Streaming tool-call suppression sets in_tool_call=True once the start tag appears, but never resets when the end tag is reached. This will suppress all subsequent text deltas, including legitimate text after a tool call. Track tool_module.tool_call_end (and possibly nested/multiple calls) so streaming resumes after the tool call closes.

Copilot · 2026-04-06T16:33:57Z

+                                images.append(ci.get("image_url", ""))
+                            elif ci_type == "image_url":
+                                img = ci.get("image_url", {})
+                                if isinstance(img, dict):
+                                    images.append(img.get("url", ""))
+                                elif isinstance(img, str):
+                                    images.append(img)


Image URL extraction appends empty strings when image_url/url is missing. This will later cause load_image('') failures and return 500s for malformed input. Validate required fields and raise a 400 (or skip the image part) instead of appending "".

Suggested change

images.append(ci.get("image_url", ""))

elif ci_type == "image_url":

img = ci.get("image_url", {})

if isinstance(img, dict):

images.append(img.get("url", ""))

elif isinstance(img, str):

images.append(img)

image_url = ci.get("image_url")

if not isinstance(image_url, str) or not image_url:

raise HTTPException(

status_code=400,

detail="Invalid input_image content part: 'image_url' must be a non-empty string.",

)

images.append(image_url)

elif ci_type == "image_url":

img = ci.get("image_url")

if isinstance(img, dict):

image_url = img.get("url")

if not isinstance(image_url, str) or not image_url:

raise HTTPException(

status_code=400,

detail="Invalid image_url content part: 'image_url.url' must be a non-empty string.",

)

images.append(image_url)

elif isinstance(img, str):

if not img:

raise HTTPException(

status_code=400,

detail="Invalid image_url content part: 'image_url' must be a non-empty string.",

)

images.append(img)

else:

raise HTTPException(

status_code=400,

detail="Invalid image_url content part: 'image_url' must be a string or an object containing a non-empty 'url' field.",

)

Critical fixes: - Add recursion depth limit (50) and cycle detection for previous_response_id chains to prevent stack overflow/DoS - Fix tool call suppression: in_tool_call now resets on end tag, partial-match only triggers on actual trailing prefix (not any <) - Sanitize error messages in streaming — no longer leaks internals Security fixes: - Validate image URLs are non-empty before appending (prevents load_image('') → 500 errors) Spec compliance: - Remove data: [DONE] sentinel (not part of Responses API spec, response.completed is the terminal event) Code quality: - Fix mutable default lists in Pydantic models (use default_factory) - Log tool parsing failures instead of silently swallowing - Collect multi-part assistant messages into single message in replay - Shorten import aliases to avoid unnecessary verbosity Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eloe and others added 2 commits April 5, 2026 11:58

eloe requested a review from Copilot April 6, 2026 16:28

Copilot started reviewing on behalf of eloe April 6, 2026 16:29 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

eloe mentioned this pull request Apr 8, 2026

Combined server enhancements: OpenAI API compliance, prompt caching, concurrency #21

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: OpenAI Responses API compliance with structured tool calling#11

feat: OpenAI Responses API compliance with structured tool calling#11
eloe wants to merge 3 commits into
mainfrom
feature/openai-responses-api

eloe commented Apr 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	tool_calls: List = []
	tool_calls: List = Field(default_factory=list)

-                                images.append(ci.get("image_url", ""))
-                            elif ci_type == "image_url":
-                                img = ci.get("image_url", {})
-                                if isinstance(img, dict):
-                                    images.append(img.get("url", ""))
-                                elif isinstance(img, str):
-                                    images.append(img)
+                                image_url = ci.get("image_url")
+                                if not isinstance(image_url, str) or not image_url:
+                                    raise HTTPException(
+                                        status_code=400,
+                                        detail="Invalid input_image content part: 'image_url' must be a non-empty string.",
+                                    )
+                                images.append(image_url)
+                            elif ci_type == "image_url":
+                                img = ci.get("image_url")
+                                if isinstance(img, dict):
+                                    image_url = img.get("url")
+                                    if not isinstance(image_url, str) or not image_url:
+                                        raise HTTPException(
+                                            status_code=400,
+                                            detail="Invalid image_url content part: 'image_url.url' must be a non-empty string.",
+                                        )
+                                    images.append(image_url)
+                                elif isinstance(img, str):
+                                    if not img:
+                                        raise HTTPException(
+                                            status_code=400,
+                                            detail="Invalid image_url content part: 'image_url' must be a non-empty string.",
+                                        )
+                                    images.append(img)
+                                else:
+                                    raise HTTPException(
+                                        status_code=400,
+                                        detail="Invalid image_url content part: 'image_url' must be a string or an object containing a non-empty 'url' field.",
+                                    )

Conversation

eloe commented Apr 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants