Support previous_response_id replay for /responses by eloe · Pull Request #23 · eloe/mlx-vlm

eloe · 2026-04-28T04:47:26Z

Summary

add replay expansion for /responses requests that reference previous_response_id
persist completed response input/output snapshots in the in-memory response store
add pure unit tests for replay chaining without importing the MLX server stack

Notes

draft fork PR while upstream issue /responses accepts previous_response_id but does not replay previous response context Blaizzy/mlx-vlm#1046 is still under discussion
keeps the change focused on replay support instead of reviving the older large Responses API stack

Copilot

Pull request overview

Adds support for replaying prior /responses context via previous_response_id, backed by a new in-memory response store and helper utilities to expand/parse Responses-style input items.

Changes:

Add ResponseStore (thread-safe LRU) to persist completed response input/output snapshots for replay.
Add responses_replay helpers to expand previous_response_id and convert Responses input items into server chat/messages/images.
Wire replay expansion + store persistence into the /responses endpoint and add pure unit tests for replay/store behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`mlx_vlm/server.py`	Adds `previous_response_id` handling, expands inputs via replay helpers, and saves completed snapshots into a global store.
`mlx_vlm/responses_store.py`	Implements an in-memory LRU store and a `replay_input()` method to rebuild prior context.
`mlx_vlm/responses_replay.py`	Implements replay expansion and conversion from Responses input items to server chat/messages/images.
`mlx_vlm/tests/test_responses_store.py`	Unit tests for LRU store behavior and replay input reconstruction.
`mlx_vlm/tests/test_responses_replay.py`	Unit tests for replay expansion and Responses input parsing helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T04:52:06Z

+_responses_store = ResponseStore()




A module-level _responses_store = ResponseStore() will retain recent requests/responses in memory for the life of the process. Because you’re saving expanded_input (which can include base64 input_image payloads and can grow with replay chaining), this can become a significant memory footprint even with LRU eviction. Consider making the store size configurable (env/flag), adding an optional TTL / byte-size cap per entry, or avoiding storing large image payloads when replay is enabled.

Suggested change

_responses_store = ResponseStore()

def _get_bool_env(name: str, default: bool) -> bool:

value = os.environ.get(name)

if value is None:

return default

return value.strip().lower() in {"1", "true", "yes", "on"}

class _LazyResponseStore:

def __init__(self):

self._enabled = _get_bool_env("RESPONSES_STORE_ENABLED", True)

self._store = None

def _get_store(self):

if not self._enabled:

return None

if self._store is None:

self._store = ResponseStore()

return self._store

def __getattr__(self, name: str):

store = self._get_store()

if store is None:

def _noop(*args, **kwargs):

return None

return _noop

return getattr(store, name)

_responses_store = _LazyResponseStore()

Copilot · 2026-04-28T04:52:06Z

+        try:
+            expanded_input = resolve_responses_input_items(
+                openai_request.input,
+                previous_response_id=openai_request.previous_response_id,
+                response_store=_responses_store,
+            )
+            chat_messages, images, instructions = responses_input_to_messages(
+                expanded_input
+            )
+        except LookupError as exc:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Previous response not found: {exc.args[0]}",
+            ) from exc
+        except ValueError as exc:
+            raise HTTPException(status_code=400, detail=str(exc)) from exc


previous_response_id handling and error mapping (404 for missing response, 400 for invalid input items) is newly introduced here, but there’s no endpoint-level test coverage exercising these paths (existing /responses tests only assert sampling args forwarding). Adding a lightweight FastAPI TestClient test that seeds the store and verifies replay expansion (and the 404 path) would help prevent regressions.

Copilot · 2026-04-28T04:52:07Z

+            item_type = output_dict.get("type", "")
+            if item_type == "message":
+                content = output_dict.get("content", [])
+                output_text_parts = []
+                for part in content:


replay_input() only rehydrates assistant output when stored output items have type == "message". However, the /responses non-streaming path stores response.output items shaped like ChatMessage (e.g., {role, content}) without a type field, so those assistant outputs will be silently dropped during replay and chaining will lose assistant context. Consider also accepting ChatMessage-shaped outputs (role/content) here, or normalizing stored outputs to the Responses API message item schema before replaying.

Copilot · 2026-04-28T04:52:07Z

                        "total_tokens": prompt_tokens + output_tokens,
                    },
                )
+                _responses_store.save(response_id, expanded_input, response.output)



The non-streaming /responses code saves response.output into the replay store, but the output item constructed above does not include a Responses-API type: "message" (it’s currently a {role, content, reasoning} dict). As a result, ResponseStore.replay_input() will not include this assistant output in replay context, breaking previous_response_id for the most common (non-streaming) case. Please align the non-streaming output shape with the streaming MessageItem/type:"message" format (or normalize the output before saving).

Eric Loes added 2 commits April 27, 2026 21:44

Add reusable response store for replay state

2467af9

Add previous_response_id replay support

757bfa1

Copilot AI review requested due to automatic review settings April 28, 2026 04:47

Copilot started reviewing on behalf of eloe April 28, 2026 04:47 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support previous_response_id replay for /responses#23

Support previous_response_id replay for /responses#23
eloe wants to merge 2 commits into
mainfrom
codex/issue-1046-previous-response-id

eloe commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-_responses_store = ResponseStore()
+def _get_bool_env(name: str, default: bool) -> bool:
+    value = os.environ.get(name)
+    if value is None:
+        return default
+    return value.strip().lower() in {"1", "true", "yes", "on"}
+class _LazyResponseStore:
+    def __init__(self):
+        self._enabled = _get_bool_env("RESPONSES_STORE_ENABLED", True)
+        self._store = None
+    def _get_store(self):
+        if not self._enabled:
+            return None
+        if self._store is None:
+            self._store = ResponseStore()
+        return self._store
+    def __getattr__(self, name: str):
+        store = self._get_store()
+        if store is None:
+            def _noop(*args, **kwargs):
+                return None
+            return _noop
+        return getattr(store, name)
+_responses_store = _LazyResponseStore()

Conversation

eloe commented Apr 28, 2026

Summary

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants