fix(mcp): avoid relisting tools on resumed responses by zhaowenzi · Pull Request #1060 · lightseekorg/smg

zhaowenzi · 2026-04-08T01:28:27Z

Description

Problem

When /v1/responses resumed a prior turn with previous_response_id, SMG re-emitted mcp_list_tools for MCP bindings that were already established in earlier turns.

That differs from OpenAI behavior. On follow-up turns, OpenAI does not repeat mcp_list_tools for existing bindings. It only emits a new mcp_list_tools item when the request adds a new MCP binding.

This PR intentionally scopes the fix to the OpenAI Responses router path first. The gRPC Responses paths will be handled in a follow-up PR.

Solution

SMG now reads prior response history for mcp_list_tools.server_label values and treats those bindings as already known.

On resumed turns:

- existing MCP bindings do not get a repeated mcp_list_tools
- newly added MCP bindings get exactly one new mcp_list_tools
- normal mcp_call behavior is unchanged for the current turn

This first PR only applies that behavior to the OpenAI Responses router path. A follow-up PR will align the gRPC Responses paths.

Changes

- load existing MCP binding labels from stored response history referenced by previous_response_id
- pass existing binding labels through Responses routing, non-streaming, and streaming paths
- emit mcp_list_tools only for bindings not already seen in prior response output
- add regression coverage for resumed MCP flows in router/unit tests and e2e tests
- scope this PR to the OpenAI Responses router path only
- leave gRPC Responses alignment for a follow-up PR
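
The core rule in the list above can be sketched in a few lines. This is a minimal Python illustration, not the Rust code in this PR: the function names mirror `extract_mcp_list_tools_labels` and `mcp_list_tools_bindings_to_emit` from the review walkthrough, but their signatures here, and the use of plain dicts for output items, are assumptions made for the example.

```python
def extract_mcp_list_tools_labels(prior_output):
    """Collect server_label values from prior mcp_list_tools items, in first-seen order."""
    labels = []
    for item in prior_output:
        if item.get("type") != "mcp_list_tools":
            continue
        label = item.get("server_label")
        if label and label not in labels:
            labels.append(label)
    return labels


def mcp_list_tools_bindings_to_emit(existing_labels, requested_labels):
    """Return only the requested bindings that were not listed in an earlier turn."""
    seen = set(existing_labels)
    to_emit = []
    for label in requested_labels:
        if label not in seen:
            seen.add(label)  # also de-duplicates labels repeated within one request
            to_emit.append(label)
    return to_emit


# Output items stored for the previous turn (shape taken from the examples in this PR).
prior_output = [
    {"type": "mcp_list_tools", "server_label": "brave"},
    {"type": "mcp_call", "server_label": "brave"},
    {"type": "message"},
]

existing = extract_mcp_list_tools_labels(prior_output)
new_bindings = mcp_list_tools_bindings_to_emit(existing, ["brave", "deepwiki"])
print(existing, new_bindings)
```

On a first turn there is no prior output, so `existing` is empty and every requested binding is emitted.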

Test Plan

Before This Change

First request:

curl "$BASE_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Search the web for Python programming language.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "brave",
        "server_url": "http://127.0.0.1:8080/mcp",
        "require_approval": "never",
        "allowed_tools": ["brave_web_search"]
      }
    ]
  }'

Example response shape:

{
  "id": "resp_1",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Second request with the same MCP binding and previous_response_id:

curl "$BASE_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Search the web for Rust programming language.",
    "previous_response_id": "resp_1",
    "tools": [
      {
        "type": "mcp",
        "server_label": "brave",
        "server_url": "http://127.0.0.1:8080/mcp",
        "require_approval": "never",
        "allowed_tools": ["brave_web_search"]
      }
    ]
  }'

Problematic response shape before this change:

{
  "id": "resp_2",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Why this is wrong:

- brave was already listed in the first turn
- a resumed turn should not emit another mcp_list_tools for the same binding

Third request adds a new MCP binding:

curl "$BASE_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Use deepwiki and brave to answer.",
    "previous_response_id": "resp_2",
    "tools": [
      {
        "type": "mcp",
        "server_label": "brave",
        "server_url": "http://127.0.0.1:8080/mcp",
        "require_approval": "never",
        "allowed_tools": ["brave_web_search"]
      },
      {
        "type": "mcp",
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",
        "require_approval": "never"
      }
    ]
  }'

Problematic response shape before this change:

{
  "id": "resp_3",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_list_tools",
      "server_label": "deepwiki"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "deepwiki"
    },
    {
      "type": "message"
    }
  ]
}

Why this is wrong:

- only deepwiki is new
- brave should not be relisted

After This Change

The first response stays the same:

{
  "id": "resp_1",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Second request with the same binding now returns:

{
  "id": "resp_2",
  "output": [
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Key change:

no repeated mcp_list_tools for brave

Third request with a newly added binding now returns:

{
  "id": "resp_3",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "deepwiki"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "deepwiki"
    },
    {
      "type": "message"
    }
  ]
}

Key change:

- only the newly added deepwiki binding gets a new mcp_list_tools
- the existing brave binding is not relisted
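
The before/after shapes can be checked mechanically. The sketch below is a hypothetical helper, not part of this PR's test suite: `find_relisted_bindings` walks a chain of responses in turn order and reports every mcp_list_tools item whose server_label was already listed in an earlier response. The sample data is copied from the response shapes shown above.

```python
def find_relisted_bindings(chain):
    """Return (response_id, server_label) pairs for bindings listed more than once."""
    seen = set()
    relisted = []
    for response in chain:
        for item in response.get("output", []):
            if item.get("type") != "mcp_list_tools":
                continue
            label = item.get("server_label")
            if label in seen:
                relisted.append((response["id"], label))
            else:
                seen.add(label)
    return relisted


def response(resp_id, listed):
    """Build a response containing only the mcp_list_tools items relevant to the check."""
    return {
        "id": resp_id,
        "output": [{"type": "mcp_list_tools", "server_label": s} for s in listed],
    }


before_chain = [
    response("resp_1", ["brave"]),
    response("resp_2", ["brave"]),
    response("resp_3", ["brave", "deepwiki"]),
]
after_chain = [
    response("resp_1", ["brave"]),
    response("resp_2", []),
    response("resp_3", ["deepwiki"]),
]

print(find_relisted_bindings(before_chain))
print(find_relisted_bindings(after_chain))
```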

Streaming Check

The same rule also applies to SSE event flow:

- before: resumed streams could emit response.mcp_list_tools.in_progress and response.mcp_list_tools.completed again for existing bindings
- after: resumed streams emit those events only for newly added bindings
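
A streamed turn can be checked in a similar way. The sketch below assumes that each SSE `data:` line carries a JSON event and that a `response.output_item.added` event embeds the item with its `type` and `server_label`. Those payload shapes and the helper names are assumptions made for illustration, not taken from this PR's code.

```python
import json


def parse_sse_data(raw):
    """Parse the JSON payload of every `data:` line in a raw SSE body."""
    events = []
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        events.append(json.loads(payload))
    return events


def streamed_list_tools_labels(events):
    """Return server_label for each mcp_list_tools item announced in the stream."""
    labels = []
    for event in events:
        if event.get("type") != "response.output_item.added":
            continue
        item = event.get("item") or {}
        if item.get("type") == "mcp_list_tools":
            labels.append(item.get("server_label"))
    return labels


# Hypothetical resumed stream: brave was listed in an earlier turn, deepwiki is new.
raw_stream = "\n".join([
    "event: response.output_item.added",
    'data: {"type": "response.output_item.added", "item": {"type": "mcp_list_tools", "server_label": "deepwiki"}}',
    "",
    "event: response.mcp_list_tools.in_progress",
    'data: {"type": "response.mcp_list_tools.in_progress"}',
    "",
    "event: response.mcp_list_tools.completed",
    'data: {"type": "response.mcp_list_tools.completed"}',
    "",
    "data: [DONE]",
])

events = parse_sse_data(raw_stream)
labels = streamed_list_tools_labels(events)
relisted = [label for label in labels if label in {"brave"}]
print(labels, relisted)
```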

Checklist

cargo +nightly fmt passes
cargo clippy --all-targets --all-features -- -D warnings passes
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

New Features
- Improved resumption: responses resumed with a previous_response_id no longer re-send already-seen tool listings.
Bug Fixes
- Resumed streaming and non-streaming responses now avoid duplicate tool-list emissions while still invoking needed tool calls.
Tests
- Added end-to-end tests covering resumed behavior for both streaming and non-streaming modes and an integration test validating tool-list suppression.

coderabbitai · 2026-04-08T01:28:34Z

Walkthrough

Tracks mcp_list_tools labels from prior responses, threads them through request routing and the tool loop, and emits mcp_list_tools only for newly introduced bindings when resuming via previous_response_id in both streaming and non-streaming flows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| E2E Tests `e2e_test/responses/test_tools_call.py` | Added helpers to extract `server_label` and detect streamed `response.output_item.added` MCP list-tools; added two tests asserting `previous_response_id` binding behavior for streaming and non-streaming. |
| Payload / Context `model_gateway/src/routers/openai/chat.rs`, `model_gateway/src/routers/openai/context.rs` | Added `existing_mcp_list_tools_labels: Vec<String>` to `PayloadState` and `OwnedStreamingContext`, and initialized it in request routing. |
| MCP Public Re-exports `model_gateway/src/routers/openai/mcp/mod.rs` | Re-exported `mcp_list_tools_bindings_to_emit` and `ToolLoopExecutionContext`. |
| MCP Tool Loop `model_gateway/src/routers/openai/mcp/tool_loop.rs` | Track prior MCP list-tools labels in `ToolLoopState`; accept prior labels in constructor; add `mcp_list_tools_bindings_to_emit` to compute only-new bindings; refactor `execute_tool_loop` to take `ToolLoopExecutionContext`; update streaming/incomplete emission to include only new bindings; added unit tests. |
| History Loading `model_gateway/src/routers/openai/responses/history.rs` | `load_input_history` now returns `LoadedInputHistory { previous_response_id, existing_mcp_list_tools_labels }`; added extraction of `mcp_list_tools` `server_label`s from prior responses. |
| Response Routing / Handlers `model_gateway/src/routers/openai/responses/route.rs`, `model_gateway/src/routers/openai/responses/non_streaming.rs`, `model_gateway/src/routers/openai/responses/streaming.rs` | Thread `existing_mcp_list_tools_labels` from loaded history into payload/context and into `ToolLoopState`; precompute filtered `list_tools_bindings` in streaming path and emit only new bindings. |
| Integration Tests & Mocks `model_gateway/tests/api/responses_api_test.rs`, `model_gateway/tests/common/mock_worker.rs` | Added integration test ensuring resumed non-streaming responses do not repeat existing `mcp_list_tools`; adjusted mock worker to use `has_prior_tool_context` for resume-turn logic. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router as Router<br/>(responses/route.rs)
    participant History as History Loader<br/>(responses/history.rs)
    participant ToolLoop as Tool Loop<br/>(mcp/tool_loop.rs)
    participant MCP as MCP Session

    Client->>Router: ResponsesRequest (first turn)
    Router->>History: load_input_history()
    History-->>Router: LoadedInputHistory { previous_response_id: None, existing_labels: [] }
    Router->>ToolLoop: execute_tool_loop(ctx { existing_labels: [] })
    ToolLoop->>ToolLoop: mcp_list_tools_bindings_to_emit([], all_bindings)
    ToolLoop->>MCP: emit all MCP list-tools
    ToolLoop-->>Router: response with mcp_list_tools + mcp_call

    Client->>Router: ResponsesRequest (resume with previous_response_id)
    Router->>History: load_input_history(previous_response_id)
    History->>History: extract_mcp_list_tools_labels(prior_output)
    History-->>Router: LoadedInputHistory { previous_response_id: Some(id), existing_labels: ["binding1"] }
    Router->>ToolLoop: execute_tool_loop(ctx { existing_labels: ["binding1"] })
    ToolLoop->>ToolLoop: mcp_list_tools_bindings_to_emit([binding1], all_bindings)
    ToolLoop->>MCP: emit only new MCP list-tools
    ToolLoop-->>Router: response with only mcp_call (no repeated mcp_list_tools)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

refactor(gateway): extract history loading and storage queries from router.rs #732 — Extends load_input_history; this PR builds on that history-loading functionality by returning and propagating prior MCP labels.
refactor(mcp): simplify ensure_request_mcp_client and remove McpLoopConfig #368 — Refactors MCP tool-loop interfaces and invocation patterns that overlap with the execute_tool_loop changes here.
refactor(gateway): extract MCP module from OpenAI responses #730 — Related MCP module changes and re-exports that this PR further adapts for selective binding emission.

Suggested reviewers

slin1237
CatherineSue
key4ng

Poem

🐰 I hopped through responses, took care to inspect,

old labels remembered, new ones collect.
No duplicate listings, resume stays neat,
Tool-loop trimmed tidy—what a gentle feat! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.13%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: preventing redundant MCP tool listing when resuming responses via previous_response_id, which is the primary objective of this PR. |


gemini-code-assist

Code Review

This pull request introduces a mechanism to prevent redundant MCP tool listings when resuming conversations via previous_response_id. It tracks previously emitted MCP server labels throughout the request lifecycle and filters them from subsequent outputs in both streaming and non-streaming modes. The feedback identifies several improvement opportunities: replacing fixed time.sleep calls in tests with more robust polling to reduce flakiness, addressing fragile list indexing and logic duplication in the Python E2E tests, and optimizing the Rust implementation of existing_mcp_list_tools_labels to avoid inefficient JSON serialization.
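
As an illustration of the polling suggestion, a fixed `time.sleep` in a test can be replaced by a small wait loop with a deadline. This is a generic sketch. `wait_until` is a hypothetical helper name, and the timeout and interval values are arbitrary; nothing here is taken from the PR's test code.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Call predicate repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Example: a condition that becomes true on the third check.
attempts = {"count": 0}


def stored_response_is_visible():
    attempts["count"] += 1
    return attempts["count"] >= 3


wait_until(stored_response_is_visible, timeout=1.0, interval=0.001)
print(attempts["count"])
```

The test then waits only as long as needed and fails with a clear error when the condition never holds.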

e2e_test/responses/test_tools_call.py

model_gateway/src/routers/openai/mcp/tool_loop.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66ecd5c9ed


model_gateway/src/routers/openai/mcp/tool_loop.rs

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

model_gateway/src/routers/openai/responses/history.rs (1)

44-60: ⚠️ Potential issue | 🟠 Major

MCP items are silently dropped during conversation loading, causing duplicate list-tools emissions on resume.

Conversation persistence stores mcp_list_tools and mcp_call items in conversation storage (see item_to_new_conversation_item), but the conversation loader here only reconstructs message, function_call, and function_call_output items. All other types—including mcp_list_tools and mcp_call—fall into the _ pattern and are discarded with a warning.

As a result, existing_mcp_list_tools_labels is seeded only from get_response_chain(previous_response_id). When a conversation-backed request resumes, the tool loop receives no labels from the conversation history, so mcp_list_tools_bindings_to_emit re-emits bindings for servers that were already listed in the conversation's output, producing duplicates.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/tests/common/mock_worker.rs`:
- Around line 669-688: The current guard around emitting mock tool calls uses
has_prior_tool_context which is true if previous_response_id or any mcp_* resume
metadata exists, causing resumed requests to short-circuit to final assistant
messages; update the logic so that only concrete prior tool results count:
remove previous_response_id from the has_prior_tool_context boolean and instead
require has_function_output || has_mcp_history (or, if previous_response_id is
present, look up stored mock history by that id to detect an actual prior tool
result before treating it as prior context). Apply this same fix to the other
occurrences noted (the checks around has_previous_response_id/has_mcp_history at
the other locations).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb2b06d9-8d9c-4283-8063-7853aed7a16e

📥 Commits

Reviewing files that changed from the base of the PR and between 9516b44 and 66ecd5c.

📒 Files selected for processing (11)

e2e_test/responses/test_tools_call.py
model_gateway/src/routers/openai/chat.rs
model_gateway/src/routers/openai/context.rs
model_gateway/src/routers/openai/mcp/mod.rs
model_gateway/src/routers/openai/mcp/tool_loop.rs
model_gateway/src/routers/openai/responses/history.rs
model_gateway/src/routers/openai/responses/non_streaming.rs
model_gateway/src/routers/openai/responses/route.rs
model_gateway/src/routers/openai/responses/streaming.rs
model_gateway/tests/api/responses_api_test.rs
model_gateway/tests/common/mock_worker.rs

model_gateway/tests/common/mock_worker.rs

Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e_test/responses/test_tools_call.py`:
- Around line 277-279: Extract a small helper function named
get_completed_response_id(events) that finds events with type
"response.completed", asserts exactly one (or raises a clear assertion message)
and returns its .response.id, then replace the inline list-comprehensions that
index [0] (e.g., where previous_response_id is set from events1 and the
analogous use with events2) with calls to get_completed_response_id(events1) and
get_completed_response_id(events2) so failures produce a clear assertion message
and the test is less brittle.
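
A helper along the lines described in this comment might look like the following. It is a sketch under assumptions: the events are taken to be objects exposing `.type` and `.response.id`, as the comment implies, and `SimpleNamespace` stands in for the real event objects in the demo.

```python
from types import SimpleNamespace


def get_completed_response_id(events):
    """Return the response id from the single response.completed event in a stream."""
    completed = [e for e in events if getattr(e, "type", None) == "response.completed"]
    assert len(completed) == 1, (
        f"expected exactly one response.completed event, found {len(completed)}"
    )
    return completed[0].response.id


# Stand-in events for a finished stream.
events1 = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_item.added"),
    SimpleNamespace(type="response.completed", response=SimpleNamespace(id="resp_1")),
]

previous_response_id = get_completed_response_id(events1)
print(previous_response_id)
```

Compared with indexing `[0]` into a list comprehension, a missing or duplicated completion event fails with a readable assertion message instead of an IndexError.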


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 523c1a25-0943-444c-8759-44d07b1f7efc

📥 Commits

Reviewing files that changed from the base of the PR and between 66ecd5c and 627547e.

📒 Files selected for processing (11)

e2e_test/responses/test_tools_call.py
model_gateway/src/routers/openai/chat.rs
model_gateway/src/routers/openai/context.rs
model_gateway/src/routers/openai/mcp/mod.rs
model_gateway/src/routers/openai/mcp/tool_loop.rs
model_gateway/src/routers/openai/responses/history.rs
model_gateway/src/routers/openai/responses/non_streaming.rs
model_gateway/src/routers/openai/responses/route.rs
model_gateway/src/routers/openai/responses/streaming.rs
model_gateway/tests/api/responses_api_test.rs
model_gateway/tests/common/mock_worker.rs

e2e_test/responses/test_tools_call.py

github-actions bot added the tests (Test changes), model-gateway (Model gateway crate changes), and openai (OpenAI router changes) labels Apr 8, 2026

claude bot approved these changes Apr 8, 2026

gemini-code-assist bot reviewed Apr 8, 2026

e2e_test/responses/test_tools_call.py (resolved)

e2e_test/responses/test_tools_call.py (resolved)

model_gateway/src/routers/openai/mcp/tool_loop.rs (resolved)

coderabbitai bot approved these changes Apr 8, 2026

zhaowenzi force-pushed the fix/openai-mcp-resume-list-tools-alignment branch from 49b6d6f to 66ecd5c April 8, 2026 19:47

zhaowenzi marked this pull request as ready for review April 8, 2026 19:48

zhaowenzi requested review from CatherineSue, XinyueZhang369, key4ng and slin1237 as code owners April 8, 2026 19:48

chatgpt-codex-connector bot reviewed Apr 8, 2026

model_gateway/src/routers/openai/mcp/tool_loop.rs (resolved)

coderabbitai bot requested changes Apr 8, 2026

model_gateway/tests/common/mock_worker.rs (outdated, resolved)

zhaowenzi added 3 commits April 8, 2026 13:10

fix(mcp): avoid relisting tools on resumed responses

e8e75da

Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>

format fix

e083329

Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>

comment fix

627547e

Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>

zhaowenzi force-pushed the fix/openai-mcp-resume-list-tools-alignment branch from 66ecd5c to 627547e April 8, 2026 20:10

coderabbitai bot requested changes Apr 8, 2026

e2e_test/responses/test_tools_call.py (resolved)

coderabbitai bot approved these changes Apr 8, 2026

zhaowenzi wants to merge 3 commits into main from fix/openai-mcp-resume-list-tools-alignment