fix(mcp): avoid relisting tools on resumed responses #1060

Open
zhaowenzi wants to merge 3 commits into main from fix/openai-mcp-resume-list-tools-alignment

@zhaowenzi zhaowenzi commented Apr 8, 2026

Description

Problem

When /v1/responses resumed a prior turn with previous_response_id, SMG re-emitted mcp_list_tools for MCP bindings that were already established in earlier turns.

That differs from OpenAI behavior. On follow-up turns, OpenAI does not repeat mcp_list_tools for existing bindings. It only emits a new mcp_list_tools item when the request adds a new MCP binding.

This PR intentionally scopes the fix to the OpenAI Responses router path first. The gRPC Responses paths will be handled in a follow-up PR.

Solution

SMG now reads prior response history for mcp_list_tools.server_label values and treats those bindings as already known.

On resumed turns:

  • existing MCP bindings do not get a repeated mcp_list_tools
  • newly added MCP bindings get exactly one new mcp_list_tools
  • normal mcp_call behavior is unchanged for the current turn
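
The rule above can be sketched as a small filter. This is a minimal Python illustration, not the PR's Rust code; the function name `bindings_to_emit` and its list-of-labels interface are assumptions made for the example.

```python
from typing import Iterable, List


def bindings_to_emit(existing_labels: Iterable[str],
                     requested_labels: Iterable[str]) -> List[str]:
    """Return the requested server labels that still need an mcp_list_tools item.

    Hypothetical helper: labels already seen in prior turns are skipped,
    and request order is preserved for the remaining ones.
    """
    seen = set(existing_labels)
    new_labels = []
    for label in requested_labels:
        if label not in seen:
            new_labels.append(label)
            seen.add(label)  # also guards against a label repeated in one request
    return new_labels


print(bindings_to_emit([], ["brave"]))                     # ['brave']
print(bindings_to_emit(["brave"], ["brave"]))              # []
print(bindings_to_emit(["brave"], ["brave", "deepwiki"]))  # ['deepwiki']
```

The three calls correspond to a first turn, a resumed turn with the same binding, and a resumed turn that adds one binding.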

This first PR only applies that behavior to the OpenAI Responses router path. A follow-up PR will align the gRPC Responses paths.

Changes

  • load existing MCP binding labels from stored response history referenced by previous_response_id
  • pass existing binding labels through Responses routing, non-streaming, and streaming paths
  • emit mcp_list_tools only for bindings not already seen in prior response output
  • add regression coverage for resumed MCP flows in router/unit tests and e2e tests
  • scope this PR to the OpenAI Responses router path only
  • leave gRPC Responses alignment for a follow-up PR
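
As a rough sketch of the first item, the labels could be collected from stored responses along these lines. The Python below assumes each stored response is a dict with an `output` list shaped like the examples in the Test Plan; the function name mirrors the description but is hypothetical here.

```python
from typing import Any, Dict, List


def extract_mcp_list_tools_labels(responses: List[Dict[str, Any]]) -> List[str]:
    """Collect server_label values of mcp_list_tools items across prior responses.

    Assumption: `responses` is the stored chain reachable from
    previous_response_id, oldest first.
    """
    labels: List[str] = []
    for response in responses:
        for item in response.get("output", []):
            if item.get("type") != "mcp_list_tools":
                continue  # mcp_call and message items do not establish a binding
            label = item.get("server_label")
            if label and label not in labels:
                labels.append(label)
    return labels


chain = [
    {"id": "resp_1", "output": [
        {"type": "mcp_list_tools", "server_label": "brave"},
        {"type": "mcp_call", "server_label": "brave"},
        {"type": "message"},
    ]},
    {"id": "resp_2", "output": [
        {"type": "mcp_call", "server_label": "brave"},
        {"type": "message"},
    ]},
]
print(extract_mcp_list_tools_labels(chain))  # ['brave']
```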

Test Plan

Before This Change

First request:

curl "$BASE_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Search the web for Python programming language.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "brave",
        "server_url": "http://127.0.0.1:8080/mcp",
        "require_approval": "never",
        "allowed_tools": ["brave_web_search"]
      }
    ]
  }'

Example response shape:

{
  "id": "resp_1",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Second request with the same MCP binding and previous_response_id:

curl "$BASE_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Search the web for Rust programming language.",
    "previous_response_id": "resp_1",
    "tools": [
      {
        "type": "mcp",
        "server_label": "brave",
        "server_url": "http://127.0.0.1:8080/mcp",
        "require_approval": "never",
        "allowed_tools": ["brave_web_search"]
      }
    ]
  }'

Problematic response shape before this change:

{
  "id": "resp_2",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Why this is wrong:

  • brave was already listed in the first turn
  • a resumed turn should not emit another mcp_list_tools for the same binding

Third request adds a new MCP binding:

curl "$BASE_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": "Use deepwiki and brave to answer.",
    "previous_response_id": "resp_2",
    "tools": [
      {
        "type": "mcp",
        "server_label": "brave",
        "server_url": "http://127.0.0.1:8080/mcp",
        "require_approval": "never",
        "allowed_tools": ["brave_web_search"]
      },
      {
        "type": "mcp",
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",
        "require_approval": "never"
      }
    ]
  }'

Problematic response shape before this change:

{
  "id": "resp_3",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_list_tools",
      "server_label": "deepwiki"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "deepwiki"
    },
    {
      "type": "message"
    }
  ]
}

Why this is wrong:

  • only deepwiki is new
  • brave should not be relisted

After This Change

First request stays the same:

{
  "id": "resp_1",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Second request with the same binding now returns:

{
  "id": "resp_2",
  "output": [
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "message"
    }
  ]
}

Key change:

  • no repeated mcp_list_tools for brave

Third request with a newly added binding now returns:

{
  "id": "resp_3",
  "output": [
    {
      "type": "mcp_list_tools",
      "server_label": "deepwiki"
    },
    {
      "type": "mcp_call",
      "server_label": "brave"
    },
    {
      "type": "mcp_call",
      "server_label": "deepwiki"
    },
    {
      "type": "message"
    }
  ]
}

Key change:

  • only the newly added deepwiki binding gets a new mcp_list_tools
  • existing brave binding is not relisted
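
One way to turn the before/after shapes above into a regression check is to scan a chain of responses for labels that are listed more than once. This is an illustrative Python sketch; `find_relisted_labels` and the `turn` helper are hypothetical names, and the data simply restates the response shapes shown above.

```python
def turn(response_id, *items):
    """Build a response dict from (type, server_label) pairs."""
    return {"id": response_id,
            "output": [{"type": t, "server_label": s} for t, s in items]}


def find_relisted_labels(turns):
    """Return (response_id, label) pairs where mcp_list_tools repeats a label."""
    seen = set()
    violations = []
    for response in turns:
        for item in response.get("output", []):
            if item.get("type") != "mcp_list_tools":
                continue
            label = item.get("server_label")
            if label in seen:
                violations.append((response.get("id"), label))
            seen.add(label)
    return violations


before = [
    turn("resp_1", ("mcp_list_tools", "brave"), ("mcp_call", "brave")),
    turn("resp_2", ("mcp_list_tools", "brave"), ("mcp_call", "brave")),
    turn("resp_3", ("mcp_list_tools", "brave"), ("mcp_list_tools", "deepwiki"),
         ("mcp_call", "brave"), ("mcp_call", "deepwiki")),
]
after = [
    turn("resp_1", ("mcp_list_tools", "brave"), ("mcp_call", "brave")),
    turn("resp_2", ("mcp_call", "brave")),
    turn("resp_3", ("mcp_list_tools", "deepwiki"),
         ("mcp_call", "brave"), ("mcp_call", "deepwiki")),
]
print(find_relisted_labels(before))  # [('resp_2', 'brave'), ('resp_3', 'brave')]
print(find_relisted_labels(after))   # []
```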

Streaming Check

The same rule also applies to SSE event flow:

  • before: resumed streams could emit response.mcp_list_tools.in_progress and response.mcp_list_tools.completed again for existing bindings
  • after: resumed streams emit those events only for newly added bindings
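
A streaming check could be sketched as follows. The Python below assumes the SSE stream has already been parsed into dicts, and that the binding label is read from `response.output_item.added` events whose `item` has type `mcp_list_tools`; both the event shape and the helper name are assumptions for illustration.

```python
def listed_labels_in_stream(events):
    """Return server labels announced by mcp_list_tools items in a parsed stream."""
    labels = []
    for event in events:
        if event.get("type") != "response.output_item.added":
            continue
        item = event.get("item") or {}
        if item.get("type") == "mcp_list_tools":
            labels.append(item.get("server_label"))
    return labels


# Hypothetical resumed stream that adds deepwiki to an existing brave binding.
resumed_events = [
    {"type": "response.created"},
    {"type": "response.output_item.added",
     "item": {"type": "mcp_list_tools", "server_label": "deepwiki"}},
    {"type": "response.mcp_list_tools.in_progress"},
    {"type": "response.mcp_list_tools.completed"},
    {"type": "response.output_item.added",
     "item": {"type": "mcp_call", "server_label": "brave"}},
    {"type": "response.completed"},
]
print(listed_labels_in_stream(resumed_events))  # ['deepwiki']
```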

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Improved resumption: responses resumed with a previous_response_id no longer re-send already-seen tool listings.
  • Bug Fixes

    • Resumed streaming and non-streaming responses now avoid duplicate tool-list emissions while still invoking needed tool calls.
  • Tests

    • Added end-to-end tests covering resumed behavior for both streaming and non-streaming modes and an integration test validating tool-list suppression.

coderabbitai bot commented Apr 8, 2026

Walkthrough

Tracks MCP mcp_list_tools labels from prior responses, threads them through request routing and the tool loop, and emits only newly-introduced mcp_list_tools bindings when resuming via previous_response_id in both streaming and non-streaming flows.

Changes

| Cohort / File(s) | Summary |
|---|---|
| E2E Tests: e2e_test/responses/test_tools_call.py | Added helpers to extract server_label and detect streamed response.output_item.added MCP list-tools; added two tests asserting previous_response_id binding behavior for streaming and non-streaming. |
| Payload / Context: model_gateway/src/routers/openai/chat.rs, model_gateway/src/routers/openai/context.rs | Added existing_mcp_list_tools_labels: Vec<String> to PayloadState and OwnedStreamingContext, and initialized it in request routing. |
| MCP Public Re-exports: model_gateway/src/routers/openai/mcp/mod.rs | Re-exported mcp_list_tools_bindings_to_emit and ToolLoopExecutionContext. |
| MCP Tool Loop: model_gateway/src/routers/openai/mcp/tool_loop.rs | Track prior MCP list-tools labels in ToolLoopState; accept prior labels in constructor; add mcp_list_tools_bindings_to_emit to compute only-new bindings; refactor execute_tool_loop to take ToolLoopExecutionContext; update streaming/incomplete emission to include only new bindings; added unit tests. |
| History Loading: model_gateway/src/routers/openai/responses/history.rs | load_input_history now returns LoadedInputHistory { previous_response_id, existing_mcp_list_tools_labels }; added extraction of mcp_list_tools server_labels from prior responses. |
| Response Routing / Handlers: model_gateway/src/routers/openai/responses/route.rs, model_gateway/src/routers/openai/responses/non_streaming.rs, model_gateway/src/routers/openai/responses/streaming.rs | Thread existing_mcp_list_tools_labels from loaded history into payload/context and into ToolLoopState; precompute filtered list_tools_bindings in streaming path and emit only new bindings. |
| Integration Tests & Mocks: model_gateway/tests/api/responses_api_test.rs, model_gateway/tests/common/mock_worker.rs | Added integration test ensuring resumed non-streaming responses do not repeat existing mcp_list_tools; adjusted mock worker to use has_prior_tool_context for resume-turn logic. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router as Router<br/>(responses/route.rs)
    participant History as History Loader<br/>(responses/history.rs)
    participant ToolLoop as Tool Loop<br/>(mcp/tool_loop.rs)
    participant MCP as MCP Session

    Client->>Router: ResponsesRequest (first turn)
    Router->>History: load_input_history()
    History-->>Router: LoadedInputHistory { previous_response_id: None, existing_labels: [] }
    Router->>ToolLoop: execute_tool_loop(ctx { existing_labels: [] })
    ToolLoop->>ToolLoop: mcp_list_tools_bindings_to_emit([], all_bindings)
    ToolLoop->>MCP: emit all MCP list-tools
    ToolLoop-->>Router: response with mcp_list_tools + mcp_call

    Client->>Router: ResponsesRequest (resume with previous_response_id)
    Router->>History: load_input_history(previous_response_id)
    History->>History: extract_mcp_list_tools_labels(prior_output)
    History-->>Router: LoadedInputHistory { previous_response_id: Some(id), existing_labels: ["binding1"] }
    Router->>ToolLoop: execute_tool_loop(ctx { existing_labels: ["binding1"] })
    ToolLoop->>ToolLoop: mcp_list_tools_bindings_to_emit([binding1], all_bindings)
    ToolLoop->>MCP: emit only new MCP list-tools
    ToolLoop-->>Router: response with only mcp_call (no repeated mcp_list_tools)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • slin1237
  • CatherineSue
  • key4ng

Poem

🐰 I hopped through responses, took care to inspect,

old labels remembered, new ones collect.
No duplicate listings, resume stays neat,
Tool-loop trimmed tidy—what a gentle feat! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.13% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: preventing redundant MCP tool listing when resuming responses via previous_response_id, which is the primary objective of this PR. |

@github-actions github-actions bot added the tests, model-gateway, and openai labels Apr 8, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a mechanism to prevent redundant MCP tool listings when resuming conversations via previous_response_id. It tracks previously emitted MCP server labels throughout the request lifecycle and filters them from subsequent outputs in both streaming and non-streaming modes. The feedback identifies several improvement opportunities: replacing fixed time.sleep calls in tests with more robust polling to reduce flakiness, addressing fragile list indexing and logic duplication in the Python E2E tests, and optimizing the Rust implementation of existing_mcp_list_tools_labels to avoid inefficient JSON serialization.
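
The polling suggestion could look roughly like the helper below. It is a generic Python sketch, not code from the PR's test suite; `wait_until`, its parameters, and its defaults are hypothetical.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value so callers can use it directly; raises
    TimeoutError instead of silently continuing like a fixed sleep would.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Usage: a condition that becomes true on the third check.
attempts = {"count": 0}

def ready():
    attempts["count"] += 1
    return attempts["count"] >= 3

wait_until(ready, timeout=2.0, interval=0.01)
print(attempts["count"])  # 3
```

Compared with a fixed `time.sleep`, the test returns as soon as the condition holds and fails with a clear error when it never does.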

@zhaowenzi zhaowenzi force-pushed the fix/openai-mcp-resume-list-tools-alignment branch from 49b6d6f to 66ecd5c on April 8, 2026 19:47
@zhaowenzi zhaowenzi marked this pull request as ready for review on April 8, 2026 19:48

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66ecd5c9ed

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/openai/responses/history.rs (1)

44-60: ⚠️ Potential issue | 🟠 Major

MCP items are silently dropped during conversation loading, causing duplicate list-tools emissions on resume.

Conversation persistence stores mcp_list_tools and mcp_call items in conversation storage (see item_to_new_conversation_item), but the conversation loader here only reconstructs message, function_call, and function_call_output items. All other types—including mcp_list_tools and mcp_call—fall into the _ pattern and are discarded with a warning.

As a result, existing_mcp_list_tools_labels is only seeded from get_response_chain(previous_response_id). When a conversation-backed request resumes, the tool loop receives an empty label set relative to the conversation history, causing mcp_list_tools_bindings_to_emit to re-emit bindings for servers that were already listed in the conversation's output, leading to duplicates.
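
To illustrate the concern, a loader that seeds labels from both the response chain and the conversation items might look like the sketch below. This is Python for illustration only; the item shapes and the name `seed_existing_labels` are assumptions, and the actual fix in history.rs may differ.

```python
def seed_existing_labels(*item_sources):
    """Union the mcp_list_tools labels found in any number of item lists.

    Hypothetical helper: each source is a list of output or conversation
    items represented as dicts with "type" and optional "server_label".
    """
    labels = []
    for items in item_sources:
        for item in items:
            if item.get("type") != "mcp_list_tools":
                continue
            label = item.get("server_label")
            if label and label not in labels:
                labels.append(label)
    return labels


# Conversation-backed request: no response chain, but the conversation
# already contains a listing for brave.
response_chain_items = []
conversation_items = [
    {"type": "mcp_list_tools", "server_label": "brave"},
    {"type": "mcp_call", "server_label": "brave"},
    {"type": "message"},
]
print(seed_existing_labels(response_chain_items))                      # []
print(seed_existing_labels(response_chain_items, conversation_items))  # ['brave']
```

With only the response chain as a source, brave looks new and would be listed again; including the conversation items avoids that.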

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/tests/common/mock_worker.rs`:
- Around line 669-688: The current guard around emitting mock tool calls uses
has_prior_tool_context which is true if previous_response_id or any mcp_* resume
metadata exists, causing resumed requests to short-circuit to final assistant
messages; update the logic so that only concrete prior tool results count:
remove previous_response_id from the has_prior_tool_context boolean and instead
require has_function_output || has_mcp_history (or, if previous_response_id is
present, look up stored mock history by that id to detect an actual prior tool
result before treating it as prior context). Apply this same fix to the other
occurrences noted (the checks around has_previous_response_id/has_mcp_history at
the other locations).
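
The suggested guard can be sketched in Python as follows. The real mock worker is Rust and its request handling is not shown here, so the request shape and the way MCP history is detected (any input item whose type starts with `mcp_`) are assumptions.

```python
from typing import Any, Dict


def has_prior_tool_context(request: Dict[str, Any]) -> bool:
    """True only when the request carries concrete prior tool results.

    previous_response_id alone is deliberately ignored, per the comment above.
    """
    items = request.get("input")
    if not isinstance(items, list):
        return False  # a plain-string input carries no tool results
    typed = [str(i.get("type", "")) for i in items if isinstance(i, dict)]
    has_function_output = any(t == "function_call_output" for t in typed)
    has_mcp_history = any(t.startswith("mcp_") for t in typed)
    return has_function_output or has_mcp_history


resumed_without_results = {"previous_response_id": "resp_1", "input": "Search again."}
resumed_with_results = {
    "previous_response_id": "resp_1",
    "input": [{"type": "mcp_call", "server_label": "brave"}],
}
print(has_prior_tool_context(resumed_without_results))  # False
print(has_prior_tool_context(resumed_with_results))     # True
```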

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb2b06d9-8d9c-4283-8063-7853aed7a16e

📥 Commits

Reviewing files that changed from the base of the PR and between 9516b44 and 66ecd5c.

📒 Files selected for processing (11)
  • e2e_test/responses/test_tools_call.py
  • model_gateway/src/routers/openai/chat.rs
  • model_gateway/src/routers/openai/context.rs
  • model_gateway/src/routers/openai/mcp/mod.rs
  • model_gateway/src/routers/openai/mcp/tool_loop.rs
  • model_gateway/src/routers/openai/responses/history.rs
  • model_gateway/src/routers/openai/responses/non_streaming.rs
  • model_gateway/src/routers/openai/responses/route.rs
  • model_gateway/src/routers/openai/responses/streaming.rs
  • model_gateway/tests/api/responses_api_test.rs
  • model_gateway/tests/common/mock_worker.rs

Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>
Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>
Signed-off-by: Ziwen Zhao <zzw.mose@gmail.com>
@zhaowenzi zhaowenzi force-pushed the fix/openai-mcp-resume-list-tools-alignment branch from 66ecd5c to 627547e on April 8, 2026 20:10

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e_test/responses/test_tools_call.py`:
- Around line 277-279: Extract a small helper function named
get_completed_response_id(events) that finds events with type
"response.completed", asserts exactly one (or raises a clear assertion message)
and returns its .response.id, then replace the inline list-comprehensions that
index [0] (e.g., where previous_response_id is set from events1 and the
analogous use with events2) with calls to get_completed_response_id(events1) and
get_completed_response_id(events2) so failures produce a clear assertion message
and the test is less brittle.
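
A minimal sketch of the requested helper, assuming streamed events are objects exposing `.type` and, for the completed event, `.response.id` as the comment describes. The `SimpleNamespace` events below are stand-ins for the real client objects.

```python
from types import SimpleNamespace


def get_completed_response_id(events):
    """Return the response id from the single response.completed event."""
    completed = [e for e in events if getattr(e, "type", None) == "response.completed"]
    assert len(completed) == 1, (
        f"expected exactly one response.completed event, got {len(completed)}"
    )
    return completed[0].response.id


# Stand-in events for illustration.
events1 = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.completed", response=SimpleNamespace(id="resp_1")),
]
print(get_completed_response_id(events1))  # resp_1
```

A missing or duplicated `response.completed` event then fails with the assertion message rather than an `IndexError` from indexing `[0]`.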

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 523c1a25-0943-444c-8759-44d07b1f7efc

📥 Commits

Reviewing files that changed from the base of the PR and between 66ecd5c and 627547e.

