
feat(image): add image generation tool support#1057

Open
TingtingZhou7 wants to merge 4 commits into lightseekorg:main from TingtingZhou7:tingzhou/add-img-tool-rebase-2

Conversation


@TingtingZhou7 TingtingZhou7 commented Apr 7, 2026

Description

Support the image generation tool MCP server from smg for open-source customers.

Problem

There is no image generation tool support today via the MCP server.

Solution

Two layers of ImageGenerationTool are supported in this PR:

  • From smg -> OpenAI directly
  • From scm -> remote MCP server -> OpenAI
  • The response API format follows the OpenAI format: https://developers.openai.com/api/docs/guides/image-generation
  • This PR mainly supports the non-streaming case. I have noticed that some response formats are not aligned with the existing dp for the streaming case; this will be fixed in follow-up PRs.

Changes not included in this PR

  • Support passing compartmentId to the MCP server
  • Thorough testing of the streaming case

Test Plan

Test Case 1: Call to Remote MCP Server to generate image

Step 1: I set up the OCI Image Generation Tool MCP Server locally at port 8080 and ran the cargo command below with --mcp-config-path /Users/tingzhou/workspace/genAI/smg/mcp.local.yaml added. This allows smg to read the MCP server information dynamically:


tingzhou@tingzhou-mac smg % cargo run --bin smg -- \
  --enable-igw \
  --port 9999 \
  --health-check-endpoint /v1/models \
  --history-backend oracle \
  --oracle-dsn "tcps://adb.us-sanjose-1.oraclecloud.com:1522/g9fe6ef5b169add_ze17g3phrlouc3qs_high.adb.oraclecloud.com?ssl_server_dn_match=true" \
  --oracle-user "ADMIN" \
  --oracle-password "ppppwddd@" \
  --log-level debug --mcp-config-path /Users/tingzhou/workspace/genAI/smg/mcp.local.yaml --schema-config /Users/tingzhou/workspace/genAI/smg/scripts/oracle_flyway/schema-config.yaml

which includes the tool and MCP server information. Here is an example of the mcp.local.yaml file:
Screenshot 2026-03-31 at 4 14 19 PM
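The config itself is only shown as a screenshot; as a rough, hypothetical sketch (field names here are illustrative, not the actual schema), the file registers an image_generation MCP server and its tool:

```yaml
# Hypothetical sketch of mcp.local.yaml -- field names are illustrative;
# the real schema is only visible in the screenshot above.
servers:
  - name: image_generation
    url: http://127.0.0.1:8080
    tools:
      - generate_image
```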

Test with streaming disabled

Request

curl -sS "http://127.0.0.1:9999/v1/responses" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "opc-request-id: $(uuidgen)" \
    -H "opc-compartment-id: ocid1.compartment.oc1..aaaaaaaathgntpo75bdehisnl6wkxfc4slkd6rpheafbt5a6ekm2ri4bmeva" \
    -H "Content-Type: application/json" \
    -d '{
      "model":"gpt-5.1",
      "input":"Generate an image of a red panda reading a book under a maple tree, watercolor style.",
      "tools":[{"type":"image_generation"}],
      "stream":false
    }'

Success log from the local MCP server:
Screenshot 2026-03-31 at 9 27 18 AM

Image attached:

red_panda_mcp

Response JSON:

response-new1.json

Test with streaming enabled
curl -sS "http://127.0.0.1:9999/v1/responses" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "opc-request-id: $(uuidgen)" \
    -H "opc-compartment-id: ocid1.compartment.oc1..aaaaaaaathgntpo75bdehisnl6wkxfc4slkd6rpheafbt5a6ekm2ri4bmeva" \
    -H "Content-Type: application/json" \
    -d '{
      "model":"gpt-5.1",
      "input":"Generate an image of a red panda reading a book under a maple tree, watercolor style.",
      "tools":[{"type":"image_generation"}],
      "stream":true
    }'

⚠️ The response format does not align with dp. This will be fixed in a separate PR.
response-new-stream.json

Test Case 2: Direct call to OpenAI to generate image

Step 1: I ran this command to start smg locally without passing the MCP server configuration:


tingzhou@tingzhou-mac smg % cargo run --bin smg -- \
  --enable-igw \
  --port 9999 \
  --health-check-endpoint /v1/models \
  --history-backend oracle \
  --oracle-dsn "tcps://adb.us-sanjose-1.oraclecloud.com:1522/g9fe6ef5b169add_ze17g3phrlouc3qs_high.adb.oraclecloud.com?ssl_server_dn_match=true" \
  --oracle-user "ADMIN" \
  --oracle-password "ppppwddd@" \
  --log-level debug

Step 2: Run the image generation tool end to end
API call:

curl -sS "http://127.0.0.1:9999/v1/responses" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "opc-request-id: $(uuidgen)" \
    -H "Content-Type: application/json" \
    -d '{
      "model":"gpt-5.1",
      "input":"Generate an image of a red panda reading a book under a maple tree, watercolor style.",
      "tools":[{"type":"image_generation"}],
      "stream":false
    }' | tee openai-response.txt

And here is the output:

openai-response.txt

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features
    • Added built‑in image generation tooling with configurable image‑generation outputs and lifecycle statuses (in-progress → generating → completed/failed).
    • Streaming now emits image‑generation progress events and stable image‑call IDs.
    • Responses include a new image‑generation call output with optional metadata (result, prompt, background, format, quality, size, action).
    • Tool outputs are compacted for model context (binary payloads omitted; errors surfaced).
    • Image‑generation can be routed via MCP server/tool configuration and supports request-time argument overrides.
    • Added local MCP example config and ignored it in version control.

@github-actions github-actions bot added grpc gRPC client and router changes mcp MCP related changes protocols Protocols crate changes model-gateway Model gateway crate changes openai OpenAI router changes labels Apr 7, 2026
@coderabbitai

coderabbitai bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds ImageGeneration built-in tool and ImageGenerationCall response format, wiring enums/serde, transformer conversion, MCP routing, streaming events/IDs, tool-loop argument handling, and model-context output compaction across gateway and protocol crates.

Changes

Cohort / File(s) Summary
Core types & protocols
crates/mcp/src/core/config.rs, crates/mcp/src/transform/types.rs, crates/protocols/src/event_types.rs, crates/protocols/src/responses.rs
Introduce ImageGeneration / ImageGenerationCall variants, ImageGenerationTool, ImageGenerationCallStatus, events, serde mappings, Display/as_str updates, and tests.
Transformation
crates/mcp/src/transform/transformer.rs
Add ResponseFormat::ImageGenerationCall handling, payload extraction (direct/wrapped/string JSON), to_image_generation_call conversion producing ResponseOutputItem::ImageGenerationCall, change many signatures to use Value, and add unit tests.
MCP config & local example
mcp.local.yaml, .gitignore
Add local MCP config example registering image_generation server/tool and ignore entry for mcp.local.yaml.
MCP routing & utils
model_gateway/src/routers/mcp_utils.rs, model_gateway/src/routers/grpc/common/responses/utils.rs
Recognize/route ResponseTool::ImageGeneration(_) → BuiltinToolType::ImageGeneration; include image generation in collect/extract and ensure flows; update tests/fixtures.
Gateway builders & mappings
model_gateway/src/routers/grpc/harmony/builder.rs
Add "image_generation" to builtin lists and mappings; treat image-generation as builtin.
Streaming & events
model_gateway/src/routers/grpc/common/responses/streaming.rs, model_gateway/src/routers/openai/mcp/tool_loop.rs, model_gateway/src/routers/openai/responses/streaming.rs
Add OutputItemType::ImageGenerationCall, map event statuses (IN_PROGRESS/GENERATING/COMPLETED), use ig_ ID prefix, pass original request for overrides, and update streaming emission calls.
Model-context output & recording
model_gateway/src/routers/mod.rs, model_gateway/src/routers/tool_output_context.rs, model_gateway/src/routers/grpc/regular/responses/common.rs, model_gateway/src/routers/openai/responses/utils.rs
Add tool_output_context module and compact_tool_output_for_model_context; compact image-gen outputs for history/storage; map output items to ResponseFormat for recording and serialize ResponseTool::ImageGeneration.
Tests & fixtures
model_gateway/src/routers/mcp_utils.rs tests and fixtures, various unit tests across crates
Add and update tests covering routing, transformer behavior, streaming events, compaction, and serde round-trips.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ModelGateway
    participant MCP_Server as MCP (generate_image)
    participant Transformer
    Client->>ModelGateway: Request with image_generation tool
    ModelGateway->>MCP_Server: Route request to configured MCP server/tool
    MCP_Server-->>ModelGateway: Respond (wrapped JSON / payload)
    ModelGateway->>Transformer: Transform to ImageGenerationCall
    Transformer-->>ModelGateway: ResponseOutputItem::ImageGenerationCall (id "ig_*")
    ModelGateway->>ModelGateway: compact_tool_output_for_model_context(...)
    ModelGateway->>Client: Emit streaming events / record compacted output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

tests

Suggested reviewers

  • CatherineSue
  • key4ng
  • slin1237

Poem

🐇 I hop through code with joyful cheer,
New image calls now bright and near.
From MCP streams to events that hum,
Pixels bloom where tool calls come,
A rabbit claps — the images appear!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The pull request title 'feat(image): add image generation tool support' clearly and directly describes the primary change: adding support for image generation tools. It accurately captures the main objective across all modified files.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for a built-in image generation tool, updating the MCP configuration, response transformation logic, and streaming event types. It implements a mechanism to extract image payloads from various MCP result formats and adds a utility to compact tool outputs by omitting binary image data from the model context. I have no feedback to provide.

@TingtingZhou7 TingtingZhou7 changed the title Tingzhou/add img tool rebase 2 Add Image Generation Tool Support Apr 7, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0321a9c4c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TingtingZhou7 TingtingZhou7 changed the title Add Image Generation Tool Support feat(metrics-ws): [4/4] add image generation tool support Apr 7, 2026
@TingtingZhou7 TingtingZhou7 changed the title feat(metrics-ws): [4/4] add image generation tool support feat(image): [4/4] add image generation tool support Apr 7, 2026
@TingtingZhou7 TingtingZhou7 changed the title feat(image): [4/4] add image generation tool support feat(image): add image generation tool support Apr 7, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/common/responses/streaming.rs (1)

133-149: ⚠️ Potential issue | 🟠 Major

Patch image payloads through result, not output.

Adding image_generation_call to this generic update path reuses the item_data["output"] mutation, but the image-generation response shape stores its payload in result. emit_completed() later forwards the stored JSON verbatim, so this branch will leave the real image field untouched and append an unexpected output key to the streamed response instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/common/responses/streaming.rs` around lines
133 - 149, The update branch that handles tool calls (check using item_data,
is_tool_call, and tool_result) is writing image payloads to item_data["output"],
but image_generation_call stores its payload under "result" and emit_completed()
later forwards that verbatim; change the mutation to detect
image_generation_call and set item_data["result"] (not "output") with the
serialized tool_result.output so the image payload lands in the correct field
when emit_completed() forwards it. Ensure the logic still covers other tool
types unchanged and reference the item_data, is_tool_call, tool_result and
emit_completed symbols when making this conditional write change.
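The conditional write the comment describes can be sketched in isolation; `item_data`, `image_generation_call`, and the field names mirror the review, but the map type here is a simplified stand-in for the actual JSON value:

```rust
// Standalone sketch of the suggested fix: route the serialized tool payload
// to the field the item type actually uses. Not the real gateway types --
// a HashMap stands in for the JSON object being mutated.
use std::collections::HashMap;

fn patch_tool_result(
    item_data: &mut HashMap<String, String>,
    item_type: &str,
    payload: String,
) {
    // image_generation_call stores its payload under "result";
    // other tool calls keep the existing "output" field.
    let key = if item_type == "image_generation_call" {
        "result"
    } else {
        "output"
    };
    item_data.insert(key.to_string(), payload);
}

fn main() {
    let mut image_item = HashMap::new();
    patch_tool_result(&mut image_item, "image_generation_call", "<base64>".into());
    assert!(image_item.contains_key("result"));
    assert!(!image_item.contains_key("output"));

    let mut mcp_item = HashMap::new();
    patch_tool_result(&mut mcp_item, "mcp_call", "{}".into());
    assert!(mcp_item.contains_key("output"));
}
```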
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 42-92: The image_payload_from_wrapped_content function currently
only unwraps payloads that contain "result" and treats them as successful;
update it to detect and preserve explicit failure payloads by checking parsed
objects for a top-level "status" equal to "failed" (case-insensitive) or
presence of an "error" field and returning that parsed Value unchanged so
downstream to_image_generation_call can map it to
ImageGenerationCallStatus::Failed; apply the same check/behavior to the other
similar unwrap block around lines 241-289 (the other image unwrapping logic that
also uses is_image_payload_candidate) and ensure you still return the parsed
payload when it indicates failure rather than wrapping it as a completed result.
- Around line 925-947: The test
test_passthrough_with_generic_action_payload_stays_mcp_call asserts the raw id
"req-1005" but ResponseTransformer/ mcp_response_item_id normalizes request ids
to the form "mcp_req-1005"; update the assertion in the match arm for
ResponseOutputItem::McpCall to expect the normalized id (e.g., call
mcp_response_item_id("req-1005") or assert_eq!(id, "mcp_req-1005")) so the test
checks the post-normalized value returned by ResponseTransformer::transform.

In `@model_gateway/src/routers/openai/mcp/tool_loop.rs`:
- Around line 856-885: In sanitize_builtin_tool_arguments when handling
ResponseFormat::ImageGenerationCall, don't replace the entire arguments Value
with only model and revised_prompt; instead ensure arguments is an object
(create one if not), compute revised_prompt the same way, set/overwrite the
"model" key to IMAGE_MODEL and the "revised_prompt" key to the computed string,
but leave any other existing keys (e.g., "size", "quality", "background",
"output_format", compression) intact so they remain in the payload and recorded
arguments_str; update code in sanitize_builtin_tool_arguments to mutate the
existing object map rather than assigning a new json! object.

In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 11-53: The helper currently infers success from output JSON in
extract_image_status_and_error, causing opaque failures to be marked
"completed"; change compact_tool_output_for_model_context and
extract_image_status_and_error to accept an explicit execution result (e.g., a
boolean like execution_succeeded or an enum) passed from callers instead of
guessing from output, use that explicit signal as the authoritative source for
setting the "status" field ("completed" vs "failed"), still include any error
text returned by find_error_text when present, and update all call sites to
provide the known success/failure flag so we never treat opaque outputs (e.g.,
{} or non-string errors) as successful.
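The in-place merge suggested for sanitize_builtin_tool_arguments can be illustrated with a minimal stand-in (a string map instead of the real serde_json object; IMAGE_MODEL and the key names follow the comment but the types are simplified):

```rust
// Sketch of the review's suggestion: overwrite only the keys we own
// ("model", "revised_prompt") and leave caller-supplied keys such as
// "size" or "quality" intact, instead of replacing the whole object.
use std::collections::HashMap;

fn sanitize_image_args(
    args: &mut HashMap<String, String>,
    image_model: &str,
    revised_prompt: &str,
) {
    args.insert("model".to_string(), image_model.to_string());
    args.insert("revised_prompt".to_string(), revised_prompt.to_string());
    // intentionally no removal of any other existing keys
}

fn main() {
    let mut args = HashMap::from([
        ("size".to_string(), "1024x1024".to_string()),
        ("quality".to_string(), "high".to_string()),
    ]);
    sanitize_image_args(&mut args, "gpt-image-1", "a red panda");
    assert_eq!(args.len(), 4); // original keys survive alongside the two we set
    assert_eq!(args["size"], "1024x1024");
    assert_eq!(args["model"], "gpt-image-1");
}
```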

---

Outside diff comments:
In `@model_gateway/src/routers/grpc/common/responses/streaming.rs`:
- Around line 133-149: The update branch that handles tool calls (check using
item_data, is_tool_call, and tool_result) is writing image payloads to
item_data["output"], but image_generation_call stores its payload under "result"
and emit_completed() later forwards that verbatim; change the mutation to detect
image_generation_call and set item_data["result"] (not "output") with the
serialized tool_result.output so the image payload lands in the correct field
when emit_completed() forwards it. Ensure the logic still covers other tool
types unchanged and reference the item_data, is_tool_call, tool_result and
emit_completed symbols when making this conditional write change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 471fe2e1-9e66-442f-9b0b-7d951fabd64a

📥 Commits

Reviewing files that changed from the base of the PR and between c6c6ebc and b0321a9.

📒 Files selected for processing (15)
  • crates/mcp/src/core/config.rs
  • crates/mcp/src/transform/transformer.rs
  • crates/mcp/src/transform/types.rs
  • crates/protocols/src/event_types.rs
  • crates/protocols/src/responses.rs
  • mcp.local.yaml
  • model_gateway/src/routers/grpc/common/responses/streaming.rs
  • model_gateway/src/routers/grpc/common/responses/utils.rs
  • model_gateway/src/routers/grpc/harmony/builder.rs
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/mcp_utils.rs
  • model_gateway/src/routers/mod.rs
  • model_gateway/src/routers/openai/mcp/tool_loop.rs
  • model_gateway/src/routers/openai/responses/utils.rs
  • model_gateway/src/routers/tool_output_context.rs


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/harmony/builder.rs (1)

425-436: ⚠️ Potential issue | 🟠 Major

Don't disable commentary for built-in-only Responses requests.

with_custom_tools is false when the request only contains image_generation (and the other built-ins), so the subsequent system-message build strips the commentary channel. In this file, built-ins are not described anywhere else, so Harmony has no channel left to emit the tool call.

Suggested fix
-            let with_custom_tools = has_custom_tools(&tool_types);
+            let has_any_tools = !tool_types.is_empty();
+            let with_custom_tools = has_custom_tools(&tool_types);

             // Add system message
-            let sys_msg = self.build_system_message_from_responses(request, with_custom_tools);
+            let sys_msg = self.build_system_message_from_responses(request, has_any_tools);
             all_messages.push(sys_msg);

Also applies to: 442-445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/harmony/builder.rs` around lines 425 - 436,
The code currently computes with_custom_tools based on whether any tools exist,
which ends up false for built-in-only tool lists (e.g.,
ResponseTool::ImageGeneration, ::CodeInterpreter, ::WebSearchPreview) and causes
the system-message builder to strip the "commentary" channel; update the logic
so with_custom_tools only becomes true when there is at least one
non-built-in/custom tool. Change the check used where tool_types /
with_custom_tools are derived (look for the tool_types Vec and the
with_custom_tools assignment) to scan request.tools and set with_custom_tools =
tools.iter().any(|t| matches!(t, /* enumerate custom variants only */
ResponseTool::Mcp(_) /* or whatever represents custom tools in this enum */  /*
add other non-built-in variants if any */)); this preserves "commentary" for
built-in-only requests and prevents the system-message builder from stripping
that channel.
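The predicate change suggested above can be sketched independently; the enum below is a stand-in for ResponseTool (the Function variant is hypothetical, added only to represent a custom tool):

```rust
// Sketch of the suggested check: built-in tools alone must not count as
// "custom tools", so only explicitly custom variants flip the flag.
#[allow(dead_code)]
enum Tool {
    ImageGeneration,
    CodeInterpreter,
    WebSearchPreview,
    Mcp,
    Function, // hypothetical custom-tool variant for illustration
}

fn has_custom_tools(tools: &[Tool]) -> bool {
    tools.iter().any(|t| matches!(t, Tool::Mcp | Tool::Function))
}

fn main() {
    // Built-in-only request: commentary channel should be preserved.
    assert!(!has_custom_tools(&[Tool::ImageGeneration, Tool::WebSearchPreview]));
    // Mixed request: custom tools are present.
    assert!(has_custom_tools(&[Tool::ImageGeneration, Tool::Mcp]));
}
```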
♻️ Duplicate comments (3)
model_gateway/src/routers/tool_output_context.rs (1)

11-18: ⚠️ Potential issue | 🟠 Major

Use the execution result as the status source of truth.

This helper still guesses "failed" vs "completed" from ad-hoc JSON fields, so opaque failures like {} or non-string error objects get recorded back into model context as successful tool results. Both callers already know whether execution failed; thread that signal into this helper instead of inferring it here.

Suggested fix
 pub fn compact_tool_output_for_model_context(
     response_format: &ResponseFormat,
     output: &Value,
+    execution_succeeded: bool,
 ) -> String {
     match response_format {
         ResponseFormat::ImageGenerationCall => {
-            let (status, error) = extract_image_status_and_error(output);
+            let (status, error) =
+                extract_image_status_and_error(output, execution_succeeded);
             let mut summary = json!({
                 "tool": "generate_image",
                 "status": status,
                 "note": "binary image payload omitted from model context"
             });
@@
-fn extract_image_status_and_error(output: &Value) -> (&'static str, Option<String>) {
+fn extract_image_status_and_error(
+    output: &Value,
+    execution_succeeded: bool,
+) -> (&'static str, Option<String>) {
     if let Some(error) = find_error_text(output) {
         return ("failed", Some(error));
     }
+
+    if !execution_succeeded {
+        return ("failed", None);
+    }

     if let Some(status) = output
         .as_object()
         .and_then(|o| o.get("status"))
         .and_then(|v| v.as_str())

Please update the model_gateway/src/routers/grpc/regular/responses/common.rs and model_gateway/src/routers/openai/mcp/tool_loop.rs call sites to pass the known result bit.

Also applies to: 36-52

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/tool_output_context.rs` around lines 11 - 18, The
helper compact_tool_output_for_model_context should stop inferring success from
JSON and instead accept an explicit execution result flag (e.g., succeeded:
bool) and use that flag as the source of truth when setting status to
"completed" vs "failed" (for the ImageGenerationCall branch that currently calls
extract_image_status_and_error, keep extracting the error payload but ignore its
guessed status). Update the signature of compact_tool_output_for_model_context
and all call sites to pass the known result bit from the callers in
model_gateway/src/routers/grpc/regular/responses/common.rs and
model_gateway/src/routers/openai/mcp/tool_loop.rs, and ensure you still include
any extracted error text/object in the returned context but never derive
success/failure from it.
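The "explicit result bit" idea can be shown as a small standalone function; the signature is illustrative, not the gateway's actual helper:

```rust
// Sketch: the caller's known execution result, not the shape of the output
// JSON, decides "completed" vs "failed". Error text is still surfaced when
// available, but success is never inferred from an opaque payload.
fn image_status(
    execution_succeeded: bool,
    error_text: Option<&str>,
) -> (&'static str, Option<String>) {
    if let Some(e) = error_text {
        return ("failed", Some(e.to_string()));
    }
    if !execution_succeeded {
        // An opaque failure like `{}` no longer reads as success.
        return ("failed", None);
    }
    ("completed", None)
}

fn main() {
    assert_eq!(image_status(false, None), ("failed", None));
    assert_eq!(image_status(true, Some("boom")).0, "failed");
    assert_eq!(image_status(true, None), ("completed", None));
}
```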
crates/mcp/src/transform/transformer.rs (2)

936-958: ⚠️ Potential issue | 🟡 Minor

Fix the normalized passthrough ID assertion.

ResponseFormat::Passthrough goes through mcp_response_item_id, so Line 955 should expect the normalized value. The current assertion will fail against Lines 131-133.

Suggested fix
-                assert_eq!(id, "req-1005");
+                assert_eq!(id, mcp_response_item_id("req-1005"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/mcp/src/transform/transformer.rs` around lines 936 - 958, The test
test_passthrough_with_generic_action_payload_stays_mcp_call asserts the raw id
"req-1005" but ResponseFormat::Passthrough passes through mcp_response_item_id
normalization, so update the assertion in the match arm to expect the normalized
id returned by mcp_response_item_id instead of "req-1005"; locate the test
function and change the expected value for ResponseOutputItem::McpCall { id, ..
} (the call to ResponseTransformer::transform with ResponseFormat::Passthrough)
to the normalized form produced by mcp_response_item_id.

35-37: ⚠️ Potential issue | 🟠 Major

Preserve failed image payloads instead of forcing Completed.

Line 35 only treats objects with result as image payloads, Line 74 repeats that same check for wrapped text, and Line 251 hardcodes ImageGenerationCallStatus::Completed. A payload like {"status":"failed","error":"generation failed"} therefore comes back as a successful image_generation_call with the raw JSON serialized into result. The new error-path test on Lines 831-855 is also locking that bug in.

Suggested fix
     fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
-        obj.contains_key("result")
+        obj.contains_key("result")
+            || obj.contains_key("error")
+            || obj
+                .get("status")
+                .and_then(|v| v.as_str())
+                .is_some_and(|status| status.eq_ignore_ascii_case("failed"))
     }
@@
                 let Some(text) = obj.get("text").and_then(|v| v.as_str()) else {
                     continue;
                 };
                 if let Ok(parsed) = serde_json::from_str::<Value>(text) {
                     if let Some(parsed_obj) = parsed.as_object() {
-                        if parsed_obj.contains_key("result") {
+                        if Self::is_image_payload_candidate(parsed_obj) {
                             return Some(parsed);
                         }
                     }
                 }
             }
@@
-        let status = ImageGenerationCallStatus::Completed;
+        let status = if obj.is_some_and(|o| {
+            o.contains_key("error")
+                || o.get("status")
+                    .and_then(|v| v.as_str())
+                    .is_some_and(|status| status.eq_ignore_ascii_case("failed"))
+        }) {
+            ImageGenerationCallStatus::Failed
+        } else {
+            ImageGenerationCallStatus::Completed
+        };
@@
-                assert_eq!(status, ImageGenerationCallStatus::Completed);
+                assert_eq!(status, ImageGenerationCallStatus::Failed);

Also applies to: 72-78, 245-289, 831-855

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/mcp/src/transform/transformer.rs` around lines 35 - 37, The current
payload detection and mapping (is_image_payload_candidate and the code that
constructs ImageGenerationCallStatus::Completed) treats any object with a
"result" key as a successful image payload and forces Completed; change the
logic to respect a top-level "status" field: update is_image_payload_candidate
to consider objects that represent an image payload only when "status" ==
"completed" (or keep current semantics but add a separate has_image_status
check), and in the mapper that currently sets
ImageGenerationCallStatus::Completed (the block that hardcodes
ImageGenerationCallStatus::Completed) branch on the "status" value so that
"failed" maps to ImageGenerationCallStatus::Failed (include the "error" string
or the raw JSON as the failure detail), other/unknown statuses preserve the raw
JSON without forcing Completed; apply the same status-aware check used for
wrapped text handling so the checks at is_image_payload_candidate, the
wrapped-text branch, and the mapping that sets Completed are all consistent.
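The status-aware mapping the comment asks for reduces to a small decision that can be tested on its own (the enum mirrors ImageGenerationCallStatus, but this is a standalone sketch, not the crate's type):

```rust
// Sketch of the suggested mapping: an explicit error field or a
// case-insensitive status of "failed" must map to Failed rather than
// being forced to Completed.
#[derive(Debug, PartialEq)]
enum ImageGenerationCallStatus {
    Completed,
    Failed,
}

fn map_status(has_error: bool, status_field: Option<&str>) -> ImageGenerationCallStatus {
    if has_error
        || status_field.is_some_and(|s| s.eq_ignore_ascii_case("failed"))
    {
        ImageGenerationCallStatus::Failed
    } else {
        ImageGenerationCallStatus::Completed
    }
}

fn main() {
    assert_eq!(map_status(true, None), ImageGenerationCallStatus::Failed);
    assert_eq!(map_status(false, Some("FAILED")), ImageGenerationCallStatus::Failed);
    assert_eq!(map_status(false, Some("completed")), ImageGenerationCallStatus::Completed);
    assert_eq!(map_status(false, None), ImageGenerationCallStatus::Completed);
}
```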
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@model_gateway/src/routers/grpc/harmony/builder.rs`:
- Around line 425-436: The code currently computes with_custom_tools based on
whether any tools exist, which ends up false for built-in-only tool lists (e.g.,
ResponseTool::ImageGeneration, ::CodeInterpreter, ::WebSearchPreview) and causes
the system-message builder to strip the "commentary" channel; update the logic
so with_custom_tools only becomes true when there is at least one
non-built-in/custom tool. Change the check used where tool_types /
with_custom_tools are derived (look for the tool_types Vec and the
with_custom_tools assignment) to scan request.tools and set with_custom_tools =
tools.iter().any(|t| matches!(t, /* enumerate custom variants only */
ResponseTool::Mcp(_) /* or whatever represents custom tools in this enum */  /*
add other non-built-in variants if any */)); this preserves "commentary" for
built-in-only requests and prevents the system-message builder from stripping
that channel.

---

Duplicate comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 936-958: The test
test_passthrough_with_generic_action_payload_stays_mcp_call asserts the raw id
"req-1005" but ResponseFormat::Passthrough passes through mcp_response_item_id
normalization, so update the assertion in the match arm to expect the normalized
id returned by mcp_response_item_id instead of "req-1005"; locate the test
function and change the expected value for ResponseOutputItem::McpCall { id, ..
} (the call to ResponseTransformer::transform with ResponseFormat::Passthrough)
to the normalized form produced by mcp_response_item_id.
- Around line 35-37: The current payload detection and mapping
(is_image_payload_candidate and the code that constructs
ImageGenerationCallStatus::Completed) treats any object with a "result" key as a
successful image payload and forces Completed; change the logic to respect a
top-level "status" field: update is_image_payload_candidate to consider objects
that represent an image payload only when "status" == "completed" (or keep
current semantics but add a separate has_image_status check), and in the mapper
that currently sets ImageGenerationCallStatus::Completed (the block that
hardcodes ImageGenerationCallStatus::Completed) branch on the "status" value so
that "failed" maps to ImageGenerationCallStatus::Failed (include the "error"
string or the raw JSON as the failure detail), other/unknown statuses preserve
the raw JSON without forcing Completed; apply the same status-aware check used
for wrapped text handling so the checks at is_image_payload_candidate, the
wrapped-text branch, and the mapping that sets Completed are all consistent.

In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 11-18: The helper compact_tool_output_for_model_context should
stop inferring success from JSON and instead accept an explicit execution result
flag (e.g., succeeded: bool) and use that flag as the source of truth when
setting status to "completed" vs "failed" (for the ImageGenerationCall branch
that currently calls extract_image_status_and_error, keep extracting the error
payload but ignore its guessed status). Update the signature of
compact_tool_output_for_model_context and all call sites to pass the known
result bit from the callers in
model_gateway/src/routers/grpc/regular/responses/common.rs and
model_gateway/src/routers/openai/mcp/tool_loop.rs, and ensure you still include
any extracted error text/object in the returned context but never derive
success/failure from it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c61f8cda-de23-40d4-a349-fb0c948b3134

📥 Commits

Reviewing files that changed from the base of the PR and between b0321a9 and a53f2d4.

📒 Files selected for processing (8)
  • crates/mcp/src/transform/transformer.rs
  • crates/protocols/src/responses.rs
  • model_gateway/src/routers/grpc/harmony/builder.rs
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/mcp_utils.rs
  • model_gateway/src/routers/mod.rs
  • model_gateway/src/routers/openai/mcp/tool_loop.rs
  • model_gateway/src/routers/tool_output_context.rs

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42bd7e2e18

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
crates/mcp/src/transform/transformer.rs (2)

267-286: ⚠️ Potential issue | 🟠 Major

Error payloads are marked as Completed instead of Failed.

The status is unconditionally set to ImageGenerationCallStatus::Completed (line 274), but the protocol defines a Failed variant (per crates/protocols/src/responses.rs:432). Payloads containing an error field or isError: true should emit ImageGenerationCallStatus::Failed. The test test_image_generation_transform_error_payload (lines 854-879) currently asserts Completed for an error payload, reinforcing this gap.

This was flagged in a previous review — the current code still does not parse failure payloads to emit Failed status.

Suggested fix
     fn to_image_generation_call(result: &Value, tool_call_id: &str) -> ResponseOutputItem {
         let payload =
             Self::image_payload_from_wrapped_content(result, &ResponseFormat::ImageGenerationCall)
                 .unwrap_or_else(|| result.clone());
         let obj = payload.as_object();

-        let status = ImageGenerationCallStatus::Completed;
+        // Check for failure indicators
+        let is_error = obj
+            .and_then(|o| o.get("isError"))
+            .and_then(|v| v.as_bool())
+            .unwrap_or(false)
+            || obj.map_or(false, |o| o.contains_key("error"))
+            || obj
+                .and_then(|o| o.get("status"))
+                .and_then(|v| v.as_str())
+                .is_some_and(|s| s.eq_ignore_ascii_case("failed") || s.eq_ignore_ascii_case("error"));
+
+        let status = if is_error {
+            ImageGenerationCallStatus::Failed
+        } else {
+            ImageGenerationCallStatus::Completed
+        };
+
         let output_result = payload
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/mcp/src/transform/transformer.rs` around lines 267 - 286, In
to_image_generation_call, detect failure payloads instead of always using
ImageGenerationCallStatus::Completed: inspect the extracted payload (from
image_payload_from_wrapped_content / payload variable) and set status to
ImageGenerationCallStatus::Failed when payload contains an "error" field,
contains isError: true, or otherwise signals failure; otherwise use Completed.
Update the logic that determines status in the
ResponseOutputItem::ImageGenerationCall construction (id:
format!("ig_{tool_call_id}")), and ensure test
test_image_generation_transform_error_payload asserts Failed.

35-37: 🧹 Nitpick | 🔵 Trivial

is_image_payload_candidate only recognizes success payloads.

This helper returns true only when a result string key exists. Failure payloads with error or status: "failed" fields (but no result) will not be recognized as image payloads, causing image_payload_from_wrapped_content to return None and fall through to the raw result.clone() path. Consider also matching on error or a failure-indicating status field so error payloads are properly unwrapped before being passed to to_image_generation_call.

Suggested helper expansion
-    fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
-        obj.get("result").and_then(|v| v.as_str()).is_some()
+    fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
+        obj.get("result").and_then(|v| v.as_str()).is_some()
+            || obj.contains_key("error")
+            || obj
+                .get("status")
+                .and_then(|v| v.as_str())
+                .is_some_and(|s| s.eq_ignore_ascii_case("failed") || s.eq_ignore_ascii_case("error"))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/mcp/src/transform/transformer.rs` around lines 35 - 37, The helper
is_image_payload_candidate only checks for a "result" string and thus misses
wrapped failure payloads; update is_image_payload_candidate to also return true
when the map contains an "error" key (string/object) or a "status" key equal to
"failed"/"error" so those wrapped error payloads are recognized and unwrapped by
image_payload_from_wrapped_content before being passed to
to_image_generation_call; locate and modify the function
is_image_payload_candidate and ensure its predicates cover "result", "error",
and a failing "status" value.
model_gateway/src/routers/tool_output_context.rs (1)

36-70: 🧹 Nitpick | 🔵 Trivial

Consider checking isError at the top level as well.

The current implementation only checks output.result.isError (the nested JSON-RPC path). Some MCP responses may have isError at the top level when the payload is already unwrapped. Per the previous review discussion, adding a fallback check for top-level isError would make this more robust.

Additionally, the function returns ("completed", None) for any payload that doesn't match the result.isError path — including payloads with a top-level status: "failed" or error field. This could misclassify failures that don't follow the exact JSON-RPC wrapper format.

More robust error detection
 fn extract_image_status_and_error(output: &Value) -> (&'static str, Option<String>) {
-    if output
+    // Check nested JSON-RPC path: output.result.isError
+    let nested_is_error = output
         .as_object()
         .and_then(|o| o.get("result"))
         .and_then(|v| v.as_object())
         .and_then(|r| r.get("isError"))
         .and_then(|v| v.as_bool())
-        == Some(true)
-    {
+        .unwrap_or(false);
+
+    // Check top-level isError (for unwrapped payloads)
+    let top_level_is_error = output
+        .as_object()
+        .and_then(|o| o.get("isError"))
+        .and_then(|v| v.as_bool())
+        .unwrap_or(false);
+
+    if nested_is_error || top_level_is_error {
         let error_text = output
             .as_object()
             .and_then(|obj| obj.get("result"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/tool_output_context.rs` around lines 36 - 70,
Modify extract_image_status_and_error to also inspect top-level fields: check
top-level "isError" (bool), "status" (e.g., "failed"), and a top-level "error"
or "content" array for human-readable error text if the nested result path is
absent; when any of these indicate an error return ("failed", Some(error_text)).
Keep the existing nested result.isError logic as primary, but if that path is
missing fall back to top-level checks and only return ("completed", None) when
neither nested nor top-level indicators show an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 854-879: The test test_image_generation_transform_error_payload
currently asserts ImageGenerationCallStatus::Completed for an error payload;
update the expected status to ImageGenerationCallStatus::Failed (in the match
arm that destructures ResponseOutputItem::ImageGenerationCall) to reflect the
corrected status detection in
to_image_generation_call/ResponseTransformer::transform so the assertion matches
the new failure behavior for {"error":"generation failed"}.

In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 78-108: Add tests covering the edge cases for
compact_tool_output_for_model_context: (1) create a test similar to
test_compact_image_output_with_jsonrpc_wrapped_text_error that passes a
successful image generation payload (ResponseFormat::ImageGenerationCall) with
no isError and assert parsed.status == "completed" and that there is no "error"
field; (2) add a test that puts isError at the top-level JSON (rather than
inside result) and assert parsed.status == "failed" and the error string is
extracted correctly; (3) add a test for payloads missing both isError and known
error patterns to assert the function returns a neutral/unknown status (e.g.,
not "failed") and handles absence of "error" gracefully; use the same serde_json
parsing/assertion pattern as
test_compact_image_output_with_jsonrpc_wrapped_text_error to locate and validate
fields.

---

Duplicate comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 267-286: In to_image_generation_call, detect failure payloads
instead of always using ImageGenerationCallStatus::Completed: inspect the
extracted payload (from image_payload_from_wrapped_content / payload variable)
and set status to ImageGenerationCallStatus::Failed when payload contains an
"error" field, contains isError: true, or otherwise signals failure; otherwise
use Completed. Update the logic that determines status in the
ResponseOutputItem::ImageGenerationCall construction (id:
format!("ig_{tool_call_id}")), and ensure test
test_image_generation_transform_error_payload asserts Failed.
- Around line 35-37: The helper is_image_payload_candidate only checks for a
"result" string and thus misses wrapped failure payloads; update
is_image_payload_candidate to also return true when the map contains an "error"
key (string/object) or a "status" key equal to "failed"/"error" so those wrapped
error payloads are recognized and unwrapped by
image_payload_from_wrapped_content before being passed to
to_image_generation_call; locate and modify the function
is_image_payload_candidate and ensure its predicates cover "result", "error",
and a failing "status" value.

In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 36-70: Modify extract_image_status_and_error to also inspect
top-level fields: check top-level "isError" (bool), "status" (e.g., "failed"),
and a top-level "error" or "content" array for human-readable error text if the
nested result path is absent; when any of these indicate an error return
("failed", Some(error_text)). Keep the existing nested result.isError logic as
primary, but if that path is missing fall back to top-level checks and only
return ("completed", None) when neither nested nor top-level indicators show an
error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84d793a4-e018-4457-84ad-02b5e6cc3925

📥 Commits

Reviewing files that changed from the base of the PR and between 42bd7e2 and 1678672.

📒 Files selected for processing (2)
  • crates/mcp/src/transform/transformer.rs
  • model_gateway/src/routers/tool_output_context.rs

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1678672db4

Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from efb0d4b to d0da3a9 Compare April 8, 2026 04:44
Member

@slin1237 slin1237 left a comment

Thanks for the work here. The config/enum/streaming plumbing follows the established web_search pattern well. However, there are several design issues that need to be addressed before this can merge.

Major concern: the transformation layer is over-engineered for an unknown MCP response shape.

image_payload_from_wrapped_content has 80+ lines of speculative parsing — trying JSON-RPC wrappers, MCP content arrays, stringified JSON, plain strings, nested result.content objects. Compare this to to_web_search_call which is 15 lines because the MCP contract is well-defined. This complexity suggests we don't have a clear contract with the MCP image generation server yet. We should define the expected MCP response schema first, then write a simple transform against it — not write a catch-all that guesses.

Major concern: mcp.local.yaml committed to repo.

This is a local dev config and should not be in the PR.

Missing: tool options passthrough.

OpenAI's image_generation tool accepts size, quality, background, output_format, action, compression as tool-level options. The ImageGenerationTool struct captures these via #[serde(flatten)] into a HashMap, but they're never passed through to the MCP server as tool call arguments. The MCP server needs these to generate the right image.
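For reference, a request-level image_generation tool entry carrying these options might look roughly like this (the values are illustrative assumptions; the field names are the OpenAI options listed above):

```json
{
  "type": "image_generation",
  "size": "1024x1024",
  "quality": "high",
  "background": "transparent",
  "output_format": "png"
}
```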

Missing: restore_original_tools handling.

The web_search pattern restores the original web_search_preview tool in the response via restore_original_tools() in responses/utils.rs. I don't see the equivalent for image_generation — the response will have the MCP function tool instead of image_generation.

Concern: base64 image data in conversation history.

The compact_tool_output_for_model_context in tool_output_context.rs is called for image generation results. But the result field is a base64 image (potentially megabytes). If this goes into the conversation history that's sent back to the model on multi-turn, it will blow up the context window. OpenAI handles this by referencing previous_response_id or image_generation_call.id — the actual image data is never re-sent to the model.

pub struct ResponseTransformer;

impl ResponseTransformer {
fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
Member

This 80-line image_payload_from_wrapped_content is doing speculative parsing of at least 5 different response shapes:

  1. Direct object with result key
  2. Array of {type: "text", text: "..."}
  3. Object with content array
  4. Object with result.content array (JSON-RPC)
  5. Plain JSON string
  6. Fallback to first non-empty text

Compare this to to_web_search_call which is ~15 lines because we have a defined contract with the MCP web search server.

This level of defensive parsing suggests we haven't defined the MCP image generation server's response contract. Can you:

  1. Document the expected MCP response schema (what does your MCP server actually return?)
  2. Write the transform against that specific schema
  3. Add a single fallback for the unexpected case

The current approach will be a maintenance nightmare — every new MCP server shape that happens to have a result key will match.

Author

@TingtingZhou7 TingtingZhou7 Apr 8, 2026

Valid concern. I have narrowed down the parsing to support only our MCP server case:

"result":{"content":[{"type":"text","text":"{\n  \"result\": <base64>

Here is an example of the MCP server response:

mcp_generate_image_response.json

And for the fallback case, here is the error format we follow:

mcp_generate_image_response_error.json
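For readers without access to the attached files, the success shape described above expands to roughly this (the base64 payload is elided, and any field beyond result/content/type/text should be treated as unconfirmed; the error format sets isError and carries the error message in the content text instead):

```json
{
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\n  \"result\": \"<base64>\"\n}"
      }
    ]
  }
}
```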

Author

Compressed to 40 lines now. Web search has more than 15 lines; it just separates the logic into two calls: let sources = Self::extract_web_sources(result); and let queries = Self::extract_queries(result);

/// open by preserving unknown key/value pairs.
#[derive(Debug, Clone, Deserialize, Serialize, Default, schemars::JsonSchema)]
pub struct ImageGenerationTool {
#[serde(flatten)]
Member

ImageGenerationTool uses #[serde(flatten)] pub options: HashMap<String, Value> to capture tool-level options like size, quality, background, etc.

But where are these options forwarded to the MCP server when making the tool call? In the web_search pattern, the tool definition options get threaded through. I don't see the equivalent for image generation — the MCP server will get called without knowing what size/quality/format the user requested.

Also: using a raw HashMap loses all validation. OpenAI specifies valid values for size, quality, background, output_format, action. We should validate the known fields, or at minimum use typed Option<String> fields for the documented ones and #[serde(flatten)] only for future-proofing unknown fields.

Author

@TingtingZhou7 TingtingZhou7 Apr 8, 2026

Right now I was passing no overrides to the MCP server and just letting it generate images with the OpenAI default settings; I was planning to add this in the next PR.

But it seems like a small change, so I am including it in this PR as suggested in your comment.

We should validate the known fields

Right now I am not sure whether we want to keep this validation on our DP side, since we already have all these runtime checks. For example -

Or we can move it to smg. This needs to be discussed further, and we need to reach an agreement across all tools on where the validation checks should live.

or at minimum use typed Option fields for the documented ones and #[serde(flatten)] only for future-proofing unknown fields.

Yes, agreed. Updated. I have made the header overrides align with what we have in dp -

Also, right now we did not pass the overrides to the MCP server even for the web_search tool (@Tobel158 can confirm here); I have added the logic to pass the override values to MCP as well - please refer to the method apply_request_tool_overrides.

Author

We can use the extra field to pass opc-compartment-Id, which is OCI-specific.

use serde_json::{json, Value};
use smg_mcp::ResponseFormat;

/// Build tool output text for model context.
Member

For image generation, compact_tool_output_for_model_context will receive the full MCP result which includes the base64 image in the result field. Does this function strip the base64 data before storing in conversation history?

If not, multi-turn image editing (which is a key use case per the OpenAI spec) will send megabytes of base64 back to the model on every subsequent turn, blowing up the context window. OpenAI solves this by using previous_response_id or image_generation_call.id references — the image data is stored server-side, never re-sent to the model.

Please confirm this function handles the image case by stripping/replacing the base64 data with a reference or summary.

Author

@TingtingZhou7 TingtingZhou7 Apr 8, 2026

Yes, that's what this function is for. If you check lines 19-21 in this file, only these three fields are fed back:

 "tool": "generate_image",
 "status": status, // Status can be failed or completed depending on the isError field from the mcp_server response
 "note": error message if isError is true / a preset message if isError is false

And the base64-encoded text is skipped.
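Concretely, the compacted object fed back into the model context would look roughly like this (the note string is an illustrative assumption; the three field names come from the snippet above):

```json
{
  "tool": "generate_image",
  "status": "completed",
  "note": "Image generated successfully; base64 result omitted from model context."
}
```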

/// Routing information for a built-in tool type.
///
/// When a built-in tool type (web_search_preview, code_interpreter, file_search)
/// When a built-in tool type (web_search_preview, code_interpreter, image_generation)
Member

The doc comment was updated to list image_generation but dropped file_search from the list. Should be all four: web_search_preview, code_interpreter, image_generation, file_search.

Author

@TingtingZhou7 TingtingZhou7 Apr 8, 2026

Actually, file_search is only listed in the comments but not implemented in this file. That's why I dropped it from the comments.

Member

I meant to append image generation, not remove file search.

Author

added back file search

pub const COMPLETED: &'static str = "response.image_generation_call.completed";

pub const fn as_str(self) -> &'static str {
match self {
Member

The GENERATING event (response.image_generation_call.generating) does not appear in the OpenAI streaming spec. OpenAI's image generation streaming goes: in_progress -> partial_image (0-3 times) -> completed.

Adding non-standard SSE events will break clients that validate event types. Can you verify this against the actual OpenAI streaming behavior?
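For comparison, the documented sequence renders as SSE events roughly like this (payloads abbreviated; event names per the OpenAI Responses streaming events, other fields are assumptions):

```text
event: response.image_generation_call.in_progress
data: {"type":"response.image_generation_call.in_progress","output_index":0,"item_id":"ig_..."}

event: response.image_generation_call.partial_image
data: {"type":"response.image_generation_call.partial_image","partial_image_index":0,"partial_image_b64":"..."}

event: response.image_generation_call.completed
data: {"type":"response.image_generation_call.completed","output_index":0,"item_id":"ig_..."}
```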

Author

Yes, actually this is what confused me too. When I tested by calling their API with streaming enabled, I did see the "generating" event, which differs from their public doc. So I included it here.


openai_responses_stream_events.json


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb6a3f832c

Comment on lines +291 to +295
if !matches!(response_format, ResponseFormat::ImageGenerationCall) {
return;
}

let Some(overrides) = image_generation_request_overrides(original_body) else {

P2 Badge Limit image override injection to builtin image calls

apply_request_tool_overrides applies request-level image options to any tool execution whose response_format is ImageGenerationCall. That means a request that includes both a built-in image_generation tool and another MCP tool using the same response format will have the built-in options (size, quality, etc.) injected into the other tool’s arguments, which can cause schema validation failures or unintended behavior for that tool. The override path should be scoped to the specific builtin-routed image tool identity (name/server), not format alone.
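A minimal sketch of the scoping check the comment asks for, assuming a hypothetical helper name and that the builtin-routed tool's name is available at the call site (the real signature in apply_request_tool_overrides will differ):

```rust
/// Decide whether request-level image options may be injected into a tool call.
/// Format alone is not enough: another MCP tool can share the same response
/// format, so overrides must also match the builtin-routed tool's identity.
fn should_apply_image_overrides(
    is_image_format: bool,
    tool_name: &str,
    routed_builtin_tool: Option<&str>, // e.g. Some("generate_image") when builtin-routed
) -> bool {
    is_image_format && routed_builtin_tool == Some(tool_name)
}

fn main() {
    // Builtin-routed image tool: overrides apply.
    assert!(should_apply_image_overrides(
        true,
        "generate_image",
        Some("generate_image")
    ));
    // Same response format but a different MCP tool: overrides must not leak in.
    assert!(!should_apply_image_overrides(
        true,
        "other_tool",
        Some("generate_image")
    ));
    // Not an image-format call at all: never inject.
    assert!(!should_apply_image_overrides(
        false,
        "generate_image",
        Some("generate_image")
    ));
}
```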



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
model_gateway/src/routers/grpc/common/responses/utils.rs (1)

41-53: 🧹 Nitpick | 🔵 Trivial

Use the shared builtin extractor here.

This block now mirrors model_gateway/src/routers/mcp_utils.rs::extract_builtin_types() by hand. We already had to update both places for ImageGeneration; the next builtin addition can easily drift and silently change MCP-routing behavior on only one path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/common/responses/utils.rs` around lines 41 -
53, Replace the ad-hoc builtin-tool detection with the shared extractor: remove
the manual matches that compute has_builtin_tools and instead call the existing
extract_builtin_types(...) from model_gateway::routers::mcp_utils (import it),
passing the same `tools` value; assign its boolean result to `has_builtin_tools`
(preserving the current unwrap_or(false) behavior if needed) so future builtin
additions (like ImageGeneration) are handled in one place. Ensure the call
matches the extractor's signature and adjust imports/signature if necessary.
model_gateway/src/routers/grpc/harmony/builder.rs (1)

425-445: ⚠️ Potential issue | 🟠 Major

Keep commentary enabled when image_generation is the only Responses tool.

Line 442 passes with_custom_tools into build_system_message_from_responses(). After adding ResponseTool::ImageGeneration(_) on Line 435, an image-only request now makes that flag false, so the system message strips the commentary channel even though this builder emits tool calls on commentary; Harmony can no longer produce the requested tool call on that path.

Suggested fix
-            let with_custom_tools = has_custom_tools(&tool_types);
+            let has_any_tools = !tool_types.is_empty();
+            let with_custom_tools = has_custom_tools(&tool_types);

             // Add system message
-            let sys_msg = self.build_system_message_from_responses(request, with_custom_tools);
+            let sys_msg = self.build_system_message_from_responses(request, has_any_tools);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/harmony/builder.rs` around lines 425 - 445,
tool detection treats "image_generation" as non-custom so with_custom_tools
becomes false and build_system_message_from_responses strips the commentary
channel; update the logic that computes with_custom_tools (the call using
tool_types and has_custom_tools) to consider ResponseTool::ImageGeneration (or
the "image_generation" entry in tool_types) as a custom tool so commentary
remains enabled for image-only requests, ensuring
build_system_message_from_responses receives true when the only tool is
ImageGeneration.
crates/protocols/src/event_types.rs (1)

308-350: ⚠️ Potential issue | 🟠 Major

Update the response_format match and builtin event injection match in the streamer.

The first match block (lines 140–160) defaults non-WebSearch builtins to ItemType::MCP_CALL, so ImageGenerationCall, CodeInterpreterCall, and FileSearchCall are all incorrectly mapped to the MCP handler. The second match block (lines 424–436) is missing ImageGenerationCall entirely and will skip event injection for it via the _ => return true fallback.

Both blocks must be updated to handle all four builtin types explicitly:

  • Lines 140–160: Add cases for ResponseFormat::CodeInterpreterCall, ResponseFormat::FileSearchCall, and ResponseFormat::ImageGenerationCall.
  • Lines 424–436: Add a case for ItemType::IMAGE_GENERATION_CALL.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/protocols/src/event_types.rs` around lines 308 - 350, The streamer
match blocks are mis-mapping builtin response formats and omitting image
generation in builtin event injection; update the response_format -> ItemType
mapping so ResponseFormat::CodeInterpreterCall maps to
ItemType::CodeInterpreterCall, ResponseFormat::FileSearchCall maps to
ItemType::FileSearchCall, and ResponseFormat::ImageGenerationCall maps to
ItemType::ImageGenerationCall (in the same match that currently maps WebSearch
to ItemType::WebSearchCall and MCP to ItemType::McpCall), and update the builtin
event injection match to include ItemType::IMAGE_GENERATION_CALL alongside
ItemType::WEB_SEARCH_CALL, ItemType::CODE_INTERPRETER_CALL and
ItemType::FILE_SEARCH_CALL so image generation events are not skipped (refer to
the ResponseFormat variants and ItemType constants like IMAGE_GENERATION_CALL,
CODE_INTERPRETER_CALL, FILE_SEARCH_CALL, and WEB_SEARCH_CALL).
model_gateway/src/routers/grpc/common/responses/streaming.rs (1)

133-149: ⚠️ Potential issue | 🟠 Major

Handle image_generation_call separately instead of writing output.

Including this new type in the generic branch (line 149) adds an output field to streamed image_generation_call items. The OpenAI Responses API specifies that image_generation_call objects use the result field for the base64-encoded image content, not output. This mismatch breaks the streaming response shape.

Diff
                    let item_type = item_data.get("type").and_then(|t| t.as_str());
                    let is_tool_call = matches!(
                        item_type,
                        Some("mcp_call")
                            | Some("web_search_call")
                            | Some("code_interpreter_call")
                            | Some("file_search_call")
                            | Some("image_generation_call")
                    );
                    if is_tool_call
                        && item_data.get("call_id").and_then(|c| c.as_str())
                            == Some(&tool_result.call_id)
                    {
                        // Add output field
                        let output_str = serde_json::to_string(&tool_result.output)
                            .unwrap_or_else(|_| "{}".to_string());
                        item_data["output"] = json!(output_str);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/common/responses/streaming.rs` around lines
133 - 149, The current branch treats image_generation_call like other tool calls
and writes tool data into item_data["output"], but image_generation_call must
use the "result" field with base64 image content; update the logic around
item_type / is_tool_call so that when item_type == Some("image_generation_call")
and item_data.call_id matches tool_result.call_id you populate
item_data["result"] with the base64/stringified image from tool_result.output
(or its appropriate field) instead of writing item_data["output"], and keep the
existing item_data["output"] behavior for the other tool types (mcp_call,
web_search_call, code_interpreter_call, file_search_call) so streaming shape
remains correct for image_generation_call vs other tool calls.
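The suggested split could look roughly like this sketch. The real code operates on `serde_json::Value`; a plain string map stands in here, and the field names mirror the diff above:

```rust
use std::collections::HashMap;

// Sketch (assumption: real code uses serde_json::Value; a string map stands
// in here). Per the review, image_generation_call items carry their base64
// payload under `result`, while other builtin tool calls keep `output`.
fn result_key_for(item_type: &str) -> &'static str {
    if item_type == "image_generation_call" {
        "result"
    } else {
        "output"
    }
}

fn attach_tool_result(item: &mut HashMap<String, String>, payload: &str) {
    let key = item
        .get("type")
        .map(|t| result_key_for(t))
        .unwrap_or("output");
    item.insert(key.to_string(), payload.to_string());
}
```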
♻️ Duplicate comments (2)
model_gateway/src/routers/tool_output_context.rs (1)

36-72: ⚠️ Potential issue | 🟠 Major

Don't infer image-tool success from one JSON envelope.

This helper only returns "failed" when result.isError == true; a plain string error or any differently wrapped failure falls through to "completed". Because this summary is fed back into tool-call context, the next turn can be told the image was generated successfully when it was not. Please thread the authoritative execution result into this helper instead of guessing from the payload shape.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/tool_output_context.rs` around lines 36 - 72, The
helper extract_image_status_and_error currently infers success solely from the
JSON payload shape which can mislabel failed runs as "completed"; change its
signature to accept the authoritative execution result (e.g., an
ExecutionResult/RunStatus enum or a boolean like execution_failed) alongside the
existing output: &Value, then use that authoritative parameter to decide between
"failed" and "completed" while still extracting the error text from the payload
(preserve the existing extraction logic for error/content). Update all call
sites that invoke extract_image_status_and_error to pass the real execution
result (the run/response status produced by the image tool) so the helper no
longer guesses success from payload shape alone.
mcp.local.yaml (1)

1-17: ⚠️ Potential issue | 🟡 Minor

Remove the local MCP config from this PR.

This is still a developer-specific localhost file and should not ship with the feature change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcp.local.yaml` around lines 1 - 17, This PR accidentally includes a
developer-local config (mcp.local.yaml) containing servers entries (e.g., name:
generate_image and name: search_web with builtin_type image_generation and
web_search_preview); remove this file from the commit/PR (delete mcp.local.yaml
or revert its addition) and, if needed, add an entry to .gitignore to prevent
re-adding local MCP configs in the future so only environment-independent
configs remain in the repo.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 35-114: Add a doc comment to the
image_payload_from_wrapped_content function describing the MCP response shapes
it supports and why _format is unused: list that it accepts (1) a direct object
payload with a "result" string, (2) MCP CallToolResult-style arrays like
[{"type":"text","text":"..."}] where the text may be stringified JSON containing
a "result" object, (3) nested JSON-RPC result.content arrays, (4) a raw JSON
string payload, and (5) a plain text fallback; also note that _format is
reserved for future format-specific gating. Place this comment immediately above
fn image_payload_from_wrapped_content to make the contract clear for reviewers
and maintainers.

---

Outside diff comments:


In `@model_gateway/src/routers/grpc/common/responses/utils.rs`:
- Around line 41-53: Replace the ad-hoc builtin-tool detection with the shared
extractor: remove the manual matches that compute has_builtin_tools and instead
call the existing extract_builtin_types(...) from
model_gateway::routers::mcp_utils (import it), passing the same `tools` value;
assign its boolean result to `has_builtin_tools` (preserving the current
unwrap_or(false) behavior if needed) so future builtin additions (like
ImageGeneration) are handled in one place. Ensure the call matches the
extractor's signature and adjust imports/signature if necessary.

In `@model_gateway/src/routers/grpc/harmony/builder.rs`:
- Around line 425-445: tool detection treats "image_generation" as non-custom so
with_custom_tools becomes false and build_system_message_from_responses strips
the commentary channel; update the logic that computes with_custom_tools (the
call using tool_types and has_custom_tools) to consider
ResponseTool::ImageGeneration (or the "image_generation" entry in tool_types) as
a custom tool so commentary remains enabled for image-only requests, ensuring
build_system_message_from_responses receives true when the only tool is
ImageGeneration.
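The builder.rs suggestion can be sketched as follows. The tool-type strings and helper names are assumptions; the real logic lives around `with_custom_tools` and `build_system_message_from_responses`:

```rust
// Sketch of the suggested fix (names hypothetical): image_generation is a
// builtin, but for commentary-channel purposes it should still count as
// custom, so an image-only request keeps the commentary channel enabled.
fn is_builtin(tool: &str) -> bool {
    matches!(
        tool,
        "web_search_preview" | "code_interpreter" | "file_search" | "image_generation"
    )
}

fn with_custom_tools(tool_types: &[&str]) -> bool {
    tool_types
        .iter()
        .any(|t| *t == "image_generation" || !is_builtin(t))
}
```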


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb09d0a1-8926-42a1-987c-9e05e3bd0661

📥 Commits

Reviewing files that changed from the base of the PR and between 1678672 and cb6a3f8.

📒 Files selected for processing (17)
  • .gitignore
  • crates/mcp/src/core/config.rs
  • crates/mcp/src/transform/transformer.rs
  • crates/mcp/src/transform/types.rs
  • crates/protocols/src/event_types.rs
  • crates/protocols/src/responses.rs
  • mcp.local.yaml
  • model_gateway/src/routers/grpc/common/responses/streaming.rs
  • model_gateway/src/routers/grpc/common/responses/utils.rs
  • model_gateway/src/routers/grpc/harmony/builder.rs
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/mcp_utils.rs
  • model_gateway/src/routers/mod.rs
  • model_gateway/src/routers/openai/mcp/tool_loop.rs
  • model_gateway/src/routers/openai/responses/streaming.rs
  • model_gateway/src/routers/openai/responses/utils.rs
  • model_gateway/src/routers/tool_output_context.rs

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from cb6a3f8 to 0cf8cc9 on April 8, 2026 18:44

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cf8cc9179

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 0cf8cc9 to 5749506 on April 8, 2026 18:56
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 57495062be

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 5749506 to cebd5bc on April 8, 2026 19:18
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit cebd5bc87c

ResponseTool::WebSearchPreview(_) | ResponseTool::CodeInterpreter(_)
ResponseTool::WebSearchPreview(_)
| ResponseTool::CodeInterpreter(_)
| ResponseTool::ImageGeneration(_)

P2 Badge Apply image tool overrides in gRPC MCP execution

By including ResponseTool::ImageGeneration in MCP-routing detection here, gRPC /responses requests with image tools can now be intercepted and executed via MCP, but the gRPC execution paths still pass only model-emitted function-call arguments to execute_tool (no merge of request-level tool options such as size, quality, or background). In practice, explicit request options can be dropped unless the model repeats them, so image behavior diverges from the OpenAI router path that now calls apply_request_tool_overrides before execution.


Author

Valid point. I will make this change in a new PR; I don't want to keep adding scope to this one.

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from cebd5bc to 74c5cf6 on April 8, 2026 19:27
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 74c5cf6ec7

Comment on lines +152 to +155
output: result
.as_str()
.map(str::to_string)
.unwrap_or_else(|| result.to_string()),

P1 Badge Restore text flattening for passthrough MCP outputs

This mcp_call path now returns result.to_string() for any non-string tool output, which regresses common MCP responses where tools/call returns content arrays like [{"type":"text","text":"..."}]: clients will receive the raw JSON wrapper instead of the actual text payload. Because passthrough tools use this branch broadly, normal successful tool calls can become harder to consume and differ from previous behavior that extracted text blocks; this is user-visible for any MCP server returning structured content.

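The previous text-flattening behavior the comment asks to preserve can be sketched like this. The `(kind, text)` tuples are a hypothetical stand-in for the real `serde_json` content blocks of a `CallToolResult`:

```rust
// Sketch: extract the `text` entries from a CallToolResult-style content
// array instead of stringifying the whole JSON wrapper, so clients get the
// actual text payload rather than the raw envelope.
fn flatten_text_blocks(content: &[(&str, &str)]) -> String {
    content
        .iter()
        .filter(|(kind, _)| *kind == "text")
        .map(|(_, text)| *text)
        .collect::<Vec<_>>()
        .join("\n")
}
```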

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 74c5cf6 to d7799f4 on April 8, 2026 19:38
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit d7799f40a9

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from d7799f4 to b92ca7e on April 8, 2026 19:45
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit b92ca7e84b

Comment on lines +37 to +41
if output
.as_object()
.and_then(|o| o.get("result"))
.and_then(|v| v.as_object())
.and_then(|r| r.get("isError"))

P1 Badge Handle array-shaped MCP outputs when deriving image status

extract_image_status_and_error only inspects output.result.isError, but MCP tool execution in this stack commonly yields array-shaped payloads (fresh evidence: McpOrchestrator::call_result_to_json serializes CallToolResult.content directly), so this predicate is never reached for normal image tool outputs. In those cases compact_tool_output_for_model_context always records "status":"completed" and drops error text, which feeds incorrect success state into tool-loop conversation history and can suppress retries/failure handling after image generation errors.


@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from b92ca7e to 808cf73 on April 8, 2026 20:19
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 808cf73621

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 808cf73 to 65f9747 on April 8, 2026 20:44
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 65f974754e

) -> String {
match response_format {
ResponseFormat::ImageGenerationCall => {
let is_error = is_image_generation_error(output);

P1 Badge Guard image context compaction against non-wrapper outputs

compact_tool_output_for_model_context calls is_image_generation_error(output) unconditionally, but that helper panics unless output.result.isError is present. In the MCP execution path, image tool outputs are commonly array/string payloads (serialized CallToolResult.content), so this can panic during normal tool-loop history compaction instead of returning a summary, aborting the request path after tool execution.


Comment on lines +151 to +153
ResponseFormat::ImageGenerationCall => {
(ItemType::IMAGE_GENERATION_CALL, "ig_")
}

P2 Badge Keep image argument events on image item IDs

This new image_generation_call rewrite assigns ig_ item IDs for streamed tool items, but argument events are still normalized through the MCP path (mcp_call.arguments_* with mcp_ IDs). For intercepted image tool calls, clients can receive response.output_item.added for ig_* while argument events reference mcp_*, breaking per-item event correlation in streaming consumers.


Author

Streaming tests/changes are not in scope for this PR. I have already found a bunch of issues in the streaming path and will need to do a thorough test pass there.

Noting down this comment, but I do not plan to make any change here.

@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 65f9747 to 14ca8d1 on April 8, 2026 20:59
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 14ca8d10f6

Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 14ca8d1 to 9106f1d on April 8, 2026 21:29
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 9106f1d630

@TingtingZhou7 TingtingZhou7 requested a review from slin1237 April 8, 2026 22:02
Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
@chatgpt-codex-connector bot left a comment

💡 Codex Review: reviewed commit 21b4352954

/// single fallback path.
fn image_payload_from_wrapped_content(result: &Value) -> Option<Value> {
// Parse the JSON-RPC wrapper object under top-level `result`.
let result_obj = result.get("result").and_then(|v| v.as_object())?;

P2 Badge Parse SSE-wrapped image errors before returning fallback output

image_payload_from_wrapped_content immediately assumes the tool output is an object with a top-level result, so SSE-style string payloads (e.g. event: message\ndata: {..."isError":true...}) always return None here and fall through to the generic string fallback. In that case image_generation_call.result becomes the raw SSE envelope instead of the actual error text, which degrades client-visible error handling for streamable MCP image servers.


Author

Streaming events will be handled in a separate PR; they are not in scope for this one.

Member
@slin1237 slin1237 left a comment

Thanks for the updates. A few items from my original review are still open — want to make sure they don't get lost:

1. restore_original_tools for image_generation (no response yet)

The web_search pattern restores the original web_search_preview tool definition in the response payload via restore_original_tools() in model_gateway/src/routers/openai/responses/utils.rs. Without the equivalent for image_generation, the client's response will contain the MCP function tool definition instead of {"type": "image_generation"}. This breaks clients that check the response's tools array to know which built-in tools were active.

2. response.image_generation_call.generating SSE event (no response yet)

This event doesn't exist in the OpenAI streaming spec. OpenAI's image generation streaming goes in_progress → partial_image (0-3 times) → completed. Adding a non-standard event type will break clients that validate SSE event names. Can you verify this against the actual OpenAI behavior, or remove it and use an existing event?

3. Tool options deferral — needs documentation or validation

Understood that size, quality, background, output_format passthrough is deferred to a follow-up PR. However, right now a client can send {"type": "image_generation", "size": "1024x1536", "quality": "high"} and those options are silently ignored — the image will be generated with whatever the MCP server defaults to. This is a silent contract violation from the client's perspective.

At minimum, can we either:

  • (a) Add a log warning when tool options are present but ignored, or
  • (b) Document this limitation in a code comment at the sanitize_builtin_tool_arguments call site, or
  • (c) Link the follow-up issue/PR number so we can track it

4. isError flag not checked in extract_image_status_and_error

+1 to CodeRabbit's comment on this. The MCP protocol's isError: true is the authoritative signal for a failed tool call. The current code only pattern-matches on "Error executing tool" text prefix, which misses any MCP server that returns a different error message format with isError: true. This is a one-line fix.

5. Doc comment in mcp_utils.rs

Minor: the updated doc comment lists web_search_preview, code_interpreter, image_generation but dropped file_search. Should list all four.
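Item 1's restore step could be sketched roughly as follows. The function name comes from the review; the tools-array representation and the injected-tool name are hypothetical stand-ins for the real serde_json structures:

```rust
// Hypothetical sketch of extending restore_original_tools to image_generation:
// swap the MCP-injected function tool back to the builtin marker the client
// sent, so the response's tools array reads {"type": "image_generation"} again.
fn restore_original_tools(tools: &mut Vec<String>, injected_name: &str, builtin_type: &str) {
    for t in tools.iter_mut() {
        if t.as_str() == injected_name {
            *t = builtin_type.to_string();
        }
    }
}
```

This mirrors the existing web_search pattern: the rewrite is applied only to the response payload, so the MCP execution path is untouched.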

@TingtingZhou7
Author

TingtingZhou7 commented Apr 9, 2026

> (quoting the review above)

  1. response.image_generation_call.generating SSE event
    I already replied before: the OpenAI docs do not document the generating SSE event, but if you call their API, it does emit a generating event.

  2. Tool options deferral — needs documentation or validation
    I already added support for passing through whatever the user passes to the MCP server. Please search for apply_request_tool_overrides in this commit - https://github.com/lightseekorg/smg/pull/1057/changes/9106f1d6309c416a55f9555a15bbd3e69a290f39

  3. Doc comment in mcp_utils.rs
    Are you reviewing the latest commit? file_search is not dropped.

  4. isError flag not checked in extract_image_status_and_error
    "The current code only pattern-matches on "Error executing tool" text prefix"
    That statement is not true; I cannot find any code that matches it. Can you point to the problematic code? AI comments can sometimes be wrong.

@TingtingZhou7 TingtingZhou7 requested a review from slin1237 April 9, 2026 07:40
…sing

Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
@TingtingZhou7 TingtingZhou7 force-pushed the tingzhou/add-img-tool-rebase-2 branch from 00183d7 to 2f0de11 on April 9, 2026 07:48

Labels

grpc (gRPC client and router changes), mcp (MCP related changes), model-gateway (Model gateway crate changes), openai (OpenAI router changes), protocols (Protocols crate changes)
