feat(image): add image generation tool support #1057
TingtingZhou7 wants to merge 4 commits into lightseekorg:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds an ImageGeneration built-in tool and an ImageGenerationCall response format, wiring enums/serde, transformer conversion, MCP routing, streaming events/IDs, tool-loop argument handling, and model-context output compaction across the gateway and protocol crates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ModelGateway
    participant MCP_Server as MCP (generate_image)
    participant Transformer
    Client->>ModelGateway: Request with image_generation tool
    ModelGateway->>MCP_Server: Route request to configured MCP server/tool
    MCP_Server-->>ModelGateway: Respond (wrapped JSON / payload)
    ModelGateway->>Transformer: Transform to ImageGenerationCall
    Transformer-->>ModelGateway: ResponseOutputItem::ImageGenerationCall (id "ig_*")
    ModelGateway->>ModelGateway: compact_tool_output_for_model_context(...)
    ModelGateway->>Client: Emit streaming events / record compacted output
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Code Review
This pull request introduces support for a built-in image generation tool, updating the MCP configuration, response transformation logic, and streaming event types. It implements a mechanism to extract image payloads from various MCP result formats and adds a utility to compact tool outputs by omitting binary image data from the model context. I have no feedback to provide.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0321a9c4c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/common/responses/streaming.rs (1)
133-149: ⚠️ Potential issue | 🟠 Major — Patch image payloads through `result`, not `output`.
Adding `image_generation_call` to this generic update path reuses the `item_data["output"]` mutation, but the image-generation response shape stores its payload in `result`. `emit_completed()` later forwards the stored JSON verbatim, so this branch will leave the real image field untouched and append an unexpected `output` key to the streamed response instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/common/responses/streaming.rs` around lines 133 - 149, The update branch that handles tool calls (check using item_data, is_tool_call, and tool_result) is writing image payloads to item_data["output"], but image_generation_call stores its payload under "result" and emit_completed() later forwards that verbatim; change the mutation to detect image_generation_call and set item_data["result"] (not "output") with the serialized tool_result.output so the image payload lands in the correct field when emit_completed() forwards it. Ensure the logic still covers other tool types unchanged and reference the item_data, is_tool_call, tool_result and emit_completed symbols when making this conditional write change.
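A std-only sketch of the conditional write this finding asks for. The item-type string, map, and key names mirror the review comment, not the gateway's actual types; a string map stands in for the `item_data` JSON object:

```rust
use std::collections::HashMap;

// Pick which key of the item receives the serialized tool output.
// ImageGenerationCall stores its payload under "result", which
// emit_completed() later forwards verbatim; all other tool calls
// keep the existing "output" key.
fn output_key_for(item_type: &str) -> &'static str {
    match item_type {
        "image_generation_call" => "result",
        _ => "output",
    }
}

fn record_tool_output(
    item_data: &mut HashMap<String, String>,
    item_type: &str,
    serialized: String,
) {
    item_data.insert(output_key_for(item_type).to_string(), serialized);
}

fn main() {
    let mut image_item = HashMap::new();
    record_tool_output(&mut image_item, "image_generation_call", "{\"b64\":\"...\"}".into());
    assert!(image_item.contains_key("result"));
    assert!(!image_item.contains_key("output"));

    let mut mcp_item = HashMap::new();
    record_tool_output(&mut mcp_item, "mcp_call", "{}".into());
    assert!(mcp_item.contains_key("output"));
}
```

Other tool types keep their existing behavior; only the image-generation branch changes its target key.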
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 42-92: The image_payload_from_wrapped_content function currently
only unwraps payloads that contain "result" and treats them as successful;
update it to detect and preserve explicit failure payloads by checking parsed
objects for a top-level "status" equal to "failed" (case-insensitive) or
presence of an "error" field and returning that parsed Value unchanged so
downstream to_image_generation_call can map it to
ImageGenerationCallStatus::Failed; apply the same check/behavior to the other
similar unwrap block around lines 241-289 (the other image unwrapping logic that
also uses is_image_payload_candidate) and ensure you still return the parsed
payload when it indicates failure rather than wrapping it as a completed result.
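A minimal sketch of the status-aware classification the prompt describes, assuming the payload has already been reduced to a status string and an error-field flag (the real transformer inspects serde_json values):

```rust
// A payload counts as failed when it carries an "error" field or a
// top-level status of "failed", compared case-insensitively.
fn image_payload_failed(status: Option<&str>, has_error_field: bool) -> bool {
    has_error_field || status.is_some_and(|s| s.eq_ignore_ascii_case("failed"))
}

// Map the flags to the status string the call item should carry.
fn image_call_status(status: Option<&str>, has_error_field: bool) -> &'static str {
    if image_payload_failed(status, has_error_field) {
        "failed"
    } else {
        "completed"
    }
}

fn main() {
    assert_eq!(image_call_status(Some("FAILED"), false), "failed");
    assert_eq!(image_call_status(None, true), "failed");
    assert_eq!(image_call_status(Some("completed"), false), "completed");
}
```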
- Around line 925-947: The test
test_passthrough_with_generic_action_payload_stays_mcp_call asserts the raw id
"req-1005" but ResponseTransformer/ mcp_response_item_id normalizes request ids
to the form "mcp_req-1005"; update the assertion in the match arm for
ResponseOutputItem::McpCall to expect the normalized id (e.g., call
mcp_response_item_id("req-1005") or assert_eq!(id, "mcp_req-1005")) so the test
checks the post-normalized value returned by ResponseTransformer::transform.
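For illustration, a hypothetical stand-in for `mcp_response_item_id`: the `mcp_` prefix form is taken from the comment above; the real helper lives in the transformer crate and may do more:

```rust
// Assumed normalization: prepend "mcp_" to the raw request id.
fn mcp_response_item_id(raw: &str) -> String {
    format!("mcp_{raw}")
}

fn main() {
    // The test should assert the normalized value, not the raw id.
    assert_eq!(mcp_response_item_id("req-1005"), "mcp_req-1005");
}
```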
In `@model_gateway/src/routers/openai/mcp/tool_loop.rs`:
- Around line 856-885: In sanitize_builtin_tool_arguments when handling
ResponseFormat::ImageGenerationCall, don't replace the entire arguments Value
with only model and revised_prompt; instead ensure arguments is an object
(create one if not), compute revised_prompt the same way, set/overwrite the
"model" key to IMAGE_MODEL and the "revised_prompt" key to the computed string,
but leave any other existing keys (e.g., "size", "quality", "background",
"output_format", compression) intact so they remain in the payload and recorded
arguments_str; update code in sanitize_builtin_tool_arguments to mutate the
existing object map rather than assigning a new json! object.
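A sketch of the mutate-in-place approach the prompt suggests: overwrite only `model` and `revised_prompt`, keep every other caller-supplied key. The real code mutates a serde_json object; a string map stands in for it here, and the `IMAGE_MODEL` value is a placeholder:

```rust
use std::collections::BTreeMap;

const IMAGE_MODEL: &str = "example-image-model"; // placeholder, not the real constant

fn sanitize_image_arguments(args: &mut BTreeMap<String, String>, revised_prompt: String) {
    // Overwrite the two managed keys...
    args.insert("model".into(), IMAGE_MODEL.into());
    args.insert("revised_prompt".into(), revised_prompt);
    // ...and remove nothing else: "size", "quality", "background",
    // "output_format", and compression settings all pass through.
}

fn main() {
    let mut args = BTreeMap::from([
        ("size".to_string(), "1024x1024".to_string()),
        ("model".to_string(), "stale-model".to_string()),
    ]);
    sanitize_image_arguments(&mut args, "a red fox".into());
    assert_eq!(args["model"], IMAGE_MODEL);
    assert_eq!(args["revised_prompt"], "a red fox");
    assert_eq!(args["size"], "1024x1024"); // preserved
}
```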
In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 11-53: The helper currently infers success from output JSON in
extract_image_status_and_error, causing opaque failures to be marked
"completed"; change compact_tool_output_for_model_context and
extract_image_status_and_error to accept an explicit execution result (e.g., a
boolean like execution_succeeded or an enum) passed from callers instead of
guessing from output, use that explicit signal as the authoritative source for
setting the "status" field ("completed" vs "failed"), still include any error
text returned by find_error_text when present, and update all call sites to
provide the known success/failure flag so we never treat opaque outputs (e.g.,
{} or non-string errors) as successful.
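A sketch of treating the execution result as authoritative, per the prompt above. The signature is the suggested one, not the shipped code; JSON inspection contributes only the error text:

```rust
// The caller's known execution result decides success/failure;
// error_text (if any) is extracted from the output JSON separately.
fn image_status(
    execution_succeeded: bool,
    error_text: Option<&str>,
) -> (&'static str, Option<String>) {
    if let Some(err) = error_text {
        return ("failed", Some(err.to_string()));
    }
    if !execution_succeeded {
        // Opaque failures ({} outputs, non-string errors) still fail.
        return ("failed", None);
    }
    ("completed", None)
}

fn main() {
    assert_eq!(image_status(false, None), ("failed", None));
    assert_eq!(image_status(true, None), ("completed", None));
    assert_eq!(image_status(true, Some("boom")).0, "failed");
}
```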
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 471fe2e1-9e66-442f-9b0b-7d951fabd64a
📒 Files selected for processing (15)
- crates/mcp/src/core/config.rs
- crates/mcp/src/transform/transformer.rs
- crates/mcp/src/transform/types.rs
- crates/protocols/src/event_types.rs
- crates/protocols/src/responses.rs
- mcp.local.yaml
- model_gateway/src/routers/grpc/common/responses/streaming.rs
- model_gateway/src/routers/grpc/common/responses/utils.rs
- model_gateway/src/routers/grpc/harmony/builder.rs
- model_gateway/src/routers/grpc/regular/responses/common.rs
- model_gateway/src/routers/mcp_utils.rs
- model_gateway/src/routers/mod.rs
- model_gateway/src/routers/openai/mcp/tool_loop.rs
- model_gateway/src/routers/openai/responses/utils.rs
- model_gateway/src/routers/tool_output_context.rs
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/harmony/builder.rs (1)
425-436: ⚠️ Potential issue | 🟠 Major — Don't disable `commentary` for built-in-only Responses requests.
`with_custom_tools` is `false` when the request only contains `image_generation` (and the other built-ins), so the subsequent system-message build strips the `commentary` channel. In this file, built-ins are not described anywhere else, so Harmony has no channel left to emit the tool call.
Suggested fix:

```diff
-    let with_custom_tools = has_custom_tools(&tool_types);
+    let has_any_tools = !tool_types.is_empty();
+    let with_custom_tools = has_custom_tools(&tool_types);

     // Add system message
-    let sys_msg = self.build_system_message_from_responses(request, with_custom_tools);
+    let sys_msg = self.build_system_message_from_responses(request, has_any_tools);
     all_messages.push(sys_msg);
```

Also applies to: 442-445
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/harmony/builder.rs` around lines 425 - 436, The code currently computes with_custom_tools based on whether any tools exist, which ends up false for built-in-only tool lists (e.g., ResponseTool::ImageGeneration, ::CodeInterpreter, ::WebSearchPreview) and causes the system-message builder to strip the "commentary" channel; update the logic so with_custom_tools only becomes true when there is at least one non-built-in/custom tool. Change the check used where tool_types / with_custom_tools are derived (look for the tool_types Vec and the with_custom_tools assignment) to scan request.tools and set with_custom_tools = tools.iter().any(|t| matches!(t, /* enumerate custom variants only */ ResponseTool::Mcp(_) /* or whatever represents custom tools in this enum */ /* add other non-built-in variants if any */)); this preserves "commentary" for built-in-only requests and prevents the system-message builder from stripping that channel.
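A hypothetical reduction of the `ResponseTool` enum to illustrate the fix: `with_custom_tools` becomes true only for non-built-in variants, while a separate "any tools at all" check keeps the commentary channel alive. Variant names are assumptions based on the comment:

```rust
#[derive(Debug)]
enum Tool {
    // Built-ins named in the review comment:
    ImageGeneration,
    CodeInterpreter,
    WebSearchPreview,
    // Stand-in for custom/MCP-declared tools:
    Mcp(String),
}

// True only when at least one non-built-in tool is present.
fn has_custom_tools(tools: &[Tool]) -> bool {
    tools.iter().any(|t| matches!(t, Tool::Mcp(_)))
}

fn main() {
    let built_in_only = vec![Tool::ImageGeneration, Tool::WebSearchPreview];
    // Built-in-only requests still have tools, so commentary stays enabled:
    let has_any_tools = !built_in_only.is_empty();
    assert!(has_any_tools);
    assert!(!has_custom_tools(&built_in_only));

    let mixed = vec![Tool::CodeInterpreter, Tool::Mcp("generate_image".into())];
    assert!(has_custom_tools(&mixed));
}
```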
♻️ Duplicate comments (3)
model_gateway/src/routers/tool_output_context.rs (1)
11-18: ⚠️ Potential issue | 🟠 Major — Use the execution result as the status source of truth.
This helper still guesses `"failed"` vs `"completed"` from ad-hoc JSON fields, so opaque failures like `{}` or non-string error objects get recorded back into model context as successful tool results. Both callers already know whether execution failed; thread that signal into this helper instead of inferring it here.
Suggested fix:

```diff
 pub fn compact_tool_output_for_model_context(
     response_format: &ResponseFormat,
     output: &Value,
+    execution_succeeded: bool,
 ) -> String {
     match response_format {
         ResponseFormat::ImageGenerationCall => {
-            let (status, error) = extract_image_status_and_error(output);
+            let (status, error) =
+                extract_image_status_and_error(output, execution_succeeded);
             let mut summary = json!({
                 "tool": "generate_image",
                 "status": status,
                 "note": "binary image payload omitted from model context"
             });
@@
-fn extract_image_status_and_error(output: &Value) -> (&'static str, Option<String>) {
+fn extract_image_status_and_error(
+    output: &Value,
+    execution_succeeded: bool,
+) -> (&'static str, Option<String>) {
     if let Some(error) = find_error_text(output) {
         return ("failed", Some(error));
     }
+
+    if !execution_succeeded {
+        return ("failed", None);
+    }
     if let Some(status) = output
         .as_object()
         .and_then(|o| o.get("status"))
         .and_then(|v| v.as_str())
```

Please update the `model_gateway/src/routers/grpc/regular/responses/common.rs` and `model_gateway/src/routers/openai/mcp/tool_loop.rs` call sites to pass the known result bit.
Also applies to: 36-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/tool_output_context.rs` around lines 11 - 18, The helper compact_tool_output_for_model_context should stop inferring success from JSON and instead accept an explicit execution result flag (e.g., succeeded: bool) and use that flag as the source of truth when setting status to "completed" vs "failed" (for the ImageGenerationCall branch that currently calls extract_image_status_and_error, keep extracting the error payload but ignore its guessed status). Update the signature of compact_tool_output_for_model_context and all call sites to pass the known result bit from the callers in model_gateway/src/routers/grpc/regular/responses/common.rs and model_gateway/src/routers/openai/mcp/tool_loop.rs, and ensure you still include any extracted error text/object in the returned context but never derive success/failure from it.

crates/mcp/src/transform/transformer.rs (2)
936-958: ⚠️ Potential issue | 🟡 Minor — Fix the normalized passthrough ID assertion.
`ResponseFormat::Passthrough` goes through `mcp_response_item_id`, so line 955 should expect the normalized value. The current assertion will fail against lines 131-133.
Suggested fix:

```diff
-        assert_eq!(id, "req-1005");
+        assert_eq!(id, mcp_response_item_id("req-1005"));
```
Verify each finding against the current code and only fix it if needed. In `@crates/mcp/src/transform/transformer.rs` around lines 936 - 958, The test test_passthrough_with_generic_action_payload_stays_mcp_call asserts the raw id "req-1005" but ResponseFormat::Passthrough passes through mcp_response_item_id normalization, so update the assertion in the match arm to expect the normalized id returned by mcp_response_item_id instead of "req-1005"; locate the test function and change the expected value for ResponseOutputItem::McpCall { id, .. } (the call to ResponseTransformer::transform with ResponseFormat::Passthrough) to the normalized form produced by mcp_response_item_id.
35-37: ⚠️ Potential issue | 🟠 Major — Preserve failed image payloads instead of forcing `Completed`.
Line 35 only treats objects with `result` as image payloads, line 74 repeats that same check for wrapped text, and line 251 hardcodes `ImageGenerationCallStatus::Completed`. A payload like `{"status":"failed","error":"generation failed"}` therefore comes back as a successful `image_generation_call` with the raw JSON serialized into `result`. The new error-path test on lines 831-855 is also locking that bug in.
Suggested fix:

```diff
 fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
-    obj.contains_key("result")
+    obj.contains_key("result")
+        || obj.contains_key("error")
+        || obj
+            .get("status")
+            .and_then(|v| v.as_str())
+            .is_some_and(|status| status.eq_ignore_ascii_case("failed"))
 }
@@
             let Some(text) = obj.get("text").and_then(|v| v.as_str()) else {
                 continue;
             };
             if let Ok(parsed) = serde_json::from_str::<Value>(text) {
                 if let Some(parsed_obj) = parsed.as_object() {
-                    if parsed_obj.contains_key("result") {
+                    if Self::is_image_payload_candidate(parsed_obj) {
                         return Some(parsed);
                     }
                 }
             }
@@
-        let status = ImageGenerationCallStatus::Completed;
+        let status = if obj.is_some_and(|o| {
+            o.contains_key("error")
+                || o.get("status")
+                    .and_then(|v| v.as_str())
+                    .is_some_and(|status| status.eq_ignore_ascii_case("failed"))
+        }) {
+            ImageGenerationCallStatus::Failed
+        } else {
+            ImageGenerationCallStatus::Completed
+        };
@@
-        assert_eq!(status, ImageGenerationCallStatus::Completed);
+        assert_eq!(status, ImageGenerationCallStatus::Failed);
```

Also applies to: 72-78, 245-289, 831-855
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/mcp/src/transform/transformer.rs` around lines 35 - 37, The current payload detection and mapping (is_image_payload_candidate and the code that constructs ImageGenerationCallStatus::Completed) treats any object with a "result" key as a successful image payload and forces Completed; change the logic to respect a top-level "status" field: update is_image_payload_candidate to consider objects that represent an image payload only when "status" == "completed" (or keep current semantics but add a separate has_image_status check), and in the mapper that currently sets ImageGenerationCallStatus::Completed (the block that hardcodes ImageGenerationCallStatus::Completed) branch on the "status" value so that "failed" maps to ImageGenerationCallStatus::Failed (include the "error" string or the raw JSON as the failure detail), other/unknown statuses preserve the raw JSON without forcing Completed; apply the same status-aware check used for wrapped text handling so the checks at is_image_payload_candidate, the wrapped-text branch, and the mapping that sets Completed are all consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c61f8cda-de23-40d4-a349-fb0c948b3134
📒 Files selected for processing (8)
- crates/mcp/src/transform/transformer.rs
- crates/protocols/src/responses.rs
- model_gateway/src/routers/grpc/harmony/builder.rs
- model_gateway/src/routers/grpc/regular/responses/common.rs
- model_gateway/src/routers/mcp_utils.rs
- model_gateway/src/routers/mod.rs
- model_gateway/src/routers/openai/mcp/tool_loop.rs
- model_gateway/src/routers/tool_output_context.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42bd7e2e18
Actionable comments posted: 2
♻️ Duplicate comments (3)
crates/mcp/src/transform/transformer.rs (2)
267-286: ⚠️ Potential issue | 🟠 Major — Error payloads are marked as `Completed` instead of `Failed`.
The status is unconditionally set to `ImageGenerationCallStatus::Completed` (line 274), but the protocol defines a `Failed` variant (per `crates/protocols/src/responses.rs:432`). Payloads containing an `error` field or `isError: true` should emit `ImageGenerationCallStatus::Failed`. The test `test_image_generation_transform_error_payload` (lines 854-879) currently asserts `Completed` for an error payload, reinforcing this gap.
This was flagged in a previous review; the current code still does not parse failure payloads to emit `Failed` status.
Suggested fix:

```diff
 fn to_image_generation_call(result: &Value, tool_call_id: &str) -> ResponseOutputItem {
     let payload = Self::image_payload_from_wrapped_content(result, &ResponseFormat::ImageGenerationCall)
         .unwrap_or_else(|| result.clone());
     let obj = payload.as_object();

-    let status = ImageGenerationCallStatus::Completed;
+    // Check for failure indicators
+    let is_error = obj
+        .and_then(|o| o.get("isError"))
+        .and_then(|v| v.as_bool())
+        .unwrap_or(false)
+        || obj.map_or(false, |o| o.contains_key("error"))
+        || obj
+            .and_then(|o| o.get("status"))
+            .and_then(|v| v.as_str())
+            .is_some_and(|s| s.eq_ignore_ascii_case("failed") || s.eq_ignore_ascii_case("error"));
+
+    let status = if is_error {
+        ImageGenerationCallStatus::Failed
+    } else {
+        ImageGenerationCallStatus::Completed
+    };
+
     let output_result = payload
```
Verify each finding against the current code and only fix it if needed. In `@crates/mcp/src/transform/transformer.rs` around lines 267 - 286, In to_image_generation_call, detect failure payloads instead of always using ImageGenerationCallStatus::Completed: inspect the extracted payload (from image_payload_from_wrapped_content / payload variable) and set status to ImageGenerationCallStatus::Failed when payload contains an "error" field, contains isError: true, or otherwise signals failure; otherwise use Completed. Update the logic that determines status in the ResponseOutputItem::ImageGenerationCall construction (id: format!("ig_{tool_call_id}")), and ensure test test_image_generation_transform_error_payload asserts Failed.
35-37: 🧹 Nitpick | 🔵 Trivial — `is_image_payload_candidate` only recognizes success payloads.
This helper returns `true` only when a `result` string key exists. Failure payloads with `error` or `status: "failed"` fields (but no `result`) will not be recognized as image payloads, causing `image_payload_from_wrapped_content` to return `None` and fall through to the raw `result.clone()` path. Consider also matching on `error` or a failure-indicating `status` field so error payloads are properly unwrapped before being passed to `to_image_generation_call`.
Suggested helper expansion:

```diff
-    fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
-        obj.get("result").and_then(|v| v.as_str()).is_some()
+    fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
+        obj.get("result").and_then(|v| v.as_str()).is_some()
+            || obj.contains_key("error")
+            || obj
+                .get("status")
+                .and_then(|v| v.as_str())
+                .is_some_and(|s| s.eq_ignore_ascii_case("failed") || s.eq_ignore_ascii_case("error"))
     }
```
Verify each finding against the current code and only fix it if needed. In `@crates/mcp/src/transform/transformer.rs` around lines 35 - 37, The helper is_image_payload_candidate only checks for a "result" string and thus misses wrapped failure payloads; update is_image_payload_candidate to also return true when the map contains an "error" key (string/object) or a "status" key equal to "failed"/"error" so those wrapped error payloads are recognized and unwrapped by image_payload_from_wrapped_content before being passed to to_image_generation_call; locate and modify the function is_image_payload_candidate and ensure its predicates cover "result", "error", and a failing "status" value.

model_gateway/src/routers/tool_output_context.rs (1)
36-70: 🧹 Nitpick | 🔵 Trivial — Consider checking `isError` at the top level as well.
The current implementation only checks `output.result.isError` (the nested JSON-RPC path). Some MCP responses may have `isError` at the top level when the payload is already unwrapped. Per the previous review discussion, adding a fallback check for top-level `isError` would make this more robust.
Additionally, the function returns `("completed", None)` for any payload that doesn't match the `result.isError` path, including payloads with a top-level `status: "failed"` or `error` field. This could misclassify failures that don't follow the exact JSON-RPC wrapper format.
More robust error detection:

```diff
 fn extract_image_status_and_error(output: &Value) -> (&'static str, Option<String>) {
-    if output
+    // Check nested JSON-RPC path: output.result.isError
+    let nested_is_error = output
         .as_object()
         .and_then(|o| o.get("result"))
         .and_then(|v| v.as_object())
         .and_then(|r| r.get("isError"))
         .and_then(|v| v.as_bool())
-        == Some(true)
-    {
+        .unwrap_or(false);
+
+    // Check top-level isError (for unwrapped payloads)
+    let top_level_is_error = output
+        .as_object()
+        .and_then(|o| o.get("isError"))
+        .and_then(|v| v.as_bool())
+        .unwrap_or(false);
+
+    if nested_is_error || top_level_is_error {
         let error_text = output
             .as_object()
             .and_then(|obj| obj.get("result"))
```
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/tool_output_context.rs` around lines 36 - 70, Modify extract_image_status_and_error to also inspect top-level fields: check top-level "isError" (bool), "status" (e.g., "failed"), and a top-level "error" or "content" array for human-readable error text if the nested result path is absent; when any of these indicate an error return ("failed", Some(error_text)). Keep the existing nested result.isError logic as primary, but if that path is missing fall back to top-level checks and only return ("completed", None) when neither nested nor top-level indicators show an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 854-879: The test test_image_generation_transform_error_payload
currently asserts ImageGenerationCallStatus::Completed for an error payload;
update the expected status to ImageGenerationCallStatus::Failed (in the match
arm that destructures ResponseOutputItem::ImageGenerationCall) to reflect the
corrected status detection in
to_image_generation_call/ResponseTransformer::transform so the assertion matches
the new failure behavior for {"error":"generation failed"}.
In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 78-108: Add tests covering the edge cases for
compact_tool_output_for_model_context: (1) create a test similar to
test_compact_image_output_with_jsonrpc_wrapped_text_error that passes a
successful image generation payload (ResponseFormat::ImageGenerationCall) with
no isError and assert parsed.status == "completed" and that there is no "error"
field; (2) add a test that puts isError at the top-level JSON (rather than
inside result) and assert parsed.status == "failed" and the error string is
extracted correctly; (3) add a test for payloads missing both isError and known
error patterns to assert the function returns a neutral/unknown status (e.g.,
not "failed") and handles absence of "error" gracefully; use the same serde_json
parsing/assertion pattern as
test_compact_image_output_with_jsonrpc_wrapped_text_error to locate and validate
fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 84d793a4-e018-4457-84ad-02b5e6cc3925
📒 Files selected for processing (2)
crates/mcp/src/transform/transformer.rs
model_gateway/src/routers/tool_output_context.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1678672db4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
Force-pushed efb0d4b to d0da3a9
slin1237
left a comment
Thanks for the work here. The config/enum/streaming plumbing follows the established web_search pattern well. However, there are several design issues that need to be addressed before this can merge.
Major concern: the transformation layer is over-engineered for an unknown MCP response shape.
image_payload_from_wrapped_content has 80+ lines of speculative parsing — trying JSON-RPC wrappers, MCP content arrays, stringified JSON, plain strings, nested result.content objects. Compare this to to_web_search_call which is 15 lines because the MCP contract is well-defined. This complexity suggests we don't have a clear contract with the MCP image generation server yet. We should define the expected MCP response schema first, then write a simple transform against it — not write a catch-all that guesses.
Major concern: mcp.local.yaml committed to repo.
This is a local dev config and should not be in the PR.
Missing: tool options passthrough.
OpenAI's image_generation tool accepts size, quality, background, output_format, action, compression as tool-level options. The ImageGenerationTool struct captures these via #[serde(flatten)] into a HashMap, but they're never passed through to the MCP server as tool call arguments. The MCP server needs these to generate the right image.
Missing: restore_original_tools handling.
The web_search pattern restores the original web_search_preview tool in the response via restore_original_tools() in responses/utils.rs. I don't see the equivalent for image_generation — the response will have the MCP function tool instead of image_generation.
Concern: base64 image data in conversation history.
The compact_tool_output_for_model_context in tool_output_context.rs is called for image generation results. But the result field is a base64 image (potentially megabytes). If this goes into the conversation history that's sent back to the model on multi-turn, it will blow up the context window. OpenAI handles this by referencing previous_response_id or image_generation_call.id — the actual image data is never re-sent to the model.
```rust
pub struct ResponseTransformer;

impl ResponseTransformer {
    fn is_image_payload_candidate(obj: &serde_json::Map<String, Value>) -> bool {
```
This 80-line image_payload_from_wrapped_content is doing speculative parsing of at least 5 different response shapes:
- Direct object with `result` key
- Array of `{type: "text", text: "..."}`
- Object with `content` array
- Object with `result.content` array (JSON-RPC)
- Plain JSON string
- Fallback to first non-empty text
Compare this to to_web_search_call which is ~15 lines because we have a defined contract with the MCP web search server.
This level of defensive parsing suggests we haven't defined the MCP image generation server's response contract. Can you:
- Document the expected MCP response schema (what does your MCP server actually return?)
- Write the transform against that specific schema
- Add a single fallback for the unexpected case
The current approach will be a maintenance nightmare — every new MCP server shape that happens to have a result key will match.
Valid concern. I have narrowed the parsing down to support only our MCP server's case:
"result":{"content":[{"type":"text","text":"{\n \"result\": <base64>
Here is the example of mcp server response:
mcp_generate_image_response.json
And for the fallback case, here is the error format we follow:
Compressed to 40 lines now. Web search has more than 15 lines; it just separates the logic into two calls:
`let sources = Self::extract_web_sources(result);`
`let queries = Self::extract_queries(result);`
```rust
/// open by preserving unknown key/value pairs.
#[derive(Debug, Clone, Deserialize, Serialize, Default, schemars::JsonSchema)]
pub struct ImageGenerationTool {
    #[serde(flatten)]
```
ImageGenerationTool uses #[serde(flatten)] pub options: HashMap<String, Value> to capture tool-level options like size, quality, background, etc.
But where are these options forwarded to the MCP server when making the tool call? In the web_search pattern, the tool definition options get threaded through. I don't see the equivalent for image generation — the MCP server will get called without knowing what size/quality/format the user requested.
Also: using a raw HashMap loses all validation. OpenAI specifies valid values for size, quality, background, output_format, action. We should validate the known fields, or at minimum use typed Option<String> fields for the documented ones and #[serde(flatten)] only for future-proofing unknown fields.
Right now I am passing no overrides to the MCP server and just letting MCP generate the image with OpenAI's default settings; I was planning to add this in the next PR.
But it isn't a big change, so I am including the change mentioned in your comment.
We should validate the known fields
Right now I am not sure whether we want to keep this validation on our DP side, since we already have all these runtime checks. For example:
Or we can move it to smg. This needs to be discussed further, and we need to reach an agreement across all tools on where the validation checks should live.
or at minimum use typed Option fields for the documented ones and #[serde(flatten)] only for future-proofing unknown fields.
Yes, agree. Updated. I have made the header overrides align with what we have in dp.
Also, right now we do not pass the overrides to the MCP server even for the web_search tool (@Tobel158 can confirm here); I have added the logic to pass the override values to MCP as well. Please refer to the method `apply_request_tool_overrides`.
We can use `extra` to pass `opc-compartment-id`, which is OCI-specific.
```rust
use serde_json::{json, Value};
use smg_mcp::ResponseFormat;

/// Build tool output text for model context.
```
For image generation, compact_tool_output_for_model_context will receive the full MCP result which includes the base64 image in the result field. Does this function strip the base64 data before storing in conversation history?
If not, multi-turn image editing (which is a key use case per the OpenAI spec) will send megabytes of base64 back to the model on every subsequent turn, blowing up the context window. OpenAI solves this by using previous_response_id or image_generation_call.id references — the image data is stored server-side, never re-sent to the model.
Please confirm this function handles the image case by stripping/replacing the base64 data with a reference or summary.
Yes, that's what this function is for. If you check lines 19-21 in this file, only these 3 fields are fed back:
"tool": "generate_image",
"status": status, // can be "failed" or "completed" depending on the isError field from the mcp_server response
"note": the error message if isError is true, or a preset message if isError is false
And the base64-encoded text is skipped.
```diff
 /// Routing information for a built-in tool type.
 ///
-/// When a built-in tool type (web_search_preview, code_interpreter, file_search)
+/// When a built-in tool type (web_search_preview, code_interpreter, image_generation)
```
The doc comment was updated to list image_generation but dropped file_search from the list. Should be all four: web_search_preview, code_interpreter, image_generation, file_search.
Actually, file_search is only listed in the comments but not actually implemented in this file. That's why I dropped it from the comments.
I meant: append image_generation, don't remove file_search.
```rust
pub const COMPLETED: &'static str = "response.image_generation_call.completed";

pub const fn as_str(self) -> &'static str {
    match self {
```
The GENERATING event (response.image_generation_call.generating) does not appear in the OpenAI streaming spec. OpenAI's image generation streaming goes: in_progress -> partial_image (0-3 times) -> completed.
Adding non-standard SSE events will break clients that validate event types. Can you verify this against the actual OpenAI streaming behavior?
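The sequence the reviewer describes can be sketched as plain event-name constants. The names below assume the reviewer's reading of the streaming spec (in_progress, then zero or more partial_image, then completed) and should be verified against the current OpenAI API reference:

```rust
/// Event names for the image generation streaming sequence described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ImageGenEvent {
    InProgress,
    PartialImage,
    Completed,
}

impl ImageGenEvent {
    const fn as_str(self) -> &'static str {
        match self {
            Self::InProgress => "response.image_generation_call.in_progress",
            Self::PartialImage => "response.image_generation_call.partial_image",
            Self::Completed => "response.image_generation_call.completed",
        }
    }
}

fn main() {
    // A well-formed stream: one in_progress, 0-3 partial_image, one completed.
    let stream = [
        ImageGenEvent::InProgress,
        ImageGenEvent::PartialImage,
        ImageGenEvent::Completed,
    ];
    assert_eq!(stream[0].as_str(), "response.image_generation_call.in_progress");
    assert_eq!(
        stream.last().unwrap().as_str(),
        "response.image_generation_call.completed"
    );
}
```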
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb6a3f832c
```rust
if !matches!(response_format, ResponseFormat::ImageGenerationCall) {
    return;
}

let Some(overrides) = image_generation_request_overrides(original_body) else {
```
Limit image override injection to builtin image calls
apply_request_tool_overrides applies request-level image options to any tool execution whose response_format is ImageGenerationCall. That means a request that includes both a built-in image_generation tool and another MCP tool using the same response format will have the built-in options (size, quality, etc.) injected into the other tool’s arguments, which can cause schema validation failures or unintended behavior for that tool. The override path should be scoped to the specific builtin-routed image tool identity (name/server), not format alone.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
model_gateway/src/routers/grpc/common/responses/utils.rs (1)
41-53: 🧹 Nitpick | 🔵 Trivial: Use the shared builtin extractor here.
This block now mirrors `model_gateway/src/routers/mcp_utils.rs::extract_builtin_types()` by hand. We already had to update both places for `ImageGeneration`; the next builtin addition can easily drift and silently change MCP-routing behavior on only one path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/common/responses/utils.rs` around lines 41 - 53, Replace the ad-hoc builtin-tool detection with the shared extractor: remove the manual matches that compute has_builtin_tools and instead call the existing extract_builtin_types(...) from model_gateway::routers::mcp_utils (import it), passing the same `tools` value; assign its boolean result to `has_builtin_tools` (preserving the current unwrap_or(false) behavior if needed) so future builtin additions (like ImageGeneration) are handled in one place. Ensure the call matches the extractor's signature and adjust imports/signature if necessary.
model_gateway/src/routers/grpc/harmony/builder.rs (1)
425-445: ⚠️ Potential issue | 🟠 Major: Keep `commentary` enabled when `image_generation` is the only Responses tool.
Line 442 passes `with_custom_tools` into `build_system_message_from_responses()`. After adding `ResponseTool::ImageGeneration(_)` on Line 435, an image-only request now makes that flag false, so the system message strips the `commentary` channel even though this builder emits tool calls on `commentary`; Harmony can no longer produce the requested tool call on that path.
Suggested fix
```diff
-        let with_custom_tools = has_custom_tools(&tool_types);
+        let has_any_tools = !tool_types.is_empty();
+        let with_custom_tools = has_custom_tools(&tool_types);
         // Add system message
-        let sys_msg = self.build_system_message_from_responses(request, with_custom_tools);
+        let sys_msg = self.build_system_message_from_responses(request, has_any_tools);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/harmony/builder.rs` around lines 425 - 445, tool detection treats "image_generation" as non-custom so with_custom_tools becomes false and build_system_message_from_responses strips the commentary channel; update the logic that computes with_custom_tools (the call using tool_types and has_custom_tools) to consider ResponseTool::ImageGeneration (or the "image_generation" entry in tool_types) as a custom tool so commentary remains enabled for image-only requests, ensuring build_system_message_from_responses receives true when the only tool is ImageGeneration.
crates/protocols/src/event_types.rs (1)
308-350: ⚠️ Potential issue | 🟠 Major: Update the response_format match and builtin event injection match in the streamer.
The first match block (lines 140–160) defaults non-WebSearch builtins to `ItemType::MCP_CALL`, so `ImageGenerationCall`, `CodeInterpreterCall`, and `FileSearchCall` are all incorrectly mapped to the MCP handler. The second match block (lines 424–436) is missing `ImageGenerationCall` entirely and will skip event injection for it via the `_ => return true` fallback.
Both blocks must be updated to handle all four builtin types explicitly:
- Lines 140–160: Add cases for `ResponseFormat::CodeInterpreterCall`, `ResponseFormat::FileSearchCall`, and `ResponseFormat::ImageGenerationCall`.
- Lines 424–436: Add a case for `ItemType::IMAGE_GENERATION_CALL`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/event_types.rs` around lines 308 - 350, The streamer match blocks are mis-mapping builtin response formats and omitting image generation in builtin event injection; update the response_format -> ItemType mapping so ResponseFormat::CodeInterpreterCall maps to ItemType::CodeInterpreterCall, ResponseFormat::FileSearchCall maps to ItemType::FileSearchCall, and ResponseFormat::ImageGenerationCall maps to ItemType::ImageGenerationCall (in the same match that currently maps WebSearch to ItemType::WebSearchCall and MCP to ItemType::McpCall), and update the builtin event injection match to include ItemType::IMAGE_GENERATION_CALL alongside ItemType::WEB_SEARCH_CALL, ItemType::CODE_INTERPRETER_CALL and ItemType::FILE_SEARCH_CALL so image generation events are not skipped (refer to the ResponseFormat variants and ItemType constants like IMAGE_GENERATION_CALL, CODE_INTERPRETER_CALL, FILE_SEARCH_CALL, and WEB_SEARCH_CALL).
model_gateway/src/routers/grpc/common/responses/streaming.rs (1)
133-149: ⚠️ Potential issue | 🟠 Major: Handle `image_generation_call` separately instead of writing `output`.
Including this new type in the generic branch (line 149) adds an `output` field to streamed `image_generation_call` items. The OpenAI Responses API specifies that `image_generation_call` objects use the `result` field for the base64-encoded image content, not `output`. This mismatch breaks the streaming response shape.
Diff
```rust
let item_type = item_data.get("type").and_then(|t| t.as_str());
let is_tool_call = matches!(
    item_type,
    Some("mcp_call")
        | Some("web_search_call")
        | Some("code_interpreter_call")
        | Some("file_search_call")
        | Some("image_generation_call")
);
if is_tool_call
    && item_data.get("call_id").and_then(|c| c.as_str()) == Some(&tool_result.call_id)
{
    // Add output field
    let output_str = serde_json::to_string(&tool_result.output)
        .unwrap_or_else(|_| "{}".to_string());
    item_data["output"] = json!(output_str);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/common/responses/streaming.rs` around lines 133 - 149, The current branch treats image_generation_call like other tool calls and writes tool data into item_data["output"], but image_generation_call must use the "result" field with base64 image content; update the logic around item_type / is_tool_call so that when item_type == Some("image_generation_call") and item_data.call_id matches tool_result.call_id you populate item_data["result"] with the base64/stringified image from tool_result.output (or its appropriate field) instead of writing item_data["output"], and keep the existing item_data["output"] behavior for the other tool types (mcp_call, web_search_call, code_interpreter_call, file_search_call) so streaming shape remains correct for image_generation_call vs other tool calls.
♻️ Duplicate comments (2)
model_gateway/src/routers/tool_output_context.rs (1)
36-72: ⚠️ Potential issue | 🟠 Major: Don't infer image-tool success from one JSON envelope.
This helper only returns "failed" when `result.isError == true`; a plain string error or any differently wrapped failure falls through to "completed". Because this summary is fed back into tool-call context, the next turn can be told the image was generated successfully when it was not. Please thread the authoritative execution result into this helper instead of guessing from the payload shape.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/tool_output_context.rs` around lines 36 - 72, The helper extract_image_status_and_error currently infers success solely from the JSON payload shape which can mislabel failed runs as "completed"; change its signature to accept the authoritative execution result (e.g., an ExecutionResult/RunStatus enum or a boolean like execution_failed) alongside the existing output: &Value, then use that authoritative parameter to decide between "failed" and "completed" while still extracting the error text from the payload (preserve the existing extraction logic for error/content). Update all call sites that invoke extract_image_status_and_error to pass the real execution result (the run/response status produced by the image tool) so the helper no longer guesses success from payload shape alone.mcp.local.yaml (1)
1-17: ⚠️ Potential issue | 🟡 Minor: Remove the local MCP config from this PR.
This is still a developer-specific localhost file and should not ship with the feature change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcp.local.yaml` around lines 1 - 17, This PR accidentally includes a developer-local config (mcp.local.yaml) containing servers entries (e.g., name: generate_image and name: search_web with builtin_type image_generation and web_search_preview); remove this file from the commit/PR (delete mcp.local.yaml or revert its addition) and, if needed, add an entry to .gitignore to prevent re-adding local MCP configs in the future so only environment-independent configs remain in the repo.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/mcp/src/transform/transformer.rs`:
- Around line 35-114: Add a doc comment to the
image_payload_from_wrapped_content function describing the MCP response shapes
it supports and why _format is unused: list that it accepts (1) a direct object
payload with a "result" string, (2) MCP CallToolResult-style arrays like
[{"type":"text","text":"..."}] where the text may be stringified JSON containing
a "result" object, (3) nested JSON-RPC result.content arrays, (4) a raw JSON
string payload, and (5) a plain text fallback; also note that _format is
reserved for future format-specific gating. Place this comment immediately above
fn image_payload_from_wrapped_content to make the contract clear for reviewers
and maintainers.
---
Outside diff comments:
In `@crates/protocols/src/event_types.rs`:
- Around line 308-350: The streamer match blocks are mis-mapping builtin
response formats and omitting image generation in builtin event injection;
update the response_format -> ItemType mapping so
ResponseFormat::CodeInterpreterCall maps to ItemType::CodeInterpreterCall,
ResponseFormat::FileSearchCall maps to ItemType::FileSearchCall, and
ResponseFormat::ImageGenerationCall maps to ItemType::ImageGenerationCall (in
the same match that currently maps WebSearch to ItemType::WebSearchCall and MCP
to ItemType::McpCall), and update the builtin event injection match to include
ItemType::IMAGE_GENERATION_CALL alongside ItemType::WEB_SEARCH_CALL,
ItemType::CODE_INTERPRETER_CALL and ItemType::FILE_SEARCH_CALL so image
generation events are not skipped (refer to the ResponseFormat variants and
ItemType constants like IMAGE_GENERATION_CALL, CODE_INTERPRETER_CALL,
FILE_SEARCH_CALL, and WEB_SEARCH_CALL).
In `@model_gateway/src/routers/grpc/common/responses/streaming.rs`:
- Around line 133-149: The current branch treats image_generation_call like
other tool calls and writes tool data into item_data["output"], but
image_generation_call must use the "result" field with base64 image content;
update the logic around item_type / is_tool_call so that when item_type ==
Some("image_generation_call") and item_data.call_id matches tool_result.call_id
you populate item_data["result"] with the base64/stringified image from
tool_result.output (or its appropriate field) instead of writing
item_data["output"], and keep the existing item_data["output"] behavior for the
other tool types (mcp_call, web_search_call, code_interpreter_call,
file_search_call) so streaming shape remains correct for image_generation_call
vs other tool calls.
In `@model_gateway/src/routers/grpc/common/responses/utils.rs`:
- Around line 41-53: Replace the ad-hoc builtin-tool detection with the shared
extractor: remove the manual matches that compute has_builtin_tools and instead
call the existing extract_builtin_types(...) from
model_gateway::routers::mcp_utils (import it), passing the same `tools` value;
assign its boolean result to `has_builtin_tools` (preserving the current
unwrap_or(false) behavior if needed) so future builtin additions (like
ImageGeneration) are handled in one place. Ensure the call matches the
extractor's signature and adjust imports/signature if necessary.
In `@model_gateway/src/routers/grpc/harmony/builder.rs`:
- Around line 425-445: tool detection treats "image_generation" as non-custom so
with_custom_tools becomes false and build_system_message_from_responses strips
the commentary channel; update the logic that computes with_custom_tools (the
call using tool_types and has_custom_tools) to consider
ResponseTool::ImageGeneration (or the "image_generation" entry in tool_types) as
a custom tool so commentary remains enabled for image-only requests, ensuring
build_system_message_from_responses receives true when the only tool is
ImageGeneration.
---
Duplicate comments:
In `@mcp.local.yaml`:
- Around line 1-17: This PR accidentally includes a developer-local config
(mcp.local.yaml) containing servers entries (e.g., name: generate_image and
name: search_web with builtin_type image_generation and web_search_preview);
remove this file from the commit/PR (delete mcp.local.yaml or revert its
addition) and, if needed, add an entry to .gitignore to prevent re-adding local
MCP configs in the future so only environment-independent configs remain in the
repo.
In `@model_gateway/src/routers/tool_output_context.rs`:
- Around line 36-72: The helper extract_image_status_and_error currently infers
success solely from the JSON payload shape which can mislabel failed runs as
"completed"; change its signature to accept the authoritative execution result
(e.g., an ExecutionResult/RunStatus enum or a boolean like execution_failed)
alongside the existing output: &Value, then use that authoritative parameter to
decide between "failed" and "completed" while still extracting the error text
from the payload (preserve the existing extraction logic for error/content).
Update all call sites that invoke extract_image_status_and_error to pass the
real execution result (the run/response status produced by the image tool) so
the helper no longer guesses success from payload shape alone.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cb09d0a1-8926-42a1-987c-9e05e3bd0661
📒 Files selected for processing (17)
.gitignore
crates/mcp/src/core/config.rs
crates/mcp/src/transform/transformer.rs
crates/mcp/src/transform/types.rs
crates/protocols/src/event_types.rs
crates/protocols/src/responses.rs
mcp.local.yaml
model_gateway/src/routers/grpc/common/responses/streaming.rs
model_gateway/src/routers/grpc/common/responses/utils.rs
model_gateway/src/routers/grpc/harmony/builder.rs
model_gateway/src/routers/grpc/regular/responses/common.rs
model_gateway/src/routers/mcp_utils.rs
model_gateway/src/routers/mod.rs
model_gateway/src/routers/openai/mcp/tool_loop.rs
model_gateway/src/routers/openai/responses/streaming.rs
model_gateway/src/routers/openai/responses/utils.rs
model_gateway/src/routers/tool_output_context.rs
Force-pushed cb6a3f8 to 0cf8cc9
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0cf8cc9179
Force-pushed 0cf8cc9 to 5749506
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57495062be
Force-pushed 5749506 to cebd5bc
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cebd5bc87c
```diff
-ResponseTool::WebSearchPreview(_) | ResponseTool::CodeInterpreter(_)
+ResponseTool::WebSearchPreview(_)
+    | ResponseTool::CodeInterpreter(_)
+    | ResponseTool::ImageGeneration(_)
```
Apply image tool overrides in gRPC MCP execution
By including ResponseTool::ImageGeneration in MCP-routing detection here, gRPC /responses requests with image tools can now be intercepted and executed via MCP, but the gRPC execution paths still pass only model-emitted function-call arguments to execute_tool (no merge of request-level tool options such as size, quality, or background). In practice, explicit request options can be dropped unless the model repeats them, so image behavior diverges from the OpenAI router path that now calls apply_request_tool_overrides before execution.
Valid point. I will change this in a new PR; I don't want to keep adding scope to this PR.
Force-pushed cebd5bc to 74c5cf6
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74c5cf6ec7
```rust
output: result
    .as_str()
    .map(str::to_string)
    .unwrap_or_else(|| result.to_string()),
```
Restore text flattening for passthrough MCP outputs
This mcp_call path now returns result.to_string() for any non-string tool output, which regresses common MCP responses where tools/call returns content arrays like [{"type":"text","text":"..."}]: clients will receive the raw JSON wrapper instead of the actual text payload. Because passthrough tools use this branch broadly, normal successful tool calls can become harder to consume and differ from previous behavior that extracted text blocks; this is user-visible for any MCP server returning structured content.
Force-pushed 74c5cf6 to d7799f4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7799f40a9
Force-pushed d7799f4 to b92ca7e
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b92ca7e84b
```rust
if output
    .as_object()
    .and_then(|o| o.get("result"))
    .and_then(|v| v.as_object())
    .and_then(|r| r.get("isError"))
```
Handle array-shaped MCP outputs when deriving image status
extract_image_status_and_error only inspects output.result.isError, but MCP tool execution in this stack commonly yields array-shaped payloads (fresh evidence: McpOrchestrator::call_result_to_json serializes CallToolResult.content directly), so this predicate is never reached for normal image tool outputs. In those cases compact_tool_output_for_model_context always records "status":"completed" and drops error text, which feeds incorrect success state into tool-loop conversation history and can suppress retries/failure handling after image generation errors.
Force-pushed from b92ca7e to 808cf73
💡 Codex Review
Reviewed commit: 808cf73621
Force-pushed from 808cf73 to 65f9747
💡 Codex Review
Reviewed commit: 65f974754e
    ) -> String {
        match response_format {
            ResponseFormat::ImageGenerationCall => {
                let is_error = is_image_generation_error(output);
Guard image context compaction against non-wrapper outputs
compact_tool_output_for_model_context calls is_image_generation_error(output) unconditionally, but that helper panics unless output.result.isError is present. In the MCP execution path, image tool outputs are commonly array/string payloads (serialized CallToolResult.content), so this can panic during normal tool-loop history compaction instead of returning a summary, aborting the request path after tool execution.
    ResponseFormat::ImageGenerationCall => {
        (ItemType::IMAGE_GENERATION_CALL, "ig_")
    }
Keep image argument events on image item IDs
This new image_generation_call rewrite assigns ig_ item IDs for streamed tool items, but argument events are still normalized through the MCP path (mcp_call.arguments_* with mcp_ IDs). For intercepted image tool calls, clients can receive response.output_item.added for ig_* while argument events reference mcp_*, breaking per-item event correlation in streaming consumers.
The streaming tests/changes are not in scope for this PR. I have already found a bunch of issues in the streaming part and need to do a thorough test pass for it.
Noting this comment down, but not planning to add any change here.
Force-pushed from 65f9747 to 14ca8d1
💡 Codex Review
Reviewed commit: 14ca8d10f6
Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
Force-pushed from 14ca8d1 to 9106f1d
💡 Codex Review
Reviewed commit: 9106f1d630
💡 Codex Review
Reviewed commit: 21b4352954
    /// single fallback path.
    fn image_payload_from_wrapped_content(result: &Value) -> Option<Value> {
        // Parse the JSON-RPC wrapper object under top-level `result`.
        let result_obj = result.get("result").and_then(|v| v.as_object())?;
Parse SSE-wrapped image errors before returning fallback output
image_payload_from_wrapped_content immediately assumes the tool output is an object with a top-level result, so SSE-style string payloads (e.g. event: message\ndata: {..."isError":true...}) always return None here and fall through to the generic string fallback. In that case image_generation_call.result becomes the raw SSE envelope instead of the actual error text, which degrades client-visible error handling for streamable MCP image servers.
Streaming events will be handled in a separate PR; not scoped to this PR.
slin1237 left a comment
Thanks for the updates. A few items from my original review are still open — want to make sure they don't get lost:
1. restore_original_tools for image_generation (no response yet)
The web_search pattern restores the original web_search_preview tool definition in the response payload via restore_original_tools() in model_gateway/src/routers/openai/responses/utils.rs. Without the equivalent for image_generation, the client's response will contain the MCP function tool definition instead of {"type": "image_generation"}. This breaks clients that check the response's tools array to know which built-in tools were active.
2. response.image_generation_call.generating SSE event (no response yet)
This event doesn't exist in the OpenAI streaming spec. OpenAI's image generation streaming goes in_progress → partial_image (0-3 times) → completed. Adding a non-standard event type will break clients that validate SSE event names. Can you verify this against the actual OpenAI behavior, or remove it and use an existing event?
3. Tool options deferral — needs documentation or validation
Understood that size, quality, background, output_format passthrough is deferred to a follow-up PR. However, right now a client can send {"type": "image_generation", "size": "1024x1536", "quality": "high"} and those options are silently ignored — the image will be generated with whatever the MCP server defaults to. This is a silent contract violation from the client's perspective.
At minimum, can we either:
- (a) Add a log warning when tool options are present but ignored, or
- (b) Document this limitation in a code comment at the sanitize_builtin_tool_arguments call site, or
- (c) Link the follow-up issue/PR number so we can track it
4. isError flag not checked in extract_image_status_and_error
+1 to CodeRabbit's comment on this. The MCP protocol's isError: true is the authoritative signal for a failed tool call. The current code only pattern-matches on "Error executing tool" text prefix, which misses any MCP server that returns a different error message format with isError: true. This is a one-line fix.
5. Doc comment in mcp_utils.rs
Minor: the updated doc comment lists web_search_preview, code_interpreter, image_generation but dropped file_search. Should list all four.
…sing Signed-off-by: TingtingZhou7 <zhoutt96@gmail.com>
Force-pushed from 00183d7 to 2f0de11

Description
Support the image generation tool via an MCP server from smg for open-source customers.
Problem
There is no image generation tool support via MCP server today.
Solution
Add support to:
Two layers of ImageGenerationTool support are included in this PR:
Changes not included in this PR
Test Plan
Test Case 1: Call to Remote MCP Server to generate image
Step 1: I set up the OCI Image Generation tool MCP server locally on port 8080, then ran the cargo command below with --mcp-config-path /Users/tingzhou/workspace/genAI/smg/mcp.local.yaml added. This allows smg to read the MCP server information dynamically:
which includes the tool and MCP server information. Here is an example mcp.local.yaml file:

Test with streaming disabled
Request
Success log from local mcp server:

Image attached:
Response JSON:
response-new1.json
Test with streaming enabled
response-new-stream.json
Test Case 2
Direct call to OpenAI to generate image
Step 1: I ran this command to start smg locally without passing the MCP server configuration:
Step 2: Run the image generation tool end-to-end
API call:
And here is the output:
openai-response.txt
Checklist
cargo +nightly fmt passes
cargo clippy --all-targets --all-features -- -D warnings passes
Summary by CodeRabbit