feat(mcp): include resolved element bounds in tap_on and run_flow responses #3123

Open
sinano1107 wants to merge 8 commits into mobile-dev-inc:main from sinano1107:feat/mcp-tap-bounds

@sinano1107

Summary

  • Add CommandResult and FlowResult data classes to Orchestra so that element metadata (bounds, center) captured during command execution is propagated back to MCP tool callers.
  • Orchestra.runFlow() now returns FlowResult (success + commandResults) instead of a bare Boolean. Internal command handlers are unchanged; an accumulator captures element info from tapOnElement and swipeCommand without touching the 40+ other handlers.
  • tap_on response now includes "bounds": [x1,y1,x2,y2] and "center": [cx,cy]
  • run_flow response now includes a "command_results" array with per-command bounds / start_point / end_point when available
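The accumulator approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Bounds` shape (x, y, width, height), the `commandName` field, the `OrchestraSketch` class, and the hard-coded commands inside `runFlow()` are all assumptions made for the example. Only the names `CommandResult`, `FlowResult`, `_commandResults`, `success`, and `commandResults` come from the description.

```kotlin
// Assumed shape of an element's bounds; the real Maestro type may differ.
data class Bounds(val x: Int, val y: Int, val width: Int, val height: Int)

// Hypothetical fields: the PR only says a result carries bounds when available.
data class CommandResult(val commandName: String, val bounds: Bounds? = null)

// Replaces the bare Boolean previously returned by runFlow().
data class FlowResult(val success: Boolean, val commandResults: List<CommandResult>)

class OrchestraSketch {
    // Side-channel accumulator: only handlers that resolve an element write to it.
    private val _commandResults = mutableListOf<CommandResult>()

    private fun tapOnElement(resolved: Bounds) {
        // The existing tap logic would run here, unchanged.
        _commandResults += CommandResult("tapOn", resolved)
    }

    // Stands in for the many handlers that are left untouched and record nothing.
    private fun pressKey() {}

    fun runFlow(): FlowResult {
        _commandResults.clear()
        // Hard-coded commands for illustration only.
        tapOnElement(Bounds(x = 10, y = 20, width = 100, height = 40))
        pressKey()
        // Return a snapshot so later runs cannot mutate a result already handed out.
        return FlowResult(success = true, commandResults = _commandResults.toList())
    }
}
```

With this shape, callers that only need the old Boolean can read `runFlow().success`, which is how the one-line changes in the other files are described.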

Motivation

When building automation that records tap/swipe coordinates (e.g. for video overlay generation), the resolved element bounds are needed but currently discarded inside Orchestra. This change surfaces them through the existing MCP tool responses without altering Orchestra's internal command execution logic.

Changes

| File | Change |
| --- | --- |
| Orchestra.kt | CommandResult/FlowResult data classes, _commandResults accumulator, runFlow() returns FlowResult |
| CommandResultSerializer.kt | Shared addBoundsTo() extension for JSON serialization |
| TapOnTool.kt | Include bounds/center in response |
| RunFlowTool.kt | Include command_results array in response |
| RunFlowFilesTool.kt | Use flowResult.success |
| MaestroCommandRunner.kt | .success accessor (backward-compatible) |
| TestSuiteInteractor.kt | .success accessor (backward-compatible) |
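To make the response shape concrete, here is a dependency-free sketch of what a shared serialization helper might do. It is an assumption-laden stand-in: the real `addBoundsTo()` presumably targets a JSON builder, whereas this version writes into a plain `MutableMap`, and the `Bounds` fields, the `center()` return type, and the `tapOnResponse` helper are hypothetical.

```kotlin
// Assumed bounds shape (origin plus size); the real Maestro type may differ.
data class Bounds(val x: Int, val y: Int, val width: Int, val height: Int) {
    // Midpoint of the rectangle, using integer division.
    fun center(): Pair<Int, Int> = Pair(x + width / 2, y + height / 2)
}

// Stand-in for the PR's addBoundsTo(): emits "bounds" as [x1, y1, x2, y2]
// and "center" as [cx, cy], matching the keys named in the summary.
fun Bounds.addBoundsTo(target: MutableMap<String, Any>) {
    target["bounds"] = listOf(x, y, x + width, y + height)
    val (cx, cy) = center()
    target["center"] = listOf(cx, cy)
}

// Hypothetical response builder: coordinates are added only when an element was resolved.
fun tapOnResponse(success: Boolean, bounds: Bounds?): Map<String, Any> {
    val response = mutableMapOf<String, Any>("success" to success)
    bounds?.addBoundsTo(response)
    return response
}
```

Under these assumptions, `tapOnResponse(true, Bounds(10, 20, 100, 40))` would carry `bounds` of `[10, 20, 110, 60]` and `center` of `[60, 40]`, while a call with `null` bounds would omit both keys.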

Test plan

  • maestro-orchestra:test passes
  • maestro-cli:test passes
  • Manual test: tap_on returns correct bounds/center on iOS simulator (Settings app)
  • Manual test: run_flow with tap + swipe returns command_results with tap bounds

🤖 Generated with Claude Code

sinano1107 and others added 3 commits April 3, 2026 14:33
…ponses

Add CommandResult and FlowResult data classes to Orchestra so that
element metadata (bounds, center) captured during command execution is
propagated back to MCP tool callers.

Changes:
- Orchestra.runFlow() now returns FlowResult (success + commandResults)
  instead of a bare Boolean.  Internal command handlers are unchanged;
  an accumulator (_commandResults) captures element info from
  tapOnElement and swipeCommand without touching the 40+ other handlers.
- TapOnTool: response includes "bounds" [x1,y1,x2,y2] and "center" [cx,cy].
- RunFlowTool: response includes "command_results" array with per-command
  bounds / start_point / end_point when available.
- RunFlowFilesTool, MaestroCommandRunner, TestSuiteInteractor: updated
  to use FlowResult.success (backward-compatible one-liner changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace CommandResult.element (UiElement) with CommandResult.bounds
  (Bounds) since callers only access bounds
- Extract bounds/center/start_point/end_point JSON serialization into
  CommandResultSerializer.kt (addBoundsTo extension function)
- Fix TapOnTool hardcoded success=true to use flowResult.success
- Skip accumulating empty CommandResult for direction-only and
  relative-coordinate swipes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

runFlow() now returns FlowResult instead of Boolean.
Update assertions to use .success property.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sinano1107
Author

Code review

Found 1 issue:

  1. The message field in RunFlowTool and RunFlowFilesTool is hardcoded to "Flow executed successfully" even when flowResult.success is false. This PR correctly wires up flowResult.success for the success field, but leaves the message static — so an MCP consumer will receive a contradictory response (success: false, message: "Flow executed successfully") whenever a flow fails due to a failing assertion or command error.

https://github.com/mobile-dev-inc/maestro/blob/0360f7e864833cc530d2dd4a1e240ea7e0a40422/maestro-cli/src/main/java/maestro/cli/mcp/tools/RunFlowTool.kt#L120-L126

Same issue in RunFlowFilesTool:

https://github.com/mobile-dev-inc/maestro/blob/0360f7e864833cc530d2dd4a1e240ea7e0a40422/maestro-cli/src/main/java/maestro/cli/mcp/tools/RunFlowFilesTool.kt#L102-L108
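One possible way to resolve the contradiction is to derive the message from the same flag as the success field. The sketch below is only an illustration: the reduced `FlowResult` type, the `flowMessage` helper, and the failure wording are all hypothetical, and only the success string is taken from the review comment.

```kotlin
// Reduced stand-in for the PR's FlowResult; only the field needed here.
data class FlowResult(val success: Boolean)

// Hypothetical helper: keeps "message" consistent with "success".
// The failure text is an assumed placeholder, not wording from the PR.
fun flowMessage(result: FlowResult): String =
    if (result.success) "Flow executed successfully" else "Flow execution failed"
```

Both RunFlowTool and RunFlowFilesTool could call one such helper so the two responses cannot drift apart.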

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@sinano1107
Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- Use Bounds.center() in CommandResultSerializer instead of manual
  center calculation, for consistency with the rest of the codebase
- Remove unnecessary same-package imports of addBoundsTo in
  TapOnTool.kt and RunFlowTool.kt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Fishbowler
Contributor

Can you elaborate on the value and motivation for this change?

@sinano1107
Author

I'm building a system that records iOS Simulator interactions (via Maestro MCP) and composites tap/swipe indicators onto the recorded video using Remotion. To place the indicators accurately, I need the resolved element coordinates from each tap_on and swipe command.

Currently, Maestro resolves the element internally but discards the coordinates before returning the MCP response. The only alternative is calling inspect_view_hierarchy before each tap and re-matching the element, but this introduces race conditions (UI may change between hierarchy fetch and tap) and doubles the session overhead per interaction.

This change surfaces coordinates that Maestro already computes, with no impact on existing behavior.

@Fishbowler
Contributor

Oooh, neat idea ♥️

I think there might be other applications folks find for this too.

Just need to check that this doesn't "cross-pollinate" too heavily. Will review properly once I'm back at my desk on Tuesday 👍
