feat: add browser_click_at tool for coordinate-based mouse clicks #12

Open
grigoryosifov wants to merge 1 commit into pm990320:main from grigoryosifov:feat/browser-click-at


Problem

browser_click requires an accessibility-snapshot ref; browser_evaluate with element.click() can hit synthetic-event blocks (notably cross-origin iframes and some pages where the submit button silently ignores JS-dispatched clicks). That leaves no fallback when a target is visible on screen but unreachable via refs or JS — canvas-rendered controls, invisible overlays stealing clicks, pointer-events traps, etc.

Fix

New browser_click_at(x, y) tool that calls Playwright's page.mouse.click(x, y) (or page.mouse.dblclick(x, y) when doubleClick is set) directly. Coordinates come from browser_screenshot. Like browser_click, it supports button (left/right/middle), doubleClick, and targetId. Validates that x and y are finite numbers before dispatching.
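
A minimal sketch of the validate-then-dispatch flow described above, not the PR's actual handler. The names MouseLike, ClickAtArgs, validateCoordinates, and clickAt are hypothetical; only the click and dblclick methods mirror Playwright's Mouse API. The "left" default is an assumption, and targetId resolution is omitted.

```typescript
// Hypothetical sketch; every name except click/dblclick is illustrative.
type MouseButton = "left" | "right" | "middle";

// Structural subset of Playwright's Mouse, so a fake can stand in for page.mouse.
interface MouseLike {
  click(x: number, y: number, options?: { button?: MouseButton }): Promise<void>;
  dblclick(x: number, y: number, options?: { button?: MouseButton }): Promise<void>;
}

interface ClickAtArgs {
  x: unknown;
  y: unknown;
  button?: MouseButton;
  doubleClick?: boolean;
}

// Reject NaN, Infinity, undefined, and non-numbers before touching the page.
function validateCoordinates(x: unknown, y: unknown): { x: number; y: number } {
  if (typeof x !== "number" || !Number.isFinite(x)) {
    throw new Error(`x must be a finite number, got ${String(x)}`);
  }
  if (typeof y !== "number" || !Number.isFinite(y)) {
    throw new Error(`y must be a finite number, got ${String(y)}`);
  }
  return { x, y };
}

// Dispatch a single or double click at viewport coordinates.
async function clickAt(mouse: MouseLike, args: ClickAtArgs): Promise<void> {
  const { x, y } = validateCoordinates(args.x, args.y);
  // Assumed default: left button when none is given.
  const options: { button: MouseButton } = { button: args.button ?? "left" };
  if (args.doubleClick) {
    await mouse.dblclick(x, y, options);
  } else {
    await mouse.click(x, y, options);
  }
}
```

With a real Playwright page this would be called as clickAt(page.mouse, { x: 160, y: 130 }). Validating before dispatch means a malformed coordinate fails with a readable error instead of reaching the browser.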

Tested

  • npx vitest run — 30/30 passing (3 new unit tests cover input validation for NaN, Infinity, undefined)
  • Live smoke test against Chrome Agents (CDP on :14272):
    • Opened a data URL with a button at x=100–220, y=100–160
    • Called browser_click_at({x: 160, y: 130})
    • Button click handler fired (text changed to CLICKED) and page-level event listener captured (160, 130) — confirming a real Playwright mouse event, not a synthetic DOM click
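
One way a fixture like the one in the smoke test could be built is sketched below. The markup, the window.lastClick property, and the helper names Rect, centerOf, and buildFixtureUrl are assumptions for illustration, not the page used in the actual test. Only the button bounds and the (160, 130) click point come from the description above.

```typescript
// Hypothetical fixture builder; geometry taken from the smoke test description.
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Button spanning x=100..220, y=100..160.
const BUTTON_RECT: Rect = { left: 100, top: 100, right: 220, bottom: 160 };

// Midpoint of the rectangle, used as the click target.
function centerOf(rect: Rect): { x: number; y: number } {
  return { x: (rect.left + rect.right) / 2, y: (rect.top + rect.bottom) / 2 };
}

// Build a data URL whose page holds one absolutely positioned button, a click
// handler that rewrites its label, and a window-level listener that records
// the viewport coordinates of each click.
function buildFixtureUrl(rect: Rect): string {
  const width = rect.right - rect.left;
  const height = rect.bottom - rect.top;
  const html = `<!doctype html>
<body style="margin:0">
<button id="target" style="position:absolute;left:${rect.left}px;top:${rect.top}px;width:${width}px;height:${height}px">Click me</button>
<script>
  window.lastClick = null;
  document.getElementById("target").addEventListener("click", (e) => {
    e.currentTarget.textContent = "CLICKED";
  });
  window.addEventListener("click", (e) => {
    window.lastClick = [e.clientX, e.clientY];
  }, true);
</script>
</body>`;
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}

const target = centerOf(BUTTON_RECT); // { x: 160, y: 130 }
const fixtureUrl = buildFixtureUrl(BUTTON_RECT);
```

The center of the stated bounds works out to (160, 130), matching the coordinates passed to browser_click_at in the smoke test.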

Notes

Cross-origin iframes still block mouse events at the CDP level (same as browser_evaluate with .click()). Keyboard navigation via browser_press_key remains the escape hatch for those.

Adds a last-resort click tool that takes absolute (x, y) coordinates
and calls page.mouse.click/dblclick directly. Complements browser_click
(ref-based) and browser_evaluate (JS-based) for cases where both fail:
canvas-rendered UI, invisible overlays, pointer-events traps, or any
element not surfaced in the accessibility snapshot.

Coordinates come from browser_screenshot. The same cross-origin-iframe
limitation as the other mouse-event paths applies; keyboard navigation
via browser_press_key remains the iframe escape hatch.

Verified end-to-end against a real Chrome Agents instance: button click
fired at target coordinates and window event listener captured (160, 130).
grigoryosifov force-pushed the feat/browser-click-at branch from 18eae31 to 2ed255b on April 21, 2026 12:34