Playwright for desktop apps, built for coding agents.
macOS shipped · Linux + Windows coming.
A fast CLI/MCP control layer that lets agents like Claude, Codex, opencode, and Gemini inspect and operate desktop apps through structured UI data, then save successful flows as replayable tests.
MVP. macOS only. Accessibility tree first, screenshots as fallback.
Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data: app/window list, focused app/window, accessibility tree, element role/name/value/state/bounds/actions, screenshots only when needed, deterministic replay scripts after exploration.
macOS 13+. See INSTALL.md for full options + platform status.
# Homebrew (once tap is published)
brew tap edihasaj/guiport && brew install guiport
# Or install script
curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh
# Or from source
swift build -c release && sudo cp .build/release/guiport /usr/local/bin/guiport
Linux + Windows are roadmap; see INSTALL.md.
guiport doctor # check permissions
guiport apps --json # list running apps with windows
guiport observe --app "Safari" # focused window summary
guiport tree --app "Safari" --json # full accessibility tree
guiport find --app "Safari" 'button[name="Save"]' # match selector
guiport click --app "Safari" 'button[name="Save"]'
guiport type "hello"
guiport screenshot --app "Safari" -o safari.png
# Vision fallback for canvas / sparse-AX apps:
guiport find-text --app "Figma" "Save" # OCR via Apple Vision
guiport click-text --app "Figma" "Save" # OCR + click center
guiport click-at 420 180 # raw coordinates
guiport record smoke.yaml # WIP
guiport run smoke.yaml
guiport serve --mcp # MCP server over stdio
role[attr=value][attr~=substring][index]
Examples:
button[name="Save"]
textfield[identifier="search"]
AXButton[name~="Open"][index=0]
Supported attributes: role, name (title), value, identifier, description, text (matches name or value), index.
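A scripted flow often needs to build selectors from variables rather than hard-code them. The sketch below is a hypothetical shell helper, `selector`, that is not part of guiport; it only assembles strings in the shape shown in the examples above. It assumes values never contain a double quote, since the escaping rules are not documented here.

```shell
# Hypothetical helper (not shipped with guiport): build a selector string
# from a role plus key=value or key~=value pairs.
selector() {
  role=$1
  shift
  out=$role
  for pair in "$@"; do
    case $pair in
      *=*)
        key=${pair%%=*}   # text before the first "="
        val=${pair#*=}    # text after the first "="
        case $key in
          index) out="${out}[index=${val}]" ;;          # index stays unquoted, as in the examples
          *'~')  out="${out}[${key%?}~=\"${val}\"]" ;;   # key~=value: substring match
          *)     out="${out}[${key}=\"${val}\"]" ;;      # key=value: attribute match
        esac
        ;;
      *)
        echo "selector: expected key=value, got: $pair" >&2
        return 2
        ;;
    esac
  done
  printf '%s\n' "$out"
}
```

With the helper defined, a call such as `guiport click --app "Safari" "$(selector button name=Save)"` passes `button[name="Save"]` to the CLI.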
For apps with sparse or absent accessibility (Figma, custom-rendered editors, hardened Electron), guiport falls back through three layers:
- click-at X Y: raw screen coordinates. The agent reads coords off a screenshot.
- find-text "Save" / click-text "Save": Apple Vision (VNRecognizeTextRequest) OCRs the window and returns bounds + center for matched text. On-device, free, no extra deps.
- LLM vision: out of scope for MVP; agents can call screenshot + their own model to get coords, then click-at.
OCR-found bounds drift across font/scale changes, so prefer AX selectors for replay and OCR for exploration.
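The "AX first, OCR second" preference can be sketched as a small wrapper around the two click commands. This is an illustration, not guiport's own behavior: the function name `click_with_fallback` and the `GUIPORT` override variable are hypothetical, and the sketch assumes `guiport click` and `guiport click-text` exit non-zero when nothing matches, which this README does not state.

```shell
# Assumption: guiport exits non-zero when a selector or text has no match.
# GUIPORT is a hypothetical override for the binary name (useful for stubs).
GUIPORT=${GUIPORT:-guiport}

# click_with_fallback APP SELECTOR TEXT
# Try the AX selector first, then OCR text; print which layer succeeded.
click_with_fallback() {
  app=$1
  selector=$2
  text=$3
  if "$GUIPORT" click --app "$app" "$selector" >/dev/null 2>&1; then
    echo "ax"
  elif "$GUIPORT" click-text --app "$app" "$text" >/dev/null 2>&1; then
    echo "ocr"
  else
    echo "none"
    return 1
  fi
}
```

A call like `click_with_fallback "Figma" 'button[name="Save"]' "Save"` prints `ax` or `ocr` depending on which layer clicked. Logging that result during exploration shows which steps still depend on OCR before a flow is saved for replay.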
guiport needs:
- Accessibility: required for AX tree + input events.
- Screen Recording: required for screenshot and screenshot-on-failure artifacts.
Run guiport doctor to check status and get System Settings deep links.
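In scripts it can help to gate a flow on the permission check. The wrapper below is a minimal sketch: `with_preflight` and the `GUIPORT` override are hypothetical names, and it assumes `guiport doctor` exits non-zero when a permission is missing, which should be verified against the actual CLI.

```shell
# Assumption: `guiport doctor` exits non-zero when a required permission is missing.
# GUIPORT is a hypothetical override for the binary name.
GUIPORT=${GUIPORT:-guiport}

# with_preflight CMD [ARGS...]
# Run CMD only if the doctor check passes; doctor output goes to stderr.
with_preflight() {
  if ! "$GUIPORT" doctor >&2; then
    echo "guiport doctor reported a problem; grant permissions and retry" >&2
    return 1
  fi
  "$@"
}
```

For example, `with_preflight guiport run smoke.yaml` would only start the replay after the check succeeds.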
- Pure Swift, single binary.
- GuiportCore library: AX bridge, selector engine, input, screenshots, replay runner, MCP server.
- guiport CLI: thin wrapper using swift-argument-parser.
- No Windows/Linux yet.
- No vision-first automation.
- No autonomous Manus clone.
- No background/session-0 automation.
MIT — see LICENSE.