gui-user

An MCP server for external computer-use. Launch, observe, and interact with any X11 application via the AT-SPI2 accessibility tree and xdotool input injection.

Unlike in-process testing frameworks, gui-user works externally — it can drive compiled C++ Qt/QML apps, GTK apps, Electron apps, or anything that renders on X11.

Installation

1. System packages

# Debian/Ubuntu — required
sudo apt install xvfb xdotool at-spi2-core dbus imagemagick libgirepository1.0-dev

# Optional — for VNC observation of the headless display
sudo apt install x11vnc tigervnc-viewer

# Optional — for OCR-based text detection in screenshots
sudo apt install tesseract-ocr
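
Before moving on, it can help to confirm that the tools from these packages are actually on PATH. The following is a minimal sketch, not part of gui-user: the executable names (Xvfb, import, dbus-daemon, and so on) are assumptions about what each Debian/Ubuntu package installs, and missing_packages is a hypothetical helper.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Executable name -> apt package. The executable names are assumptions about
# what each package installs; adjust them for your distribution.
REQUIRED_TOOLS: Dict[str, str] = {
    "Xvfb": "xvfb",
    "xdotool": "xdotool",
    "import": "imagemagick",
    "dbus-daemon": "dbus",
}
OPTIONAL_TOOLS: Dict[str, str] = {
    "x11vnc": "x11vnc",
    "vncviewer": "tigervnc-viewer",
    "tesseract": "tesseract-ocr",
}


def missing_packages(
    tools: Dict[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the packages whose executable cannot be found on PATH."""
    return sorted({pkg for exe, pkg in tools.items() if which(exe) is None})


if __name__ == "__main__":
    required = missing_packages(REQUIRED_TOOLS)
    optional = missing_packages(OPTIONAL_TOOLS)
    if required:
        print("Missing required packages:", " ".join(required))
    if optional:
        print("Missing optional packages:", " ".join(optional))
    if not required and not optional:
        print("All listed tools were found on PATH.")
```

The check only covers tools that expose an executable; at-spi2-core and libgirepository1.0-dev are not verified by this sketch.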

2. Install gui-user

Clone the repo and install in development mode:

git clone <repo-url> gui-user
cd gui-user
pip install -e .

This puts gui-user-mcp on your $PATH as the MCP server entry point.

3. Configure Claude Code

Add gui-user as a user-scope MCP server (available in all projects):

claude mcp add gui-user -s user -- gui-user-mcp

Or for a single project only, run from the project directory:

claude mcp add gui-user -- gui-user-mcp

Alternatively, create .mcp.json in the project root (this file can be committed and shared via source control):

{
  "mcpServers": {
    "gui-user": {
      "command": "gui-user-mcp"
    }
  }
}
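
If the project already has a .mcp.json with other servers, the gui-user entry can be merged in rather than overwriting the file. This is a small sketch under that assumption; add_gui_user_server is a hypothetical helper, and only the "mcpServers" / "gui-user" / "command" keys come from the configuration above.

```python
import json
from pathlib import Path


def add_gui_user_server(config_path: Path) -> dict:
    """Add the gui-user entry to .mcp.json, keeping any other servers intact."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Create the "mcpServers" mapping if it is missing, then set our entry.
    servers = config.setdefault("mcpServers", {})
    servers["gui-user"] = {"command": "gui-user-mcp"}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    updated = add_gui_user_server(Path(".mcp.json"))
    print(json.dumps(updated, indent=2))
```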

Verify the server is connected:

claude mcp list

If using VS Code, reload the window (Ctrl+Shift+P → "Developer: Reload Window") after adding the server, then start a new conversation. Type /mcp in the chat panel to confirm gui-user appears.

Tools

Tool	Description
`launch_app(binary, args, env, working_dir, width, height, timeout, display_mode, display, vnc)`	Launch any binary under an isolated Xvfb display or a visible local X11 display
`close_app()`	Close the app (display session stays alive for reuse)
`stop_display()`	Tear down the display session (Xvfb, D-Bus, VNC)
`get_app_status()`	Check if app is running, get PID/exit code/stderr
`screenshot(output_path?)`	Capture screen as base64 PNG
`list_ui_elements(role?, name?, visible_only?)`	Enumerate AT-SPI accessibility tree
`find_element(text?, role?, index?)`	Find element by label/role
`get_element_info(text?, role?, at_x?, at_y?)`	Detailed element properties or coordinate lookup
`click(x, y, button?)`	Click at screen coordinates
`click_element(text?, role?, index?, button?)`	Find element and click its center
`double_click(x, y, button?)`	Double-click at coordinates
`double_click_element(text?, role?, index?, button?)`	Find element and double-click
`hover(x, y)`	Move mouse to coordinates
`hover_element(text?, role?, index?)`	Move mouse to element center
`type_text(text)`	Type text into focused widget
`press_key(key, modifiers?)`	Key press (e.g., `press_key("s", ["Ctrl"])`)
`wait_for_idle(timeout?)`	Wait for CPU usage to settle
`wait_for_element(text?, role?, timeout?)`	Poll until element appears
`batch_actions(actions)`	Execute a sequence of actions in one call (avoids per-action round-trips)
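
wait_for_element is described as polling until an element appears. The general pattern can be sketched as below; this illustrates poll-with-timeout in general and is not the server's actual implementation. poll_until and its parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> Optional[T]:
    """Call probe() repeatedly until it returns a non-None value or time runs out.

    Returns the probe result, or None if the timeout elapsed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    attempts = {"count": 0}

    def fake_lookup() -> Optional[str]:
        # Stand-in for an accessibility-tree search that succeeds on the 3rd try.
        attempts["count"] += 1
        return "Save button" if attempts["count"] >= 3 else None

    print(poll_until(fake_lookup, timeout=1.0, interval=0.01))
```

The probe is always tried at least once, so a timeout of zero still performs a single lookup.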

Example Workflow

# Launch any binary in the default isolated Xvfb session
launch_app(binary="/usr/bin/gnome-calculator")

# Launch on the operator's visible X11 desktop instead
launch_app(
    binary="/usr/bin/gnome-calculator",
    display_mode="local",
)

# Or target a specific local display explicitly
launch_app(
    binary="/usr/bin/gnome-calculator",
    display_mode="local",
    display=":1",
)

# Discover UI elements
list_ui_elements()

# Find and click a button by its visible label
click_element(text="7", role="button")
click_element(text="+", role="button")
click_element(text="3", role="button")
click_element(text="=", role="button")

# Type text
type_text(text="hello world")

# Keyboard shortcuts
press_key(key="s", modifiers=["Ctrl"])

# Screenshot
screenshot(output_path="/tmp/result.png")

# Clean up
close_app()
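
click_element is described as finding an element and clicking its center. Assuming the accessibility tree reports each element's extents as a top-left corner plus a size, the click point can be derived as in this sketch. Extents and center are hypothetical names used for illustration only.

```python
from typing import NamedTuple, Tuple


class Extents(NamedTuple):
    """Screen-space bounding box: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def center(extents: Extents) -> Tuple[int, int]:
    """Return the integer pixel coordinates of the middle of the box."""
    return (extents.x + extents.width // 2, extents.y + extents.height // 2)


if __name__ == "__main__":
    button = Extents(x=100, y=200, width=80, height=30)
    print(center(button))
```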

Architecture

AI Assistant (Claude)
    │ MCP Protocol (stdio)
    ▼
MCP Server (main.py)
    │ Orchestrates:
    ├── DisplayManager  (Xvfb/local X11 + D-Bus + AT-SPI)
    ├── ProcessManager  (binary launch/monitor)
    ├── AccessibilityTree (AT-SPI2 element discovery)
    ├── ScreenshotCapture (ImageMagick import)
    ├── InputController (xdotool mouse/keyboard)
    └── IdleWaiter      (CPU-based idle detection)
    │
    ▼
Target Application (any X11 binary)
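
The diagram lists IdleWaiter as CPU-based idle detection. One way such a check could work on Linux is to sample the process's cumulative CPU ticks from /proc/<pid>/stat and treat the app as idle once the ticks stop growing. The sketch below assumes that approach; the function names, thresholds, and sampling strategy are illustrative and may differ from what gui-user does.

```python
from typing import List, Sequence


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Extract utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before tokenizing the remaining fields.
    fields: List[str] = stat_line.rsplit(")", 1)[1].split()
    # After the command name, utime and stime sit at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def is_idle(samples: Sequence[int], max_delta: int = 1, settle_count: int = 3) -> bool:
    """Return True when the last settle_count tick deltas are all <= max_delta."""
    if len(samples) < settle_count + 1:
        return False
    recent = samples[-(settle_count + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(delta <= max_delta for delta in deltas)


if __name__ == "__main__":
    line = "1234 (my app) S 1 1234 1234 0 -1 4194304 100 0 0 0 50 20 0 0 20 0 1 0 12345"
    print(cpu_ticks_from_stat(line))
    print(is_idle([10, 40, 70, 70, 71, 71]))
```

Requiring several consecutive quiet samples avoids declaring the app idle during a brief pause between bursts of work.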

Key Differences from qt-pilot

This project was forked from qt-pilot and redesigned:

	qt-pilot	gui-user
Target apps	Python/PySide6 only	Any X11 binary
Discovery	`objectName` (requires code changes)	AT-SPI accessibility tree (no code changes)
Interaction	In-process QTest	External xdotool
Architecture	Monkeypatch + socket IPC	External observation + input injection
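
To make "external xdotool" interaction concrete, here is a rough sketch of how tool calls such as press_key and click could translate into xdotool command lines. The mapping is an assumption for illustration, as is the integer button argument; key_command and click_command are hypothetical helpers, while the xdotool subcommands key, mousemove, and click are real.

```python
from typing import List, Sequence


def key_command(key: str, modifiers: Sequence[str] = ()) -> List[str]:
    """Build an xdotool invocation for a key press with optional modifiers."""
    # xdotool joins modifiers and the key with '+', for example "ctrl+s".
    combo = "+".join([m.lower() for m in modifiers] + [key])
    return ["xdotool", "key", combo]


def click_command(x: int, y: int, button: int = 1) -> List[str]:
    """Build an xdotool invocation that moves the pointer and clicks."""
    # xdotool allows chaining: move to (x, y), then click the given button.
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


if __name__ == "__main__":
    print(" ".join(key_command("s", ["Ctrl"])))
    print(" ".join(click_command(140, 215)))
```

Each list can be passed to subprocess.run with the DISPLAY environment variable pointing at the target X11 display.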

Running Tests

python3 -m unittest tests.test_integration tests.test_local_display -v

Observing the Headless Display (VNC)

Pass vnc=True to launch_app to start a view-only VNC server alongside the Xvfb display. This lets the operator watch what the AI is doing without interfering.

launch_app(binary="my_app", vnc=True)
# Response includes: "vnc_display": "localhost:5900"

To connect, run from any terminal:

gui-user-view

This auto-detects the running x11vnc and opens a VNC viewer. If x11vnc isn't running yet, it starts one on the first Xvfb display it finds. You can also pass a specific port: gui-user-view 5902

To connect manually: vncviewer localhost:<port>
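
A client script can also build the manual vncviewer command from the launch_app response. The sketch below assumes the response is available as a dict containing the "vnc_display" field shown earlier; vncviewer_command is a hypothetical helper.

```python
from typing import List, Mapping


def vncviewer_command(response: Mapping[str, object]) -> List[str]:
    """Build a vncviewer command line from a launch_app response dict."""
    vnc_display = response.get("vnc_display")
    if not isinstance(vnc_display, str):
        raise ValueError("response has no vnc_display; was vnc=True passed?")

    # Expect "host:port", for example "localhost:5900".
    host, _, port = vnc_display.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"unexpected vnc_display value: {vnc_display!r}")
    return ["vncviewer", f"{host}:{port}"]


if __name__ == "__main__":
    print(" ".join(vncviewer_command({"vnc_display": "localhost:5900"})))
```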

Requirements: sudo apt install x11vnc tigervnc-viewer

Helper Commands

These are installed on your $PATH by pip install:

Command	Description
`gui-user-view`	Auto-detect the running Xvfb display and open a VNC viewer. Starts x11vnc if needed.
`gui-user-stop`	Kill any lingering Xvfb, x11vnc, and at-spi2-registryd processes. Useful for cleanup after crashes or interrupted sessions.

The underlying shell scripts (view-display.sh, stop-display.sh) are also available in the repo root.

Display Lifecycle

The display session (Xvfb + D-Bus + VNC) persists across app restarts. This means:

launch_app() creates the display on the first call and reuses it on subsequent calls.
close_app() terminates only the app; the display and VNC stay alive.
stop_display() tears down everything (Xvfb, D-Bus, VNC).

This lets the operator connect the VNC viewer once and watch across multiple app launch/close cycles.
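
The lifecycle rules above can be summarized with a toy state model. It does not start any processes and is not the server's code; DisplaySessionModel is a hypothetical class, and the assumption that stop_display() also ends a running app is mine.

```python
class DisplaySessionModel:
    """Toy model of the display/app lifecycle; starts no real processes."""

    def __init__(self) -> None:
        self.display_running = False
        self.app_running = False
        self.displays_created = 0

    def launch_app(self) -> None:
        # The display is created on the first launch and reused afterwards.
        if not self.display_running:
            self.display_running = True
            self.displays_created += 1
        self.app_running = True

    def close_app(self) -> None:
        # Only the app stops; the display (and VNC) stay alive.
        self.app_running = False

    def stop_display(self) -> None:
        # Assumption: tearing down the display also ends any running app.
        self.app_running = False
        self.display_running = False


if __name__ == "__main__":
    session = DisplaySessionModel()
    session.launch_app()
    session.close_app()
    session.launch_app()
    print(session.displays_created, session.display_running)
```

Two launches separated by close_app() share one display in this model, which is why a VNC viewer connected once keeps working across app restarts.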

Screenshot Gallery

Every screenshot() call auto-saves a timestamped PNG to .gui-user/screenshots/ in the current working directory. Browse this folder to review the full visual history of a session.
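
To review a session programmatically, the gallery folder can be listed in capture order. This sketch sorts by file modification time because the exact timestamp format in the file names is not specified here; list_screenshots is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_screenshots(base_dir: str = ".") -> List[Path]:
    """Return gallery PNGs under base_dir, oldest first by modification time."""
    gallery = Path(base_dir) / ".gui-user" / "screenshots"
    if not gallery.is_dir():
        return []
    return sorted(gallery.glob("*.png"), key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    for shot in list_screenshots():
        print(shot)
```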

Local Display Mode

display_mode="local" reuses a real X11 display so the operator can watch the app while the MCP server drives it.

This mode is opt-in. The default remains an isolated Xvfb session.
Local mode is intended for X11 or XWayland displays only.
Mouse, keyboard, and focus are shared with the operator, so runs are less deterministic.
width and height are ignored in local mode because the existing desktop geometry is reused.
For unattended or CI-style runs, prefer the default Xvfb mode.
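
A wrapper script could encode these guidelines when choosing launch_app arguments. The sketch below is one possible policy, not project behavior: it treats a non-empty CI environment variable as a sign of an unattended run (a common convention, assumed here) and otherwise opts into local mode only when asked and when DISPLAY is set. launch_kwargs is a hypothetical helper.

```python
import os
from typing import Dict, Mapping


def launch_kwargs(env: Mapping[str, str], prefer_local: bool = False) -> Dict[str, str]:
    """Pick display-related launch_app arguments from the environment.

    An empty dict means "use the default isolated Xvfb session".
    """
    unattended = bool(env.get("CI"))  # assumption: CI is set on build servers
    display = env.get("DISPLAY")
    if prefer_local and not unattended and display:
        return {"display_mode": "local", "display": display}
    return {}


if __name__ == "__main__":
    print(launch_kwargs(os.environ, prefer_local=True))
```

width and height are deliberately left out of the local-mode arguments, since they are ignored there.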

License

MIT License - see LICENSE file.
