7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
forces native Wayland with GPU compositing enabled and skips forced renderer
accessibility by default for Wayland desktops where XWayland or software
rendering is unstable.
- New opt-in Linux feature `appshots` uses the bundled Computer Use backend to
attach the focused Linux window with metadata, AT-SPI text, and a screenshot.
The feature exposes the upstream AppShots composer control on Linux and routes
capture updates through the same renderer event contract as macOS.
- Linux screenshot capture now falls back to common desktop tools (`grim`,
`gnome-screenshot`, `spectacle`, ImageMagick `import`) when GNOME Shell DBus
and XDG Desktop Portal capture are unavailable.
- New opt-in Linux feature `read-aloud-mcp` that stages a standalone Rust Read
Aloud MCP plugin with `doctor`, `read_aloud`, and `stop` tools. The MCP server
reuses the Kokoro runner/model configuration from the Read Aloud UI feature
10 changes: 9 additions & 1 deletion README.md
@@ -34,6 +34,7 @@ Anything systemd-based should work for the optional auto-updater service (`syste
| Multi-instance launcher | 🧪 opt-in | `--new-instance` or `CODEX_MULTI_LAUNCH=1` allocates a bounded webview port and isolated Electron profile |
| GUI install prompts (`kdialog` / `zenity`) | ✅ if installed | Falls back to interactive terminal prompt |
| Linux browser annotations | ✅ always | Stored-anchor screenshots, isolated marker rendering |
| Linux AppShots | 🧪 opt-in experiment | `linux-features/appshots` exposes the upstream AppShots composer control on Linux and attaches the focused window screenshot plus AT-SPI text through the bundled Computer Use backend |
| Chrome plugin native host | ✅ always | Auto-installs the upstream Chrome plugin plus Linux native-messaging support for Chrome, Brave, and Chromium |
| Linux Computer Use | ⚠️ opt-in | MCP backend registers by default; the in-app UI is opt-in. Supports screenshots, accessibility, window targeting, and input synthesis |
| Linux Read Aloud | 🧪 opt-in experiment | `linux-features/read-aloud` adds an explicit response speaker button; `linux-features/read-aloud-mcp` stages a separate MCP plugin so the agent can read text aloud on request |
@@ -210,7 +211,7 @@ The scheduled `Populate Cachix` workflow builds the default Codex Desktop packag
Linux Computer Use is an **opt-in** plugin that lets Codex inspect and control desktop apps on Linux through a native Rust MCP backend (`codex-computer-use-linux`). It is designed and maintained by [@avifenesh](https://github.com/avifenesh) and supports:

- app listing and accessibility trees via AT-SPI
- screenshots through GNOME Shell DBus or XDG Desktop Portal
- screenshots through GNOME Shell DBus, XDG Desktop Portal, or CLI fallbacks such as `grim`, `gnome-screenshot`, `spectacle`, and ImageMagick `import`
- window listing and focusing on GNOME, KWin/Plasma, Hyprland, and i3
- keyboard, text, click, scroll, and drag input through a uinput absolute pointer, the XDG Desktop Portal RemoteDesktop session, or `ydotool`
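
To check which CLI screenshot fallback would be available on a given machine, a minimal sketch along these lines may help. The helper names `pick_screenshot_tool` and `capture_with` are hypothetical, the probing order simply mirrors the list above, and the per-tool flags are common invocations of each tool; the actual backend may probe in a different order or pass different arguments.

```shell
#!/bin/sh
# Hypothetical helper: print the first fallback screenshot tool found on PATH.
# The order mirrors the list above; the order the real backend uses is an assumption.
pick_screenshot_tool() {
    for tool in grim gnome-screenshot spectacle import; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: capture the screen to the file given as $2 with the tool named in $1.
# These are typical invocations, not necessarily the ones the backend runs.
capture_with() {
    case "$1" in
        grim)             grim "$2" ;;
        gnome-screenshot) gnome-screenshot -f "$2" ;;
        spectacle)        spectacle -b -n -o "$2" ;;
        import)           import -window root "$2" ;;
        *)                return 1 ;;
    esac
}
```

For example, `tool=$(pick_screenshot_tool) && capture_with "$tool" /tmp/shot.png` captures with whichever tool is found first, and fails cleanly when none is installed.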

@@ -263,9 +264,16 @@ You can also invoke the backend binary directly:
./codex-app/resources/plugins/openai-bundled/plugins/computer-use/bin/codex-computer-use-linux setup # enables GNOME accessibility
./codex-app/resources/plugins/openai-bundled/plugins/computer-use/bin/codex-computer-use-linux apps # lists running apps via AT-SPI
./codex-app/resources/plugins/openai-bundled/plugins/computer-use/bin/codex-computer-use-linux windows # lists targetable windows
./codex-app/resources/plugins/openai-bundled/plugins/computer-use/bin/codex-computer-use-linux focused-window
./codex-app/resources/plugins/openai-bundled/plugins/computer-use/bin/codex-computer-use-linux screenshot
./codex-app/resources/plugins/openai-bundled/plugins/computer-use/bin/codex-computer-use-linux appshot [APP_NAME|pid:PID]
```

For the full AppShots UI path, enable `linux-features/appshots` before building.
The feature exposes the upstream AppShots composer control on Linux. Global
hotkeys are disabled by default; after opting in, configure one from the
AppShots settings page.

### Enabling Computer Use UI

By default the MCP backend registers, but the Codex Desktop sidebar does not surface the Computer Use controls. If you want to use it through the in-app UI, opt in by setting one of: