⚠️ AI-Agent-Only Repository

This repo is planned, maintained, and managed exclusively by AI Agents. Slop issues, rough edges, and AI artifacts are expected and intentionally present as part of an HITL-less / minimized AI-DD metaproject focused on learning, refining, and brute-force training both the agents and the human operator. Bug reports and contributions are still welcome, but please expect AI-generated code, comments, and documentation throughout.
Pinned references (Phenotype-org)
- MSRV: see rust-toolchain.toml
- cargo-deny config: see deny.toml
- cargo-audit: rustsec/audit-check@v2 weekly
- Branch protection: 1 reviewer required, no force-push
- Authority: phenotype-org-governance/SUPERSEDED.md
Architecture: See the Architecture section below for system design.
bare-cua is deprecated. The standalone bare-cua repository is frozen at the 2026-06-08 snapshot. Active Rust crate, CLI, MCP server, bindings, docs, and releases now live in this PlayCua workspace. See DEPRECATED_BARE_CUA.md for the Phase 1 merge record.
A heavy fork of trycua/cua that strips the VM
layer and replaces the computer-server with a native Rust binary that
communicates via stdio JSON-RPC 2.0.
No Docker. No VM. No network socket. Just a subprocess pipe.
- Hexagonal architecture (Ports & Adapters): domain types are pure Rust structs with zero external deps; ports are async traits; adapters are swappable implementations selected at compile time.
- SOLID: each adapter does one thing; the dispatcher depends on abstractions, not concretions; ports are the stable interfaces.
- KISS / DRY: xcap and enigo are thin wrappers; platform-specific paths add only what the cross-platform fallback cannot provide (WGC, PostMessage, EnumWindows).
- Contract-first (OpenRPC 1.2.6): the full API is documented in contracts/openrpc.json before any code ships; clients are generated or validated against it.
- Polyglot / single workspace: Rust crate, Python bindings, C# bindings, and protocol contracts are versioned from this PlayCua repository; each language calls the same binary over stdio.
- Plugin system: additional JSON-RPC methods can be registered via MethodPlugin without touching the core dispatcher.
- Observability-first: structured JSON logs to stderr; every adapter method is instrumented with #[tracing::instrument]; log level controlled by PLAYCUA_LOG.
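The ports-and-adapters split described in the list above can be illustrated with a small Python analogue. This is only a sketch: the real ports are async Rust traits, and the names `FakeCapture` and the `{"size": ...}` result shape are hypothetical, chosen for illustration. It shows the dispatcher depending on an interface rather than on a concrete adapter.

```python
from typing import Protocol


class CapturePort(Protocol):
    """Port: the stable interface the dispatcher depends on."""

    def capture(self, monitor: int) -> bytes: ...


class FakeCapture:
    """Adapter (hypothetical): a stand-in implementation returning fixed bytes."""

    def capture(self, monitor: int) -> bytes:
        return f"frame-from-monitor-{monitor}".encode()


class Dispatcher:
    """Routes method names to port calls; knows nothing about concrete adapters."""

    def __init__(self, capture: CapturePort) -> None:
        self._capture = capture

    def dispatch(self, method: str, params: dict) -> dict:
        if method == "screenshot":
            frame = self._capture.capture(params.get("monitor", 0))
            # Illustrative result shape only, not the real wire format.
            return {"size": len(frame)}
        raise KeyError(f"Method not found: {method}")


# Wiring step: swapping the adapter does not require touching Dispatcher.
dispatcher = Dispatcher(FakeCapture())
print(dispatcher.dispatch("screenshot", {"monitor": 0}))
```

Any object with a matching `capture` method satisfies the port, which is the property that lets a platform-specific adapter replace the fallback without changes to the dispatcher.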
┌─────────────────────────────────────────────────────────────────────┐
│ Caller (Python, C#, or any JSON-RPC client) │
│ │
│ Computer / NativeComputer │
│ .screenshot() .left_click() .type_text() .ping() │
└─────────────────────────────┬───────────────────────────────────────┘
│ stdin/stdout (NDJSON)
│ JSON-RPC 2.0
▼
┌─────────────────────────────────────────────────────────────────────┐
│ playcua-native (Rust binary) │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ main.rs — IPC loop (read → dispatch → write) │ │
│ └──────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼───────────────────────────────────┐ │
│ │ ipc/dispatcher.rs — method → port call │ │
│ └──┬───────────────────┬──────────────────┬────────────────────┘ │
│ │ │ │ │
│ ┌──▼────────┐ ┌───────▼──────┐ ┌───────▼──────┐ │
│ │ ports/ │ │ ports/ │ │ ports/ │ (+ more) │
│ │ Capture │ │ Input │ │ Window │ │
│ │ Port │ │ Port │ │ Port │ │
│ └──┬────────┘ └───────┬──────┘ └───────┬──────┘ │
│ │ │ │ │
│ ┌──▼────────────────────▼──────────────────▼──────────────────┐ │
│ │ adapters/ (selected by cfg(target_os) in app/mod.rs) │ │
│ │ │ │
│ │ Windows: WgcCapture, SendInputAdapter, EnumWindowsAdapter │ │
│ │ Linux: X11Capture, UinputAdapter, EwmhAdapter │ │
│ │ macOS: CGCapture, CGEventAdapter, NSWorkspaceAdapter │ │
│ │ Fallback: XcapCapture, EnigoInput │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
native OS APIs
│
┌─────────────────────────────▼───────────────────────────────────────┐
│ Host OS (Windows / Linux / macOS) │
│ Running applications, game windows, etc. │
└─────────────────────────────────────────────────────────────────────┘
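The read → dispatch → write loop shown in the diagram can be sketched in a few lines of Python. This is an assumption-laden analogue, not the Rust implementation: `serve` and the `handlers` mapping are hypothetical names, in-memory streams stand in for stdin/stdout, and malformed JSON is not handled.

```python
import io
import json
from typing import Callable, TextIO

Handler = Callable[[dict], object]


def serve(reader: TextIO, writer: TextIO, handlers: dict[str, Handler]) -> None:
    """Read one JSON-RPC request per line, dispatch it, write one response per line."""
    for line in reader:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        request = json.loads(line)
        method = request["method"]
        handler = handlers.get(method)
        if handler is None:
            # -32601 is the JSON-RPC 2.0 "Method not found" code.
            body = {"error": {"code": -32601, "message": f"Method not found: {method}"}}
        else:
            body = {"result": handler(request.get("params", {}))}
        response = {"jsonrpc": "2.0", "id": request.get("id"), **body}
        writer.write(json.dumps(response) + "\n")
        writer.flush()


# In-memory streams stand in for the process's stdin and stdout.
handlers = {"ping": lambda params: {"ok": True, "version": "0.1.0"}}
stdin = io.StringIO(
    '{"jsonrpc":"2.0","id":1,"method":"ping","params":{}}\n'
    '{"jsonrpc":"2.0","id":2,"method":"foo"}\n'
)
stdout = io.StringIO()
serve(stdin, stdout, handlers)
print(stdout.getvalue(), end="")
```

Flushing after every response matters for a pipe-based transport: without it, a caller waiting on one line could block on data still sitting in the writer's buffer.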
native/src/
  domain/                 # Pure types: Frame, Key, WindowInfo, ProcessHandle, DiffResult, …
  ports/                  # Async trait interfaces: CapturePort, InputPort, WindowPort, …
  adapters/
    xcap.rs               # XcapCapture (cross-platform fallback)
    enigo.rs              # EnigoInput (cross-platform fallback)
    windows/              # WgcCapture, SendInputAdapter, EnumWindowsAdapter
    linux/                # X11Capture, UinputAdapter, EwmhAdapter
    macos/                # CGCapture, CGEventAdapter, NSWorkspaceAdapter
    process_adapter.rs    # NativeProcessAdapter (all platforms)
    analysis_adapter.rs   # NativeAnalysisAdapter (all platforms)
  ipc/
    mod.rs                # Re-exports Request, Response, read_request, write_response
    mod_types.rs          # Wire types
    dispatcher.rs         # Dispatcher: Arc<dyn Port> fields, routes methods
  app/
    mod.rs                # DI wiring: selects adapters, builds Dispatcher
  plugins/
    mod.rs                # MethodPlugin trait + PluginRegistry
  main.rs                 # IPC loop: calls app::App::build(), runs forever
contracts/
  openrpc.json            # Full OpenRPC 1.2.6 spec (14 methods)
python/
  playcua/                # Python package
  tests/
    test_computer.py      # pytest suite with mock subprocess
native/tests/
  unit/
    analysis_tests.rs     # NativeAnalysisAdapter unit tests
cd native
cargo build --release
# Binary: native/target/release/playcua-native[.exe]

cd python
pip install -e .

import asyncio
from playcua import Computer

async def main():
    async with Computer("./native/target/release/playcua-native") as c:
        # Take a screenshot
        png_bytes = await c.screenshot()
        with open("screen.png", "wb") as f:
            f.write(png_bytes)
        # Click, type, scroll
        await c.left_click(960, 540)
        await c.type_text("hello world")
        await c.scroll(960, 540, direction="down", amount=5)
        # Window management
        windows = await c.list_windows()
        print(windows)
        # Wait for a visual change
        changed_frame = await c.wait_for_visual_change(timeout=10.0)

asyncio.run(main())

import asyncio
from playcua import Computer, ComputerAgent

async def main():
    async with Computer() as computer:
        agent = ComputerAgent(computer, model="claude-sonnet-4-5")
        result = await agent.run("Open Notepad and type 'Hello from playcua'")
        print(result)

asyncio.run(main())

using PlayCua;

await using var computer = await NativeComputer.StartAsync("playcua-native");
byte[] png = await computer.ScreenshotAsync(windowTitle: "Diplomacy is Not an Option");
await computer.ClickAsync(960, 540);
await computer.TypeTextAsync("hello");
await computer.PressKeyAsync("return");
var windows = await computer.ListWindowsAsync();

The native binary reads newline-delimited JSON requests from stdin and writes newline-delimited JSON responses to stdout. Logs go to stderr.
The full API spec is in contracts/openrpc.json.
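To make the newline-delimited framing concrete, here is a minimal client-side sketch in Python. It talks to a stand-in child process (an inline Python script that answers every request with `{"ok": true}`) so that it runs without the real binary; `CHILD` and `call` are hypothetical names, not part of the playcua package.

```python
import json
import subprocess
import sys

# Stand-in child process: answers every request with a fixed result.
# It exists only so the sketch runs without the real playcua-native binary.
CHILD = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"ok": True}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
"""


def call(proc: subprocess.Popen, request_id: int, method: str, params: dict) -> object:
    """Send one request line, read one response line, return the result or raise."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    response = json.loads(proc.stdout.readline())
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"{err['code']}: {err['message']}")
    return response["result"]


proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
try:
    result = call(proc, 1, "ping", {})
    print(result)
finally:
    proc.stdin.close()  # EOF on stdin lets the child exit its read loop
    proc.wait()
    proc.stdout.close()
```

Pointing the `Popen` argument list at the built binary instead of the stand-in should be the only change needed for a real round trip, assuming the binary follows the one-request-per-line framing described above.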
{"jsonrpc":"2.0","id":1,"method":"screenshot","params":{"monitor":0}}

{"jsonrpc":"2.0","id":1,"result":{"data":"<base64>","width":1920,"height":1080,"format":"png"}}

{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"Method not found: foo"}}

| Method | Params | Result |
|---|---|---|
| ping | {} | {"ok":true,"version":"0.1.0"} |
| screenshot | {"window_title?":str, "monitor?":int} | {"data":b64,"width":int,"height":int,...} |
| input.key | {"key":str, "action":"press"\|"down"\|"up"} | {"ok":true} |
| input.type | {"text":str} | {"ok":true} |
| input.click | {"x":int,"y":int,"button":"left"\|"right"\|"middle","action":"click"\|"down"\|"up"} | {"ok":true} |
| input.scroll | {"x":int,"y":int,"direction":"up"\|"down"\|"left"\|"right","amount?":int} | {"ok":true} |
| input.move | {"x":int,"y":int} | {"ok":true} |
| windows.list | {} | [{hwnd,title,pid,x,y,width,height,visible}] |
| windows.focus | {"hwnd":int} | {"ok":true} |
| windows.find | {"title?":str,"pid?":int} | {...WindowInfo...} or null |
| process.launch | {"path":str,"args?":[],"cwd?":str} | {"pid":int} |
| process.kill | {"pid":int} | {"ok":true} |
| process.status | {"pid":int} | {"running":bool,"exit_code?":int} |
| analysis.diff | {"image_a":b64,"image_b":b64,"threshold?":float} | {"changed":bool,"change_ratio":float} |
| analysis.hash | {"image":b64} | {"hash":str} |
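A raw JSON-RPC client has to turn the `screenshot` result back into image bytes itself. The sketch below shows one way to do that, assuming the `data` field uses the standard base64 alphabet (the spec in contracts/openrpc.json is the authority here). `decode_screenshot` is a hypothetical helper and the payload is fabricated for illustration.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(result: dict) -> bytes:
    """Decode the base64 "data" field and sanity-check the PNG signature."""
    raw = base64.b64decode(result["data"], validate=True)
    if result.get("format") == "png" and not raw.startswith(PNG_SIGNATURE):
        raise ValueError("result claims PNG but the payload lacks a PNG signature")
    return raw


# Fabricated payload for illustration only: a PNG signature plus filler bytes.
fake_result = {
    "data": base64.b64encode(PNG_SIGNATURE + b"filler").decode("ascii"),
    "width": 1920,
    "height": 1080,
    "format": "png",
}
png_bytes = decode_screenshot(fake_result)
print(len(png_bytes))
```

The signature check is cheap and catches the common failure of writing an error payload or truncated data to disk as if it were an image.
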
| Feature | Windows | Linux | macOS |
|---|---|---|---|
| Monitor screenshot | xcap (fallback) | xcap/x11rb | xcap/CG |
| Window screenshot | WGC (GPU) + xcap | xcap/x11rb | xcap/CG |
| Keyboard input | enigo (SendInput) | enigo (uinput/X11) | enigo (CGEvent) |
| Mouse input | enigo | enigo | enigo |
| PostMessage inject | Win32 PostMessage | N/A | N/A |
| Window list | Win32 EnumWindows | xcap/EWMH | xcap/NSWorkspace |
| Window focus | SetForegroundWindow | TODO (x11rb) | TODO (NSWorkspace) |
| Process launch/kill | Win32 Terminate | SIGTERM | SIGTERM |
| Structured logs | JSON to stderr | JSON to stderr | JSON to stderr |
| Variable | Default | Description |
|---|---|---|
| PLAYCUA_LOG | info | Tracing level: error, warn, info, debug, trace |
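When launching the binary from Python, the log level can be set through the child's environment and the JSON log lines read back from stderr. The following sketch assumes only what the table states (the variable name and the five level names); `child_env` and `parse_log_line` are hypothetical helpers, and the field names in the sample log line are invented because the log schema is not specified here.

```python
import json
import os

LEVELS = ("error", "warn", "info", "debug", "trace")


def child_env(level: str = "info") -> dict:
    """Return a copy of the current environment with PLAYCUA_LOG set."""
    if level not in LEVELS:
        raise ValueError(f"unknown level {level!r}; expected one of {LEVELS}")
    env = dict(os.environ)
    env["PLAYCUA_LOG"] = level
    return env


def parse_log_line(line: str):
    """Parse one stderr line as JSON; return None if it is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


env = child_env("debug")
# Hypothetical field names: the real log schema is not documented in this README.
sample = '{"level": "DEBUG", "message": "dispatch", "method": "ping"}'
print(env["PLAYCUA_LOG"], parse_log_line(sample))
```

The returned mapping can be passed as `env=` to `subprocess.Popen`, together with `stderr=subprocess.PIPE`, so that log lines stay separate from the JSON-RPC responses on stdout.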
Register custom JSON-RPC methods without touching the core:
use playcua_native::plugins::{MethodPlugin, PluginRegistry};
use async_trait::async_trait;
use serde_json::Value;

struct MyPlugin;

#[async_trait]
impl MethodPlugin for MyPlugin {
    fn method_name(&self) -> &'static str { "custom.hello" }

    async fn handle(&self, _params: Value) -> anyhow::Result<Value> {
        Ok(serde_json::json!({ "greeting": "hello from plugin" }))
    }
}

let mut registry = PluginRegistry::new();
registry.register(Box::new(MyPlugin));

| Feature | upstream CUA | playcua |
|---|---|---|
| Computer server | Python + macOS VM | Rust binary (cross-platform) |
| Transport | Unix socket / HTTP | stdio NDJSON (no server needed) |
| VM dependency | Required | None |
| Docker | Required | None |
| Screenshot (games) | macOS only | Windows WGC (GPU-accelerated) |
| Input injection | macOS only | enigo + Win32 PostMessage |
| Architecture | Monolithic | Hexagonal (ports & adapters) |
| Languages | Python | Rust + Python + C# bindings |
| Startup overhead | ~5s (VM boot) | <100ms (process spawn) |
| Observability | Basic logs | JSON structured tracing |
MIT — see LICENSE.