Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
258 changes: 184 additions & 74 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,138 +1,248 @@
# 🤖 MacPilot

Programmatic macOS control for AI agents.
Programmatic macOS control for AI agents. Everything a human can do via keyboard + mouse, MacPilot can do programmatically.

Current version: **0.4.0**
Current version: **0.4.1**

## Build
## Quick Start

```bash
# Clone and build
git clone https://github.com/adhikjoshi/macpilot.git
cd macpilot
swift build -c release

# Build .app bundle (recommended — needed for Screen Recording permission)
bash scripts/build-app.sh

# Install to a location
cp -R MacPilot.app /path/to/your/tools/
```

Binary output:
### Permissions Required
1. **Accessibility** — System Settings → Privacy & Security → Accessibility → Add MacPilot.app
2. **Screen Recording** — System Settings → Privacy & Security → Screen Recording → Add MacPilot.app

### Verify Permissions
```bash
.build/release/macpilot
MacPilot ax-check --json
# Should show: "trusted": true
```

## Build `.app` bundle (recommended for Accessibility permissions)
## ⚠️ Important: Screenshot Invocation

When calling MacPilot from a background process (e.g., AI agent, cron, daemon), **screenshots require the .app identity** for Screen Recording permission:

```bash
bash scripts/build-app.sh
# ✅ For screenshots (uses .app's Screen Recording permission)
open -n -W -a /path/to/MacPilot.app --args screenshot --output /tmp/screen.png --json

# ✅ For all other commands (direct binary is fine)
MP=/path/to/MacPilot.app/Contents/MacOS/MacPilot
$MP window list --json
$MP keyboard type "hello" --json
$MP mouse click 100 200 --json
```

This builds release binary, assembles `MacPilot.app`, and ad-hoc signs it.

### App Bundle Structure
## Commands

- `MacPilot.app/Contents/MacOS/MacPilot`
- `MacPilot.app/Contents/Info.plist`
- `MacPilot.app/Contents/Resources/`
All commands support `--json` for structured output.

Signing command used by script:
### Mouse
```bash
MacPilot mouse move 300 300 --json
MacPilot mouse click 100 200 --json
MacPilot mouse click 100 200 --right --json # right click
MacPilot mouse doubleclick 100 200 --json
MacPilot mouse drag 100 200 300 400 --json
MacPilot mouse scroll up 5 --json
MacPilot mouse scroll down 10 --json
```

### Keyboard
```bash
codesign --force --deep --sign - MacPilot.app
MacPilot keyboard type "Hello World" --json
MacPilot keyboard key return --json
MacPilot keyboard key escape --json
MacPilot keyboard key tab --json
MacPilot keyboard key space --json
MacPilot keyboard key delete --json
MacPilot keyboard key "cmd+c" --json # keyboard shortcuts
MacPilot keyboard key "cmd+shift+3" --json # screenshot shortcut
MacPilot keyboard key "ctrl+right" --json # switch Space
```

## Commands
### Screenshot
```bash
# Full screen
MacPilot screenshot --output /tmp/screen.png --json

### Mouse
# Region capture (x,y,width,height)
MacPilot screenshot --region 100,100,800,600 --output /tmp/region.png --json

# Note: From background processes, use:
open -n -W -a MacPilot.app --args screenshot --output /tmp/screen.png --json
```

### App Management
```bash
macpilot click 100 200
macpilot doubleclick 100 200
macpilot rightclick 100 200
macpilot move 100 200
macpilot drag 100 200 300 400
macpilot scroll up 5
macpilot scroll down 10
MacPilot app open Safari --json # open by name
MacPilot app open com.apple.TextEdit --json # open by bundle ID
MacPilot app list --json # list running apps
MacPilot app quit Safari --json # graceful quit
MacPilot app quit Safari --force --json # force quit
# Note: System processes (Finder, Dock, etc.) are protected and cannot be quit
```

### Keyboard
### Window Management
Both positional and flag syntax supported:
```bash
MacPilot window list --json # list all windows
MacPilot window focus Safari --json # positional
MacPilot window focus --app Safari --json # flag syntax
MacPilot window move Safari 100 100 --json # positional: app x y
MacPilot window move --app Safari --x 100 --y 100 --json
MacPilot window resize Safari 1200 800 --json # positional: app w h
MacPilot window resize --app Safari --width 1200 --height 800 --json
MacPilot window minimize Safari --json
MacPilot window fullscreen Safari --json # toggle fullscreen
MacPilot window close Safari --json
```

### UI / Accessibility
```bash
macpilot type "Hello"
macpilot key enter
macpilot key cmd+c
MacPilot ui list --json # list UI elements of frontmost app
MacPilot ui find "Submit" --json # find element by name
MacPilot ui click "Submit" --json # click element by accessibility label
MacPilot ui tree --depth 3 --json # element hierarchy (alias: --max-depth)
```

### Screenshot
### Shell
```bash
MacPilot shell run "ls -la" --json
MacPilot shell run "whoami" --json
MacPilot shell run "sw_vers" --json
```

### Chain Commands (multi-step automation)
```bash
macpilot screenshot
macpilot screenshot --output /tmp/screen.png
macpilot screenshot --region 0,0,500,500
macpilot screenshot --window "Google Chrome"
# Navigate in browser: address bar → type URL → press enter
MacPilot chain "cmd+l" "type:https://google.com" "return" --json

# With delays between steps
MacPilot chain "cmd+l" "type:https://google.com" "sleep:500" "return" --delay 200 --json

# Chain syntax:
# "key_name" → press key (return, escape, tab, etc.)
# "cmd+key" → keyboard shortcut
# "type:text" → type text
# "sleep:ms" → pause for milliseconds
```

### App
### Chrome
```bash
MacPilot chrome open-url "https://example.com" --json
MacPilot chrome new-tab "https://example.com" --json
MacPilot chrome extensions --json
MacPilot chrome dev-mode --json
```

### Space (Desktop) Management
Both positional and flag syntax:
```bash
macpilot app open "Google Chrome"
macpilot app open com.apple.TextEdit
macpilot app focus Chrome
macpilot app list
macpilot app quit TextEdit
macpilot app quit TextEdit --force
MacPilot space list --json
MacPilot space switch right --json # positional
MacPilot space switch --direction right --json # flag syntax
MacPilot space switch left --json
MacPilot space switch 1 --json # by index
MacPilot space switch --index 1 --json
```

### Window
### Dialog / File Picker
```bash
MacPilot dialog navigate /tmp --json
MacPilot dialog select myfile.txt --json
```

### Clipboard
```bash
macpilot window list
macpilot window list --all-spaces
macpilot window focus --app "Terminal"
macpilot window focus --app "Google Chrome" --title "Docs"
macpilot window resize --app "Terminal" --width 1200 --height 800
macpilot window move --app "Terminal" --x 100 --y 100
macpilot window minimize --app "Terminal"
macpilot window close --app "Terminal"
macpilot window fullscreen --app "Terminal"
MacPilot clipboard get --json
MacPilot clipboard set "hello" --json
```

### UI
### Wait
```bash
MacPilot wait element "Submit" --timeout 10 --json
MacPilot wait window "Chrome" --timeout 10 --json
MacPilot wait seconds 1.5 --json
```

### System
```bash
macpilot ui list
macpilot ui find "Submit"
macpilot ui click "Submit"
macpilot ui tree
MacPilot ax-check --json # verify Accessibility permission
MacPilot --version # show version
```

### Other
## Best Practices for AI Agents

### Validate → Act → Verify (MANDATORY)

Never fire commands blindly. Always:

1. **PRE-CHECK** state before acting
2. **ACT** — run your command
3. **POST-CHECK** — verify it worked

```bash
macpilot clipboard get
macpilot clipboard set "hello"
macpilot dialog navigate /path/to/file
macpilot shell run "echo hi"
macpilot wait element "Submit" --timeout 10
macpilot wait window "Chrome" --timeout 10
macpilot wait seconds 1.5
macpilot ax-check --json
macpilot chain "sleep:10" "key:enter"
macpilot chrome tabs --json
macpilot run app list --json
# BAD (fire and pray)
$MP app open Safari
$MP chain "cmd+l" "type:url" "return"
$MP screenshot

# GOOD (validate → act → verify)
$MP app list --json # PRE: is Safari running?
$MP app open Safari --json; sleep 3 # ACT: open it
$MP window list --json # POST: is window visible?
$MP window focus Safari --json # PRE: ensure focus
$MP chain "cmd+l" "type:url" "return" # ACT: navigate
sleep 3
open -n -W -a MacPilot.app --args screenshot --output /tmp/result.png # POST: verify
```

## JSON output
### Key Rules
- **One app at a time** — finish with one before starting another
- **Always confirm focus** before typing/clicking
- **Sleep 2-3s** after opening apps (they need time to load)
- **Clean up** — close apps/tabs you opened
- **Check window list** before window operations

Most commands support `--json`.
## Build

```bash
macpilot app list --json
macpilot window list --json
swift build # debug build
swift build -c release # release build
bash scripts/build-app.sh # build .app bundle with ad-hoc signing
bash Tests/run_tests.sh # run integration tests
```

## CI / CD

- **CI**: Runs on every PR and merge to main (build + test + .app verification)
- **Release**: Push a tag (`git tag v0.5.0 && git push --tags`) to auto-create a GitHub Release with .app zip + standalone binary

## Safety

MacPilot has built-in safety limits:
- **Protected processes**: Finder, Dock, WindowServer, SystemUIServer, launchd, kernel_task cannot be quit
- **Protected paths**: System directories cannot be modified via shell
- **Shell safety**: Dangerous commands are blocked

## Requirements

- macOS 13+
- Accessibility permission
- Screen Recording permission (screenshots)
- macOS 13+ (Ventura or later)
- Swift 5.9+
- Accessibility permission (required for all commands)
- Screen Recording permission (required for screenshots)

## License

Expand Down