Browser Automation
Kai can interact with web pages using Playwright CLI, a command-line browser automation tool from Microsoft. This gives Kai the ability to take screenshots, scrape page content, fill forms, navigate multi-step workflows, and run JavaScript on any web page - all from the shell.
Everything runs locally and headless. No browser windows appear on screen unless explicitly requested.
Playwright also ships as an MCP server, but Kai uses the CLI version instead. The reasons:
| Concern | MCP Server | CLI |
|---|---|---|
| Permission model | `bypassPermissions` auto-approves all tool calls with no human-in-the-loop | Regular shell commands, same approval flow as everything else |
| Trust boundary | New persistent server process exposing browser tools via protocol | No daemon, no open port, no protocol server |
| Visibility | Tool calls may not be visible in conversation | Every command is visible in the conversation |
| Infrastructure | Requires MCP client configuration and trust setup | Just a CLI tool on PATH |
| Session model | Always-available automation surface | Only runs when explicitly called |
The CLI avoids every security concern that led to MCP being rejected. It uses the same shell access Claude Code already has, with full visibility and the ability to /stop at any time.
Playwright CLI is installed as a Claude Code skill at .claude/skills/playwright-cli/. The skill file tells Claude Code what commands are available and grants permission to run them.
A typical interaction:
- Kai runs `playwright-cli open <url>` to launch a headless browser
- The browser navigates to the page and returns a compact snapshot with element references (`e1`, `e5`, `e21`)
- Kai can interact with elements by reference: `click e21`, `fill e15 "text"`, `screenshot`
- When done, `playwright-cli close` shuts down the browser
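The interaction above can be sketched as a small shell function. This is only an illustration under stated assumptions, not how Kai is implemented: the function name `fill_and_capture` and the `PWCLI` override variable are hypothetical, and `e15` and `e21` are placeholder references (real ones come from the snapshot returned for the page being automated).

```shell
# Hypothetical override hook: lets the sketch be pointed at a different binary.
PWCLI="${PWCLI:-playwright-cli}"

# fill_and_capture URL TEXT
# Opens URL, fills one field, clicks one element, and takes a screenshot.
# e15 and e21 are placeholders; read the real references from the snapshot.
fill_and_capture() {
  url="$1"
  text="$2"

  # Give up immediately if the browser cannot be launched.
  "$PWCLI" open "$url" || return 1

  # Run the steps in order and stop at the first one that fails.
  "$PWCLI" fill e15 "$text" &&
    "$PWCLI" click e21 &&
    "$PWCLI" screenshot
  rc=$?

  # Close the browser whether or not the steps succeeded.
  "$PWCLI" close
  return "$rc"
}
```

A call such as `fill_and_capture https://example.com "hello"` would then run the four steps in order. Closing the browser even after a failed step avoids leaving a session open.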
Sessions are ephemeral by default - each open starts with a blank profile (no cookies, no saved credentials, no login state). This is a deliberate security choice: your system Chrome's logged-in accounts are never exposed.
| Category | Commands | Examples |
|---|---|---|
| Navigation | `open`, `goto`, `go-back`, `go-forward`, `reload`, `close` | `open https://example.com`, `goto https://other.com` |
| Interaction | `click`, `fill`, `type`, `select`, `check`, `upload`, `hover`, `drag` | `click e21`, `fill e15 "hello"` |
| Capture | `screenshot`, `pdf`, `snapshot` | `screenshot` (full page), `screenshot e5` (element) |
| JavaScript | `eval` | `eval document.title`, `eval el => el.textContent e5` |
| Tabs | `tab-list`, `tab-new`, `tab-close`, `tab-select` | `tab-new https://example.com` |
| Cookies | `cookie-list`, `cookie-get`, `cookie-set`, `cookie-delete`, `cookie-clear` | `cookie-list` |
| Storage | `state-save`, `state-load`, localStorage/sessionStorage commands | `state-save auth.json` |
| Network | `route`, `unroute`, `network` | `route **/api/* {"status": 200}` |
| DevTools | `console`, `tracing-start`, `tracing-stop`, `video-start`, `video-stop` | `console error` |
Full command reference: playwright-cli --help
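As an illustration of the `eval` row, here is a minimal sketch that reads the text of one element. The function name `element_text` and the `PWCLI` override variable are hypothetical, and the exact output format of `eval` is not specified here, so the function simply passes through whatever the command prints. One detail worth noting when typing these commands into a shell: an arrow function contains `>`, so it has to be quoted.

```shell
# Hypothetical override hook: lets the sketch be pointed at a different binary.
PWCLI="${PWCLI:-playwright-cli}"

# element_text URL REF
# Prints whatever `eval` returns for the textContent of element REF on URL.
element_text() {
  url="$1"
  ref="$2"

  "$PWCLI" open "$url" >/dev/null || return 1

  # Quote the arrow function: an unquoted `>` would be parsed by the shell
  # as an output redirect and the expression would never reach the CLI.
  "$PWCLI" eval 'el => el.textContent' "$ref"
  rc=$?

  "$PWCLI" close >/dev/null
  return "$rc"
}
```

For example, `element_text https://example.com e5` would print the result for element `e5`, where `e5` stands in for a reference taken from a real snapshot.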
Playwright CLI complements the Perplexity service for web access:
| Need | Tool | Why |
|---|---|---|
| Quick factual answers, current events, research | Perplexity (via service proxy) | Faster, cheaper, synthesized answers with citations |
| Read a specific web page, check what it looks like | Playwright CLI | Actually loads the page, sees the real content and layout |
| Fill a form, click buttons, multi-step workflows | Playwright CLI | Perplexity can't interact with pages |
| Take a screenshot or PDF of a page | Playwright CLI | Visual capture requires a real browser |
| Compare search results across sources | Perplexity first, Playwright to verify | Best of both |
Together they give Kai broad web access without requiring any cloud-hosted browser infrastructure.
- Node.js and npm
- Google Chrome (or Playwright will install its own Chromium)
npm install -g @playwright/cli@latest
cd <workspace-path>
playwright-cli install --skills

This creates .claude/skills/playwright-cli/ with the skill file and reference documentation. If Chrome is detected on the system, it's used as the default browser automatically.
playwright-cli --version
playwright-cli open https://example.com
playwright-cli snapshot
playwright-cli close

- Ephemeral sessions - Each `open` starts with a blank browser profile. No cookies, passwords, or login state from your system Chrome.
- Headless by default - No visible browser windows. Pass `--headed` to watch.
- No persistent state - Unless explicitly requested with `--persistent` or `state-save`, nothing survives between sessions.
- Same trust model as shell - Every command runs through Bash, visible in the conversation, subject to the same approval flow.
- No daemon - Nothing runs when Playwright CLI isn't being used. No open ports, no background process.
- File access - File uploads are restricted to configured paths by default.
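Because nothing survives between sessions unless requested, carrying login state from one session to the next has to be an explicit step. The sketch below shows one way that might look with `state-save` and `state-load`. It rests on assumptions: the function names and the `PWCLI` variable are hypothetical, `state-load` is assumed to take a file name the same way `state-save auth.json` does, and the `reload` after loading is assumed to be needed for the page to pick up the restored state.

```shell
# Hypothetical override hook: lets the sketch be pointed at a different binary.
PWCLI="${PWCLI:-playwright-cli}"

# open_with_state URL STATE_FILE
# Starts a fresh session and restores saved state only if STATE_FILE exists.
open_with_state() {
  url="$1"
  state_file="$2"

  "$PWCLI" open "$url" || return 1

  if [ -f "$state_file" ]; then
    # Assumption: state-load accepts a file name, mirroring state-save.
    # Assumption: a reload is needed so the page sees the restored state.
    "$PWCLI" state-load "$state_file" && "$PWCLI" reload
  else
    echo "no saved state at $state_file; session stays blank" >&2
  fi
}

# close_with_state STATE_FILE
# Saves the current session state to STATE_FILE, then closes the browser.
close_with_state() {
  "$PWCLI" state-save "$1"
  rc=$?
  "$PWCLI" close
  return "$rc"
}
```

A saved state file can hold cookies and other login material, so it should be treated like a credential: keep it out of version control and delete it when it is no longer needed.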
Pass `--headed` to `open` to show the browser window on the desktop. This is useful for debugging but not needed for normal operation.
- Binary location: Global npm bin directory (e.g., `/Users/kai/.npm-global/bin/playwright-cli`)
- Skill location: `.claude/skills/playwright-cli/` (per-workspace, gitignored)
- Browser: Uses system Chrome when detected, otherwise downloads its own Chromium
- Snapshot format: YAML files in `.playwright-cli/` directory (gitignored)
- Token efficiency: ~27,000 tokens per typical task vs ~114,000 for the MCP equivalent (Microsoft's benchmarks). Savings come from file-based snapshots instead of inline accessibility trees.
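"playwright-cli: command not found" - The binary is in your npm global bin directory, which is not on PATH. Either add that directory to PATH or call the binary by its full path. Run npm prefix -g to find the global prefix; on macOS and Linux the binaries are in its bin subdirectory.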
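The PATH check can be sketched as a small helper. The function name `locate_cli` is hypothetical, and the suggested `bin` subdirectory applies to macOS and Linux, where npm places global binaries under the prefix reported by `npm prefix -g`.

```shell
# locate_cli [NAME]
# Prints the resolved path of NAME (default: playwright-cli) if it is on PATH.
# Otherwise prints a hint on stderr and returns 1.
locate_cli() {
  name="${1:-playwright-cli}"

  # command -v prints the path and succeeds only when NAME can be resolved.
  if cli_path=$(command -v "$name"); then
    echo "$cli_path"
    return 0
  fi

  if command -v npm >/dev/null 2>&1; then
    # On macOS and Linux, global npm binaries live in <prefix>/bin.
    echo "$name is not on PATH; try: export PATH=\"$(npm prefix -g)/bin:\$PATH\"" >&2
  else
    echo "$name is not on PATH and npm was not found" >&2
  fi
  return 1
}
```

Running `locate_cli` with no arguments checks for `playwright-cli`. The printed `export` line only affects the current shell, so it would need to go into a shell profile to persist.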
Browser fails to launch - Run playwright-cli install-browser to install a bundled Chromium, or ensure Chrome is installed at a standard location.
Session already open - Run playwright-cli close or playwright-cli close-all to clean up stale sessions.
Snapshots not appearing - Check that the .playwright-cli/ directory exists in your working directory. Snapshots are saved relative to where the command runs.
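A quick way to check for snapshots is to look for the newest file in `.playwright-cli/` under the directory where the command ran. The helper below is a minimal sketch; the name `latest_snapshot` is hypothetical, and it makes no assumption about snapshot file names beyond their location.

```shell
# latest_snapshot [DIR]
# Prints the most recently modified file in DIR/.playwright-cli
# (DIR defaults to the current directory), or explains what is missing.
latest_snapshot() {
  snap_dir="${1:-.}/.playwright-cli"

  if [ ! -d "$snap_dir" ]; then
    echo "no $snap_dir directory; was the command run from another directory?" >&2
    return 1
  fi

  # ls -t sorts by modification time, newest first.
  newest=$(ls -t "$snap_dir" | head -n 1)
  if [ -z "$newest" ]; then
    echo "$snap_dir exists but contains no files" >&2
    return 1
  fi

  echo "$snap_dir/$newest"
}
```

If `latest_snapshot` reports a missing directory, rerunning it with the directory the browser commands were actually run from (for example `latest_snapshot <workspace-path>`) usually shows where the files went.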