Browser Automation

Kai can interact with web pages using Playwright CLI, a command-line browser automation tool from Microsoft. This gives Kai the ability to take screenshots, scrape page content, fill forms, navigate multi-step workflows, and run JavaScript on any web page - all from the shell.

Everything runs locally and headless. No browser windows appear on screen unless explicitly requested.

Why Playwright CLI (not MCP)

Playwright also ships as an MCP server, but Kai uses the CLI version instead. The reasons:

| Concern | MCP Server | CLI |
| --- | --- | --- |
| Permission model | `bypassPermissions` auto-approves all tool calls with no human-in-the-loop | Regular shell commands, same approval flow as everything else |
| Trust boundary | New persistent server process exposing browser tools via protocol | No daemon, no open port, no protocol server |
| Visibility | Tool calls may not be visible in conversation | Every command is visible in the conversation |
| Infrastructure | Requires MCP client configuration and trust setup | Just a CLI tool on PATH |
| Session model | Always-available automation surface | Only runs when explicitly called |

The CLI avoids every security concern that led to MCP being rejected. It uses the same shell access Claude Code already has, with full visibility and the ability to /stop at any time.

How It Works

Playwright CLI is installed as a Claude Code skill at .claude/skills/playwright-cli/. The skill file tells Claude Code what commands are available and grants permission to run them.

A typical interaction:

1. Kai runs playwright-cli open <url> to launch a headless browser.
2. The browser navigates to the page and returns a compact snapshot with element references (e1, e5, e21).
3. Kai interacts with elements by reference (click e21, fill e15 "text") and captures output as needed (screenshot).
4. When done, playwright-cli close shuts down the browser.
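
The steps above can be sketched as a small shell function. This is a minimal sketch, not the skill's own implementation: `capture_page` and the `PW_BIN` override are hypothetical names introduced here for illustration, and only the `open`, `snapshot`, `screenshot`, and `close` subcommands come from this page.

```shell
# PW_BIN is a hypothetical override so the flow can be dry-run with a stub
# (for example PW_BIN=echo); by default it points at the real binary.
PW_BIN="${PW_BIN:-playwright-cli}"

# capture_page URL: open the page, take a snapshot and a screenshot,
# then close the browser even if one of the capture steps failed.
capture_page() {
  local url="$1" status=0
  "$PW_BIN" open "$url" || return 1
  { "$PW_BIN" snapshot && "$PW_BIN" screenshot; } || status=$?
  "$PW_BIN" close
  return "$status"
}
```

Setting `PW_BIN=echo` and calling `capture_page https://example.com` prints the four commands instead of launching a browser, which is a cheap way to check the sequence. Closing in every code path keeps a failed capture from leaving a stale session behind.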

Sessions are ephemeral by default - each open starts with a blank profile (no cookies, no saved credentials, no login state). This is a deliberate security choice: your system Chrome's logged-in accounts are never exposed.

Capabilities

| Category | Commands | Examples |
| --- | --- | --- |
| Navigation | `open`, `goto`, `go-back`, `go-forward`, `reload`, `close` | `open https://example.com`, `goto https://other.com` |
| Interaction | `click`, `fill`, `type`, `select`, `check`, `upload`, `hover`, `drag` | `click e21`, `fill e15 "hello"` |
| Capture | `screenshot`, `pdf`, `snapshot` | `screenshot` (full page), `screenshot e5` (element) |
| JavaScript | `eval` | `eval document.title`, `eval el => el.textContent e5` |
| Tabs | `tab-list`, `tab-new`, `tab-close`, `tab-select` | `tab-new https://example.com` |
| Cookies | `cookie-list`, `cookie-get`, `cookie-set`, `cookie-delete`, `cookie-clear` | `cookie-list` |
| Storage | `state-save`, `state-load`, localStorage/sessionStorage commands | `state-save auth.json` |
| Network | `route`, `unroute`, `network` | `route */api/ {"status": 200}` |
| DevTools | `console`, `tracing-start`, `tracing-stop`, `video-start`, `video-stop` | `console error` |

Full command reference: playwright-cli --help
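
As one illustration of combining commands from the table, the sketch below wraps `open`, `eval`, and `close` into a single helper. The function name `eval_on_page` and the `PW_BIN` override are hypothetical. The CLI's output format is not documented on this page, so the helper simply passes through whatever the commands print.

```shell
# PW_BIN is a hypothetical override (e.g. PW_BIN=echo for a dry run).
PW_BIN="${PW_BIN:-playwright-cli}"

# eval_on_page URL EXPR: open URL, evaluate the JavaScript expression EXPR
# in the page, then close the browser regardless of whether eval succeeded.
eval_on_page() {
  local url="$1" expr="$2" status=0
  "$PW_BIN" open "$url" || return 1
  "$PW_BIN" eval "$expr" || status=$?
  "$PW_BIN" close
  return "$status"
}
```

A call such as `eval_on_page https://example.com "document.title"` mirrors the `eval document.title` example in the table. Quoting the expression keeps the shell from splitting or expanding it.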

Pairing with Perplexity

Playwright CLI complements the Perplexity service for web access:

| Need | Tool | Why |
| --- | --- | --- |
| Quick factual answers, current events, research | Perplexity (via service proxy) | Faster, cheaper, synthesized answers with citations |
| Read a specific web page, check what it looks like | Playwright CLI | Actually loads the page, sees the real content and layout |
| Fill a form, click buttons, multi-step workflows | Playwright CLI | Perplexity can't interact with pages |
| Take a screenshot or PDF of a page | Playwright CLI | Visual capture requires a real browser |
| Compare search results across sources | Perplexity first, Playwright to verify | Best of both |

Together they give Kai broad web access without requiring any cloud-hosted browser infrastructure.

Installation

Prerequisites

Node.js and npm
Google Chrome (optional; if it is not present, Playwright installs its own Chromium)
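
A quick way to confirm the prerequisites before installing is to check which tools are on PATH. The helper below is a sketch with a hypothetical name (`missing_tools`). It relies only on the standard `command -v` builtin and makes no assumptions about Playwright itself.

```shell
# missing_tools NAME...: print the names that are not found on PATH,
# separated by spaces; prints an empty line when everything is present.
missing_tools() {
  local missing="" tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "${missing# }"
}
```

For this page the natural call is `missing_tools node npm`. Empty output means both prerequisites are available.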

Install

npm install -g @playwright/cli@latest

Install skill (from workspace directory)

cd <workspace-path>
playwright-cli install --skills

This creates .claude/skills/playwright-cli/ with the skill file and reference documentation. If Chrome is detected on the system, it's used as the default browser automatically.

Verify

playwright-cli --version
playwright-cli open https://example.com
playwright-cli snapshot
playwright-cli close

Security Model

Ephemeral sessions - Each open starts with a blank browser profile. No cookies, passwords, or login state from your system Chrome.
Headless by default - No visible browser windows. Pass --headed to watch.
No persistent state - Unless explicitly requested with --persistent or state-save, nothing survives between sessions.
Same trust model as shell - Every command runs through Bash, visible in the conversation, subject to the same approval flow.
No daemon - Nothing runs when Playwright CLI isn't being used. No open ports, no background process.
File access - File uploads are restricted to configured paths by default.

Headed mode

Pass --headed to the open command to show the browser window on the desktop. This is useful for debugging but not needed for normal operation.

Technical Details

Binary location: Global npm bin directory (e.g., /Users/kai/.npm-global/bin/playwright-cli)
Skill location: .claude/skills/playwright-cli/ (per-workspace, gitignored)
Browser: Uses system Chrome when detected, otherwise downloads its own Chromium
Snapshot format: YAML files in .playwright-cli/ directory (gitignored)
Token efficiency: ~27,000 tokens per typical task vs ~114,000 for the MCP equivalent (Microsoft's benchmarks), roughly 76% fewer. Savings come from file-based snapshots instead of inline accessibility trees.

Troubleshooting

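"playwright-cli: command not found" - The binary is in your npm global bin directory. Either add that directory to PATH or call the binary by its full path. Running npm prefix -g prints the global prefix; on macOS and Linux the binary is in the bin/ subdirectory beneath it.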
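
One way to apply the PATH fix without adding duplicate entries is sketched below. `path_prepend` is a hypothetical helper name, and the usage line assumes a Unix-like layout where global binaries live in `<prefix>/bin`.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
}

# Usage (assumes binaries are in <prefix>/bin, as on macOS and Linux):
#   path_prepend "$(npm prefix -g)/bin"
```

To make the change permanent, the same call would go in your shell's startup file. Which file that is depends on the shell in use.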

Browser fails to launch - Run playwright-cli install-browser to install a bundled Chromium, or ensure Chrome is installed at a standard location.

Session already open - Run playwright-cli close or playwright-cli close-all to clean up stale sessions.

Snapshots not appearing - Check that the .playwright-cli/ directory exists in your working directory. Snapshots are saved relative to where the command runs.
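
The check described above can be sketched as a helper that reports a missing directory and otherwise prints the newest file in it. `latest_snapshot` is a hypothetical name. The only detail taken from this page is that snapshots are written to .playwright-cli/ relative to the working directory.

```shell
# latest_snapshot [DIR]: print the most recently modified file in DIR
# (default: .playwright-cli in the current directory).
# Returns non-zero with a message on stderr if DIR does not exist.
latest_snapshot() {
  local dir="${1:-.playwright-cli}"
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir (cwd: $(pwd))" >&2
    return 1
  fi
  ls -t "$dir" | head -n 1
}
```

If the function reports that the directory is missing, the command was most likely run from a different working directory than expected.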
