Browser Automation

Daniel Ellison edited this page Feb 17, 2026 · 1 revision

Kai can interact with web pages using Playwright CLI, a command-line browser automation tool from Microsoft. This gives Kai the ability to take screenshots, scrape page content, fill forms, navigate multi-step workflows, and run JavaScript on any web page - all from the shell.

Everything runs locally and headless. No browser windows appear on screen unless explicitly requested.


Why Playwright CLI (not MCP)

Playwright also ships as an MCP server, but Kai uses the CLI version instead. The reasons:

Concern | MCP Server | CLI
--- | --- | ---
Permission model | bypassPermissions auto-approves all tool calls with no human-in-the-loop | Regular shell commands, same approval flow as everything else
Trust boundary | New persistent server process exposing browser tools via protocol | No daemon, no open port, no protocol server
Visibility | Tool calls may not be visible in conversation | Every command is visible in the conversation
Infrastructure | Requires MCP client configuration and trust setup | Just a CLI tool on PATH
Session model | Always-available automation surface | Only runs when explicitly called

The CLI avoids every security concern that led to MCP being rejected. It uses the same shell access Claude Code already has, with full visibility and the ability to /stop at any time.


How It Works

Playwright CLI is installed as a Claude Code skill at .claude/skills/playwright-cli/. The skill file tells Claude Code what commands are available and grants permission to run them.

A typical interaction:

  1. Kai runs playwright-cli open <url> to launch a headless browser
  2. The browser navigates to the page and returns a compact snapshot with element references (e1, e5, e21)
  3. Kai interacts with elements by reference (click e21, fill e15 "text") and can capture the result with screenshot
  4. When done, playwright-cli close shuts down the browser
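
The four steps above can be sketched as a short shell session. This is a minimal sketch rather than a definitive workflow: e15 and e21 are placeholder references copied from the examples on this page (real references come from the snapshot returned for the page you open), and the PLAYWRIGHT_CLI variable and demo_session function are hypothetical names introduced here so the binary can be swapped out for a dry run.

```shell
# Command to invoke; set PLAYWRIGHT_CLI=echo to print the commands instead of running them.
PLAYWRIGHT_CLI="${PLAYWRIGHT_CLI:-playwright-cli}"

# demo_session URL: open a page, interact by element reference, capture, close.
demo_session() {
  "$PLAYWRIGHT_CLI" open "$1"        # steps 1-2: launch the browser and load the page
  "$PLAYWRIGHT_CLI" fill e15 "text"  # step 3: fill the element with reference e15 (placeholder)
  "$PLAYWRIGHT_CLI" click e21        # step 3: click the element with reference e21 (placeholder)
  "$PLAYWRIGHT_CLI" screenshot       # capture the current page
  "$PLAYWRIGHT_CLI" close            # step 4: shut the browser down
}
```

Calling demo_session https://example.com with PLAYWRIGHT_CLI=echo prints the five commands in order without launching a browser, which is a cheap way to review a sequence before running it.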

Sessions are ephemeral by default - each open starts with a blank profile (no cookies, no saved credentials, no login state). This is a deliberate security choice: your system Chrome's logged-in accounts are never exposed.


Capabilities

Category | Commands | Examples
--- | --- | ---
Navigation | open, goto, go-back, go-forward, reload, close | open https://example.com, goto https://other.com
Interaction | click, fill, type, select, check, upload, hover, drag | click e21, fill e15 "hello"
Capture | screenshot, pdf, snapshot | screenshot (full page), screenshot e5 (element)
JavaScript | eval | eval document.title, eval el => el.textContent e5
Tabs | tab-list, tab-new, tab-close, tab-select | tab-new https://example.com
Cookies | cookie-list, cookie-get, cookie-set, cookie-delete, cookie-clear | cookie-list
Storage | state-save, state-load, localStorage/sessionStorage commands | state-save auth.json
Network | route, unroute, network | route **/api/* {"status": 200}
DevTools | console, tracing-start, tracing-stop, video-start, video-stop | console error

Full command reference: playwright-cli --help
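
When the eval examples are typed into a real shell, the JavaScript needs quoting: an unquoted => would be parsed by the shell as a redirect to a file. The sketch below wraps the two eval forms from the table. The function names and the PLAYWRIGHT_CLI variable are hypothetical, and e5 is a placeholder reference.

```shell
# Command to invoke; set PLAYWRIGHT_CLI=echo to print the commands instead of running them.
PLAYWRIGHT_CLI="${PLAYWRIGHT_CLI:-playwright-cli}"

# page_title URL: open the page, evaluate document.title, then close the browser.
page_title() {
  "$PLAYWRIGHT_CLI" open "$1"
  "$PLAYWRIGHT_CLI" eval "document.title"
  "$PLAYWRIGHT_CLI" close
}

# element_text REF: read the text content of one element in the page that is already open.
# The arrow function is quoted so the shell does not treat ">" as output redirection.
element_text() {
  "$PLAYWRIGHT_CLI" eval "el => el.textContent" "$1"
}
```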


Pairing with Perplexity

Playwright CLI complements the Perplexity service for web access:

Need | Tool | Why
--- | --- | ---
Quick factual answers, current events, research | Perplexity (via service proxy) | Faster, cheaper, synthesized answers with citations
Read a specific web page, check what it looks like | Playwright CLI | Actually loads the page, sees the real content and layout
Fill a form, click buttons, multi-step workflows | Playwright CLI | Perplexity can't interact with pages
Take a screenshot or PDF of a page | Playwright CLI | Visual capture requires a real browser
Compare search results across sources | Perplexity first, Playwright to verify | Best of both

Together they give Kai broad web access without requiring any cloud-hosted browser infrastructure.


Installation

Prerequisites

  • Node.js and npm
  • Google Chrome (optional: without it, Playwright installs its own Chromium)

Install

npm install -g @playwright/cli@latest

Install skill (from workspace directory)

cd <workspace-path>
playwright-cli install --skills

This creates .claude/skills/playwright-cli/ with the skill file and reference documentation. If Chrome is detected on the system, it's used as the default browser automatically.

Verify

playwright-cli --version
playwright-cli open https://example.com
playwright-cli snapshot
playwright-cli close

Security Model

  • Ephemeral sessions - Each open starts with a blank browser profile. No cookies, passwords, or login state from your system Chrome.
  • Headless by default - No visible browser windows. Pass --headed to watch.
  • No persistent state - Unless explicitly requested with --persistent or state-save, nothing survives between sessions.
  • Same trust model as shell - Every command runs through Bash, visible in the conversation, subject to the same approval flow.
  • No daemon - Nothing runs when Playwright CLI isn't being used. No open ports, no background process.
  • File access - File uploads are restricted to configured paths by default.
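
Because nothing survives between sessions by default, carrying a login across sessions has to be an explicit step. A minimal sketch, assuming state-load takes the same file argument that state-save does (this page only shows state-save auth.json), and assuming the saved file holds session data such as cookies and should therefore be treated like a credential. The function names and the PLAYWRIGHT_CLI variable are hypothetical.

```shell
# Command to invoke; set PLAYWRIGHT_CLI=echo to print the commands instead of running them.
PLAYWRIGHT_CLI="${PLAYWRIGHT_CLI:-playwright-cli}"

# save_state FILE: write the current session state to FILE and restrict it to the owner.
save_state() {
  "$PLAYWRIGHT_CLI" state-save "$1"
  if [ -f "$1" ]; then
    chmod 600 "$1"   # assumption: the file may contain cookies, so keep it private
  fi
}

# load_state FILE: restore a previously saved state into the open session.
# Assumption: state-load accepts the file path as its argument.
load_state() {
  "$PLAYWRIGHT_CLI" state-load "$1"
}
```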

Headed mode

Pass --headed to the open command to show the browser window on the desktop. This is useful for debugging but not needed for normal operation.


Technical Details

  • Binary location: Global npm bin directory (e.g., /Users/kai/.npm-global/bin/playwright-cli)
  • Skill location: .claude/skills/playwright-cli/ (per-workspace, gitignored)
  • Browser: Uses system Chrome when detected, otherwise downloads its own Chromium
  • Snapshot format: YAML files in .playwright-cli/ directory (gitignored)
  • Token efficiency: ~27,000 tokens per typical task vs ~114,000 for the MCP equivalent (Microsoft's benchmarks). Savings come from file-based snapshots instead of inline accessibility trees.
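
To put the token figures quoted above in relative terms, the arithmetic can be checked directly in the shell. The function name is hypothetical and the inputs are the two approximate figures from this page, so the result is only as precise as they are.

```shell
# token_savings_pct CLI_TOKENS MCP_TOKENS: whole-number percentage of tokens saved by the CLI.
token_savings_pct() {
  echo $(( ($2 - $1) * 100 / $2 ))
}

token_savings_pct 27000 114000   # prints 76
```

That is roughly a 76% reduction, or about 4.2 times fewer tokens per task.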

Troubleshooting

"playwright-cli: command not found" - The binary is in your npm global bin directory. Either add it to PATH or use the full path (check with npm prefix -g).
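
One way to apply the PATH fix, sketched for macOS and Linux, where npm links global executables into the bin subdirectory of the prefix that npm prefix -g prints (on Windows the layout differs). The helper names are hypothetical.

```shell
# npm_global_bin PREFIX: directory where npm places global executables on Unix-like systems.
npm_global_bin() {
  printf '%s/bin\n' "$1"
}

# path_with DIR: print a PATH value with DIR prepended, unless DIR is already on PATH.
path_with() {
  case ":$PATH:" in
    *":$1:"*) printf '%s\n' "$PATH" ;;
    *)        printf '%s:%s\n' "$1" "$PATH" ;;
  esac
}

# Usage, for example in ~/.profile:
#   PATH="$(path_with "$(npm_global_bin "$(npm prefix -g)")")"; export PATH
#   command -v playwright-cli   # prints the resolved path once the binary is found
```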

Browser fails to launch - Run playwright-cli install-browser to install a bundled Chromium, or ensure Chrome is installed at a standard location.

Session already open - Run playwright-cli close or playwright-cli close-all to clean up stale sessions.
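
Stale sessions usually come from a script that failed before reaching close. A small wrapper can make the close step unconditional. This is a sketch only: with_session and the PLAYWRIGHT_CLI variable are hypothetical names, not part of Playwright CLI.

```shell
# Command to invoke; set PLAYWRIGHT_CLI=echo to print the commands instead of running them.
PLAYWRIGHT_CLI="${PLAYWRIGHT_CLI:-playwright-cli}"

# with_session URL COMMAND [ARGS...]
# Opens URL, runs COMMAND, then closes the browser whether or not COMMAND succeeded.
# Returns COMMAND's exit status so callers can still detect the failure.
with_session() {
  url="$1"; shift
  "$PLAYWRIGHT_CLI" open "$url" || return 1
  "$@"
  status=$?
  "$PLAYWRIGHT_CLI" close
  return "$status"
}

# Usage: with_session https://example.com "$PLAYWRIGHT_CLI" screenshot
```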

Snapshots not appearing - Check that the .playwright-cli/ directory exists in your working directory. Snapshots are saved relative to where the command runs.
