One Go binary · Cross-platform · Purpose-built engine over chromedp — no Playwright, no Puppeteer, no Node
Built for the agent that uses it, not a human. The agent says what it wants (
act "Sign in"); the tool resolves it, does it, and reports a verdict. Snapshots are dense ref-lines, not aria dumps. Every action returns a delta (what changed) plus a one-line semantic outcome. Structured data comes back as JSON, not 200 refs to reconstruct. The action log is offloaded from the agent's context.
The big browser MCP servers tax the agent every step. agent-browser adds a cognition layer on top of a token-efficient engine, so the agent spends tokens on the task, not on interpreting the page.
Measured head-to-head against the two largest browser-automation MCP servers:
| agent-browser | Playwright MCP | Chrome DevTools MCP | |
|---|---|---|---|
| Snapshot of Hacker News | ~1,200 tok | ~14,700 tok | ~9,800 tok |
| Snapshot of a GitHub repo | ~1,250 tok | ~21,600 tok | ~20,800 tok |
| Cost to connect (tool defs + instructions) | ~1,900 tok (8 tools) | ~3,442 tok (22) | ~5,000 tok (26+) |
| Saucedemo login (real task, all succeed) | ~154 tok | ~1,714 tok | ~1,483 tok |
Within Playwright MCP's ballpark on connect cost (and now lighter — 8 tools, ~1,900 tok to connect), and for that you also get four things neither has: intent-first act, action verdicts, a JS helper API (js) for one-call structured data, and history. On a real task the gap is ~10x: the login above is nav + three act calls (name the field, name the button) instead of find, fill, fill, find, click, re-see, re-see.
A second, success-normalized benchmark (bench/successtoken, 5 multi-step tasks vs @playwright/mcp): both 5/5 success, ~1,142 tok vs ~2,337 tok — ~2x fewer at equal success. Reproduce with go run ./bench/successtoken -compare.
Connect cost estimated as chars/4.41; Playwright MCP from a real Claude Code per-tool breakdown (jdhodges.com); Chrome DevTools MCP commonly reported (varies ~5k–17k by config, low end used). Snapshot + login measured on the live page, headless, 2026-06. Numbers approximate; the per-task row is the decisive comparison.
act— one tool for any single action. Name a control (act "Sign in",act "Username" value=x) OR give a ref/selector; local heuristics resolve it (no LLM, no per-call cost) and do the right thing for its role — click buttons/links, fill inputs (passvalue=), select dropdowns (passvalue=). Addhover=trueto hover,key=Enterto press a key (Enter submits, Escape closes),files=[..]to upload. OptionalwaitUrl=/waitText=/waitGone=fuses a wait into the action. Collapses find + click/fill/select + see into one call. Ambiguous matches return ranked candidates — it never guesses; disambiguate withnthorrole, or use a ref.js— the structured-data hero. Run JS with a helper API in scope and get clean JSON back:return {stars: text('#stars'), lang: attr('.lang','aria-label'), items: $$('li').map(text)}. Helpers:$,$$(→array),text,attr,html,visible,data,table(a<table>→ rows, or objects if the first row is<th>),links(→[{text,href}]),rect,xpath,frame(title)(a same-origin iframe's document),wait(fn,ms).await="sel"waits for a selector first. One call, no re-snapshot, no refs to parse — the go-to for any scattered/scraped data. A thrown error is surfaced with the page-side message. Replaces v2'seval+extract+collect.- Verdicts on every action.
navigated to …/dialog opened: …/status: added to cart/changed: +1 -1 ~1/page updated/no visible effect/CHALLENGE: …. For non-navigation actions it also folds in the XHR/Fetch responses that fired (net: /api/cart 200) — the "did my click hit the API" loop, closed without a re-see. navreturns an orientation. Navigate and land oriented: page type, auth state, the top primary actions WITH refs, regions, counts — so you can act immediately, no separatesee.back/forward/reload;newTab=trueopens a new tab.see level=outline— discovery, not guessing. The page's semantic skeleton (headings/tables/lists/forms/regions) each with a WORKING CSS selector — use it to pick selectors forjsinstead of ping-ponging see/extract/read until one hits. Plusbrief/refs/text/full/shotlevels.findbridges refs and selectors. By role/text → refs (pass toact ref=); byselector=→[css]matches (pass tojsoract selector=);selectors=truealso gives a CSS selector per a11y match.history— session memory offloaded from context. A rolling log of step / action / verdict / URL. Query it (last=N,errors=true) to re-orient after a long flow instead of carrying the transcript in your context window.- Recovery built in.
session mode=resetrelaunches the browser (a wedged tab/crashed browser/stale state);mode=clearwipes cookies + storage and reloads (a one-call clean slate). Every op is bounded by an op-timeout so a hung page errors instead of wedging.
Requires Go 1.26+ and Chrome/Chromium (auto-discovered).
go install github.com/dondai1234/agent-browser/v3/cmd/agent-browser@latest
agent-browser --version # verify; re-run the install command to updateAdd it to any MCP client:
{
"mcpServers": {
"agent-browser": { "command": "agent-browser", "args": ["mcp"] }
}
}Cursor, Claude Code, Claude Desktop, Windsurf, VS Code Copilot, opencode, Hermes Agent, and OpenClaw all work with this shape (VS Code uses "servers" instead of "mcpServers"). Ready-to-paste configs and per-client file paths are in examples/. Claude Code one-liner: claude mcp add agent-browser -- agent-browser mcp.
spawn agent-browser ENOENT? The client can't find the binary on its PATH — use the absolute path incommand:$(go env GOPATH)/bin/agent-browser(append.exeon Windows).
Or: paste this prompt and let your agent install it
Install the agent-browser MCP server and connect it to this client:
1. Run: go install github.com/dondai1234/agent-browser/v3/cmd/agent-browser@latest
2. Verify: agent-browser --version (expect an agent-browser v3.x version)
3. Find out which agent harness you're running on (Opencode, OpenClaw, Hermes Agent, etc.) and locate its MCP config.
4. Add a stdio MCP server named "agent-browser": command "agent-browser", args ["mcp"].
5. Confirm it connects, then tell me it's ready.
nav https://saucedemo.com → page: login form | auth: anonymous | actions: r3 button "Login" | r4 textbox "Username"
act "Username" value="standard_user" → act "Username" -> [r4] textbox (fill) | verdict: changed
act "Password" value="secret_sauce" → act "Password" -> [r5] textbox (fill) | verdict: changed
act "Login" waitUrl="/inventory.html" → act "Login" -> [r3] button (click) | verdict: navigated to /inventory.html
js "return {price: text('.inventory_item_price').slice(1), name: text('.inventory_item_name')}"
→ {"price":"29.99","name":"Sauce Labs Backpack"}
see level=outline → h2 ".title" "Products" · div ".inventory_list" (6 items)
Name the control, get a verdict. You rarely call see after an action — the verdict + delta tell you what happened. For data, one js call with the helper API replaces a find→see→extract→read dance. By-ref mode (find then act ref=r12) still works when you need precision.
Move & look — nav (open/back/forward/reload/newTab → orientation) · see (brief / refs / text / outline / full / shot)
Act & scrape — act (click/fill/select/hover/press/upload by intent/ref/selector + optional wait → verdict+delta) · js (run JS with a helper API → clean JSON) · find (by role/text → refs; by selector → matches; selectors=true for both)
Session — tabs (list/switch/close/label) · history (action log) · session (reset / clear)
Every tool's description is hand-crafted to tell the agent exactly what to pass, what it returns, and the gotcha — masterable from the defs alone. js covers anything the typed tools don't.
- Static tells patched:
navigator.webdriver=false(via--disable-blink-features=AutomationControlled,--enable-automationdropped);userAgentData/plugins/languages/window.chrome/WebGL/hardware spoofed via a pre-page init script. Verified:webdriver=false,plugins=5,languages=en-US,en. - Real fingerprint:
--headless=new(near-real) by default;--headless=falsefor the real GPU/canvas/timing fingerprint on hard targets. - Behavioral realism: a jittered smoothstep mouse path before each click;
act key=for typed input. - Proxies + challenge detection:
--proxy-serverfor residential proxies (the biggest IP-reputation lever);navigate/seesurfaceCHALLENGE:on Cloudflare/DataDome/reCAPTCHA/hCaptcha/Turnstile and auto-wait for managed challenges to clear. A click that lands on a challenge reportsverdict: CHALLENGE: ….
Honest limits — no chromedp tool beats these
- The CDP runtime signal (a debugger-attached timing delta) is fundamental to CDP; only a custom Chromium build (e.g. Camoufox) hides it.
- Image-CAPTCHA solving (reCAPTCHA grids, hCaptcha) needs a paid solver — solver integration (user-provided API key) is planned.
- The intent resolver + verdict heuristics are best-effort over the a11y tree, not ground truth.
actfalls back to candidates when ambiguous (never guesses);js+see level=outlinegive the raw structure;see level=refsis always there for the raw refs. - For hard targets, stack:
--headless=false+--proxy-server <residential>+ a solver. - Cross-origin iframes are opaque (as for any tool); same-origin iframes work fully.
--headless · --user-data-dir · --no-persist (throwaway profile; by default logins persist at <os config dir>/agent-browser, with an automatic fallback to a throwaway profile if it's locked by a leftover Chrome) · --proxy-server · --user-agent · --viewport W,H · --no-stealth · --no-eval (js on by default; disable to forbid arbitrary page JS) · --op-timeout (per-CDP-op, default 30s) · --idle-timeout (auto-close Chrome after this long idle, default 10m; 0 disables) · --allow-insecure-schemes · --version
MIT · Changelog · Example MCP configs · Benchmarks
Built for the agent that uses it.