Compact browser perception layer for AI agents. Converts web pages into token-efficient semantic snapshots that LLMs can reason about.
Takes raw HTML (thousands of DOM nodes, scripts, styles) and produces a clean, indented accessibility tree with stable ref IDs for interaction.
LLMs are bad at raw HTML. A typical web page has thousands of DOM nodes - scripts, style blocks, deeply nested divs, invisible elements - none of which help an agent understand what's on the page or how to interact with it. Feeding raw HTML to an LLM wastes tokens and confuses the model.
cortex-browser solves this by converting the DOM into a compact, structured accessibility tree that preserves the semantics an agent actually needs: what elements are on the page, what they do, and how to interact with them. The result is typically 90-95% fewer tokens than raw HTML, with no loss of actionable information.
Given a login page, cortex-browser produces:
page: "GitHub - Sign In" [github.com]
viewport: 0-900 of 1200px
---
heading[1] "Sign in to GitHub"
form @e56654:
textbox @e70050 "Username or email address" [required]
textbox @e91491 "Password" (password)
checkbox @e77747 "Keep me signed in" [unchecked]
button @e61466 "Sign in"
link @e75031 "Create an account" -> /signup
link @e93933 [offscreen] "Forgot password?" -> /password_reset
Interactive elements get @eN refs. The agent clicks by ref (click @e61466), types by ref (type @e70050 "user@example.com"), and reads structured output instead of raw HTML.
- 4-stage DOM pipeline: prune (scripts, hidden, aria-hidden) -> role mapping (ARIA semantics) -> wrapper collapse (meaningless divs) -> sibling merging (long lists)
- Stable ref IDs: Hash-based refs survive DOM mutations. An element with
id="submit"keeps the same ref across snapshots. - Multi-tab support: Open, switch, list, and close tabs. Each tab has independent state.
- Viewport-aware snapshots: Shows scroll position, marks off-screen elements
[offscreen], supports scroll_down/scroll_up/scroll_to_ref. - Page diff: After actions, get a compact diff instead of a full re-snapshot (
return_diff: trueor standalonepage_difftool). - Task context filtering: Focus snapshots on relevant regions (e.g., only form elements matching "login").
- Screenshot + annotations: Capture PNG screenshots of the current page. Optionally overlay interactive elements with red borders and
@eNlabels for visual debugging. Supports viewport and full-page modes. - Auth state persistence: Save and restore browser cookies as named profiles. Login sessions survive browser restarts — save once, restore anywhere.
- Structured data extraction: Pull tables, lists, and objects from pages as JSON using a JSON Schema — no LLM needed.
- Action recording & replay: Record browser action sequences and replay them deterministically using stored element locators.
- MCP server: Runs as a Model Context Protocol server over stdio or HTTP for agent integration.
- Incremental re-snapshots: DOM mutation observer skips re-processing when nothing changed.
Pre-built binaries for macOS, Linux, and Windows are available on the Releases page.
Download the latest release for your platform and add it to your PATH.
- Rust 1.70+ - install via rustup
- Chrome/Chromium - needed for live URL snapshots and MCP mode (not required for local HTML files)
# Clone and build
git clone https://github.com/igorsilveira/cortex-browser.git && cd cortex-browser
make install # installs to ~/.cargo/bin
# Or build without installing
make release # binary at target/release/cortex-browserNo Chrome needed for local files:
cortex-browser snapshot page.html
cortex-browser snapshot page.html -f json # JSON output
cat page.html | cortex-browser snapshot - # read from stdinStart Chrome with remote debugging enabled:
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Linux
google-chrome --remote-debugging-port=9222Then snapshot any page:
cortex-browser snapshot https://example.com --port 9222Or let cortex-browser launch a headless Chrome for you:
cortex-browser snapshot https://example.com --launchThis is the main use case - cortex-browser acts as a tool server that AI agents connect to via the Model Context Protocol.
Two transport modes are available:
The agent framework starts cortex-browser as a subprocess and communicates over stdin/stdout using JSON-RPC. This is the standard approach for local MCP integrations.
cortex-browser mcp --launch # launch headless Chrome automatically
cortex-browser mcp --port 9222 # connect to an already-running Chromecortex-browser runs as a standalone HTTP server. Agents connect over HTTP using the MCP Streamable HTTP transport. This is useful for remote deployments, shared servers, and clients that support HTTP-based MCP.
cortex-browser mcp-http --launch # localhost:8080
cortex-browser mcp-http --launch --http-port 3000 # custom port
cortex-browser mcp-http --launch --host 0.0.0.0 # expose to network
cortex-browser mcp-http --port 9222 --http-port 3000 # connect to existing ChromeThe MCP endpoint is served at http://<host>:<http-port>/mcp. The server supports stateful sessions via SSE, so clients can maintain persistent connections with GET and receive streaming updates.
Add to your claude_desktop_config.json (stdio):
{
"mcpServers": {
"cortex-browser": {
"command": "cortex-browser",
"args": ["mcp", "--launch"]
}
}
}Add to your .mcp.json (stdio):
{
"mcpServers": {
"cortex-browser": {
"command": "cortex-browser",
"args": ["mcp", "--launch"]
}
}
}Or connect to a running HTTP server:
{
"mcpServers": {
"cortex-browser": {
"type": "streamable-http",
"url": "http://localhost:8080/mcp"
}
}
}Stdio: Start the process with cortex-browser mcp --launch and wire stdin/stdout to your MCP client transport.
HTTP: Start the server with cortex-browser mcp-http --launch, then point your MCP client at http://localhost:8080/mcp. The client should:
POSTJSON-RPC messages with headersContent-Type: application/jsonandAccept: application/json, text/event-stream- Read the
mcp-session-idheader from the response - Include
mcp-session-idin subsequent requests - Optionally
GETwithAccept: text/event-streamto open an SSE stream for server-initiated messages
Snapshots are compact accessibility trees. Each line is a node with its ARIA role, optional ref, name, and attributes:
heading[1] "Sign in to GitHub" ← h1 heading, not interactive
textbox @e70050 "Email" (email) ← input with ref 70050, type=email
button @e61466 "Sign in" ← clickable, use ref 61466
link @e93933 [offscreen] "Help" -> /h ← off-screen, has href
@eN- stable ref ID for interaction. Only interactive elements get refs.[offscreen]- element is outside the current viewport.(type)- input type when relevant (password, email, tel, etc.).-> url- link destination.[checked]/[unchecked]- checkbox/radio state.[required]/[disabled]- form control attributes.
Once connected via MCP, the agent has access to these tools:
Navigation & snapshots:
| Tool | Description |
|---|---|
navigate |
Open a URL and return the page snapshot |
snapshot |
Re-read the current page (returns cached version if DOM is unchanged) |
page_diff |
Compare current page to previous snapshot, showing only changes |
Element interaction (use the @eN ref number from the snapshot):
| Tool | Parameters | Description |
|---|---|---|
click |
ref, return_diff? |
Click an element |
type_text |
ref, text, return_diff? |
Type into an input field |
select_option |
ref, value, return_diff? |
Pick a dropdown option |
Setting return_diff: true on any interaction returns a compact diff instead of a full snapshot.
Scrolling:
| Tool | Description |
|---|---|
scroll_down / scroll_up |
Scroll by one viewport height |
scroll_to_ref |
Scroll a specific element into view |
Multi-tab management:
| Tool | Description |
|---|---|
open_tab |
Open a URL in a new tab |
list_tabs |
List all tabs with IDs and URLs |
switch_tab |
Switch to a tab by ID |
close_tab |
Close a tab by ID |
Focusing & filtering:
| Tool | Description |
|---|---|
set_task_context |
Focus subsequent snapshots on task-relevant content (persists until cleared) |
clear_task_context |
Remove the filter, show full page again |
focused_snapshot |
One-time filtered snapshot without changing persistent context |
wait_for_changes |
Block until the DOM changes (useful after async actions) |
Screenshot:
| Tool | Parameters | Description |
|---|---|---|
screenshot |
full_page?, annotate? |
Capture a PNG screenshot. annotate overlays @eN labels on interactive elements |
Auth state persistence:
| Tool | Parameters | Description |
|---|---|---|
get_cookies |
List cookies for the current page | |
save_auth |
profile |
Save current cookies to disk as a named profile |
restore_auth |
profile, domain? |
Inject saved cookies back into the browser |
list_auth |
domain? |
List saved auth profiles |
delete_auth |
profile, domain? |
Delete a saved profile |
Structured data extraction:
| Tool | Parameters | Description |
|---|---|---|
extract |
schema, selector? |
Extract structured JSON from the page using a JSON Schema |
Recording & replay:
| Tool | Parameters | Description |
|---|---|---|
start_recording |
name, description? |
Start capturing browser actions |
stop_recording |
Stop and save the recording to disk | |
replay_recording |
name, domain? |
Replay a saved recording deterministically |
list_recordings |
domain? |
List saved recordings |
delete_recording |
name, domain? |
Delete a saved recording |
A typical agent session looks like this:
Agent: navigate("https://github.com/login")
→ receives page snapshot with form fields
Agent: type_text(ref=70050, text="user@example.com")
→ receives updated snapshot
Agent: type_text(ref=91491, text="mypassword")
→ receives updated snapshot
Agent: click(ref=61466, return_diff=true)
→ receives compact diff showing the page transition
Agent: wait_for_changes(timeout_ms=3000)
→ receives snapshot of the logged-in dashboard
For large pages, use task context to filter snapshots:
Agent: set_task_context(task="fill login form", focus_roles=["textbox", "button"], interactive_only=true)
→ subsequent snapshots show only interactive elements
Agent: snapshot()
→ much smaller output, focused on form controls
Agent: clear_task_context()
→ back to full page snapshots
Or use focused_snapshot for a one-time filtered view without changing the persistent context.
Use return_diff: true on interactions and page_diff to see only what changed instead of re-reading the entire page.
A bundled script measures the token reduction between raw HTML and cortex-browser output:
# Run against the built-in test fixtures
make token-compare
# Run against specific files or URLs
python3 scripts/token-compare.py page.html https://example.comBy default the script runs against the test fixtures and a live fetch of wikipedia.org:
Source Raw HTML Cortex Ratio Saved
───────────────────────────────────────────────────────────────────────────────
blog.html 5.0k 1.0k 0.21x 79.1%
dashboard.html 7.5k 644 0.09x 91.4%
ecommerce.html 9.4k 1.2k 0.13x 87.4%
spa_app.html 4.5k 455 0.10x 89.8%
www.wikipedia.org 48.3k 2.0k 0.04x 95.9%
───────────────────────────────────────────────────────────────────────────────
TOTAL 74.6k 5.3k 0.07x 92.9%
Wikipedia's 48k tokens of raw HTML compress to ~2k tokens - a 95.9% reduction.
For accurate token counts, install tiktoken:
pip install tiktokenWithout it the script uses an approximate word/symbol counter.
cortex-browser uses structured logging via tracing. Logs go to stderr so they don't interfere with stdout output or the MCP protocol.
Control the log level with the RUST_LOG environment variable:
# Default (info) - tool invocations, navigation, tab operations
cortex-browser mcp --launch
# Debug - adds pipeline stats, cache hits/misses, DOM mutation counts
RUST_LOG=debug cortex-browser mcp --launch
# Debug for cortex-browser only (avoids noisy html5ever/selectors output)
RUST_LOG=cortex_browser=debug cortex-browser mcp --launch
# Quiet - only warnings and errors
RUST_LOG=warn cortex-browser snapshot page.htmlmake build # debug build
make release # optimized build
make test # run all tests
make lint # clippy
make install # install to ~/.cargo/binsrc/
main.rs CLI entry point (snapshot + mcp subcommands)
lib.rs Public modules
pipeline.rs 4-stage DOM processing pipeline
dom.rs Semantic tree types (PageSnapshot, SemanticNode, AriaRole)
serialize.rs Compact text serialization for LLM consumption
diff.rs Page diff algorithm (added/removed/modified)
extract.rs Schema-based structured data extraction
hints.rs Task context filtering and relevance scoring
mutation.rs DOM mutation observer + viewport JS
recording.rs Action recording types and RecordingStore
auth.rs Cookie persistence types and AuthStore
mcp.rs MCP server with multi-tab state management
browser.rs Chrome CDP connection and page fetching
tests/
integration.rs Integration tests
fixtures/ HTML fixtures (blog, dashboard, ecommerce, SPA)
cortex-browser processes pages through a 4-stage pipeline:
Raw HTML
→ Prune Remove scripts, styles, hidden elements, aria-hidden
→ Role map Assign ARIA semantics (button, textbox, link, heading, etc.)
→ Collapse Flatten meaningless wrapper divs
→ Merge Combine long sibling lists into summaries
Semantic snapshot
Each stage reduces noise while preserving the information an agent needs to understand and interact with the page.
Contributions are welcome! Please:
- Fork the repo and create a feature branch
- Make sure tests pass:
make test - Run the linter:
make lint - Submit a pull request
For bugs and feature requests, open an issue.
This project is licensed under the MIT License - see the LICENSE file for details.
