OTO

Browsing for Claude Code.
OTO lets Claude Code browse the web. Describe what you need done on a website and Claude opens a browser, scouts the page structure, figures out what's there, and does it.

No scripts. No selectors. You talk, Claude browses.

Check something quick

"Go to example.com and find out what plan I'm on"

Test your UI live

"Open localhost:3000 and tell me if the login form works after my changes"

Scrape some data

"Scout news.ycombinator.com and get me the top 10 post titles"

Record what happens

"Navigate through the checkout flow and record a video of the whole thing"

Claude opens a browser, figures out the page structure, and handles it. You direct next steps conversationally.

Install

Prerequisites

  1. Claude Code installed and working
  2. Python 3.11+ (python --version)
  3. uv (pip install uv)
  4. Google Chrome
  5. Node.js

Two commands

/plugin marketplace add stemado/oto
/plugin install oto@oto-marketplace

Restart Claude Code. Done.

Updating

/plugin marketplace update oto-marketplace
/plugin install oto@oto-marketplace

What you get

13 tools for browsing, observing, and acting:

| Tool | What it does |
| --- | --- |
| launch_session | Open a browser (headed or headless, optional proxy) |
| scout_page_tool | Structural page overview: iframes, shadow DOM, element counts (~200 tokens) |
| find_elements | Search for elements by text, type, or CSS selector |
| execute_action_tool | Click, type, select, navigate, scroll, hover, wait |
| fill_secret | Type credentials from .env without exposing them in conversation |
| execute_javascript | Run arbitrary JS in the page context |
| take_screenshot | Capture the page for Claude to see |
| inspect_element | Deep-inspect visibility, overlays, shadow DOM, ARIA |
| process_download | Convert and move downloaded files |
| get_session_history | Export the full session as a structured workflow log |
| monitor_network | Watch HTTP traffic to discover API endpoints under the UI |
| record_video | Record the browser session as MP4 |
| close_session | Close the browser and release resources |

5 slash commands:

  • /oto:scout <url> — launch a browser and get a structural overview of any page
  • /oto:export [name] — export the current session as a replayable workflow package
  • /oto:schedule [list | name | delete name] — schedule an exported workflow to run automatically
  • /oto:report [bug|feature|friction] — file a GitHub issue with diagnostics
  • /oto:benchmark — run performance benchmarks

1 skill — OTO responds to phrases like "scout this page", "automate this site", "find the button", "explore this page"

1 hook — SessionStart dependency check (verifies Python, Node.js, Chrome)

Usage

Just describe what you want:

"Scout the registration page at university.edu/enroll"

Claude launches a browser, scouts the page structure, and shows you what it found. Then you direct next steps conversationally.

The scout-then-act pattern

This is how OTO works:

  1. Scout — get a compact structural overview (~200 tokens, not a raw DOM dump)
  2. Find — search for specific elements by text or type
  3. Act — click, type, navigate
  4. Scout again — see what changed
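The four steps above can be sketched as a loop. The helpers below (`scout`, `find`, `act`) are hypothetical stand-ins for the real tools (scout_page_tool, find_elements, execute_action_tool) — this mock shows only the control flow, not OTO's actual tool interfaces:

```python
# Illustrative scout -> find -> act -> scout loop over a mocked page state.

def scout(page):
    # Compact structural overview: element counts, not a raw DOM dump.
    return {"buttons": len(page["buttons"]), "inputs": len(page["inputs"])}

def find(page, text):
    # Search elements by visible text; return an exact CSS selector.
    for selector, label in page["buttons"].items():
        if label == text:
            return selector
    return None

def act(page, selector):
    # Simulate a click that transitions to a new page state.
    return page["on_click"][selector]

start = {
    "buttons": {"#login": "Log in"},
    "inputs": ["#user", "#pass"],
    "on_click": {"#login": {"buttons": {}, "inputs": [], "on_click": {}}},
}

overview = scout(start)            # 1. Scout
selector = find(start, "Log in")   # 2. Find
next_page = act(start, selector)   # 3. Act
changed = scout(next_page)         # 4. Scout again — the page changed
print(overview, selector, changed)
```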

Scout reports are compact structural overviews, not raw DOM dumps.

Why scout instead of screenshot?

Most browser tools for AI work by taking screenshots and having the model interpret pixels. That costs about 6,000 tokens per screenshot and the model has to guess at selectors from what it sees.

OTO scouts the page structure directly via the DOM and returns a compact report (~200 tokens) with exact CSS selectors. Claude knows what's on the page, where it is, and how to interact with it.

At 200 tokens per scout, you stop deciding whether it's worth using. It becomes something you reach for the way you open a browser tab — without weighing the cost. That's the difference between a tool and a habit.
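The cost gap follows directly from the two numbers above (~6,000 tokens per screenshot vs ~200 per scout report):

```python
# Approximate per-observation token costs quoted above.
screenshot_tokens = 6_000  # vision-based page observation
scout_tokens = 200         # OTO's compact structural report

ratio = screenshot_tokens // scout_tokens
print(f"one scout is ~{ratio}x cheaper than one screenshot")
```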

Export and schedule

You walk through a workflow once, conversationally. Then you capture it:

/oto:export enrollment

OTO produces a self-contained directory at workflows/enrollment/ with:

  • A Python script using botasaurus-driver with human-like timing
  • A portable workflow JSON for use with any executor
  • requirements.txt and .env.example for credentials
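The schema of the portable workflow JSON isn't documented here; a purely hypothetical shape (every field name below is an assumption) might look like:

```json
{
  "name": "enrollment",
  "steps": [
    { "action": "navigate", "url": "https://university.edu/enroll" },
    { "action": "type", "selector": "#student-id", "secret": "STUDENT_ID" },
    { "action": "click", "selector": "button[type=submit]" }
  ]
}
```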

Then schedule it:

/oto:schedule enrollment

OTO detects your OS and creates the appropriate scheduled task — Windows Task Scheduler (\OTO\), macOS launchd (~/Library/LaunchAgents/), or Linux cron. Pick daily, weekly, weekdays, or one-time, set the time, done. The exported workflow runs unattended on your machine.
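On Linux, for instance, the generated schedule is a plain cron entry. A weekly Monday 6 AM run might look like the sketch below — the command shown is an assumption, not necessarily what OTO writes:

```
# min hour day-of-month month day-of-week (1 = Monday)
0 6 * * 1  cd ~/workflows/enrollment && python enrollment.py
```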

That's the full arc: explore a site interactively, export what you did, schedule it to repeat. One session gets you from "how does this work?" to "it runs every Monday at 6 AM."

Benchmarks

| Task | Wall-clock | Tool calls | Success | vs screenshot-based |
| --- | --- | --- | --- | --- |
| Fact lookup (Wikipedia) | 11.0s | 4.0 | 3/3 | 98x fewer tokens |
| Form fill + verify (httpbin) | 25.2s | 7.7 | 3/3 | 33x fewer tokens |

Claude Sonnet 4.6, 3 runs each, wall-clock = browser time only (excludes model reasoning). Token efficiency ratios compare OTO's targeted extraction (~1,264–3,799 tokens) against full-page snapshots (~124,000 tokens). Full results: v0.2
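The headline ratios follow from the token counts in that note:

```python
# Token counts quoted above: OTO's targeted extraction vs a full-page snapshot.
snapshot = 124_000

fact_lookup = 1_264  # OTO tokens for the Wikipedia fact lookup
form_fill = 3_799    # OTO tokens for the httpbin form fill

print(round(snapshot / fact_lookup))  # ~98x fewer tokens
print(round(snapshot / form_fill))    # ~33x fewer tokens
```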

How it works

OTO uses Botasaurus under the hood, which handles browser fingerprinting and detection evasion automatically. Sites see a normal browser session, not automation. This matters for real-world sites that actively block automation tools like Selenium and Playwright.

Standard automation detection tests pass.

Security

  • URL validation — blocks dangerous schemes (file://, javascript://), cloud metadata endpoints, and localhost (opt-in via OTO_ALLOW_LOCALHOST)
  • Path traversal protection — validates all file paths
  • Invisible character stripping — removes zero-width Unicode that could hide prompt injection
  • Content boundary markers — wraps web-sourced data to distinguish data from instructions
  • Credential isolation — fill_secret reads from .env server-side; passwords never enter the conversation
  • Header redaction — Authorization, Cookie, and API key headers are scrubbed from network logs
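Two of those protections are easy to sketch. The following is a minimal illustration, not OTO's actual implementation (the header set beyond Authorization and Cookie is an assumption):

```python
import re

# Zero-width / invisible Unicode that can smuggle hidden instructions
# into web-sourced text (ZWSP, ZWNJ, ZWJ, word joiner, BOM).
_INVISIBLE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

def strip_invisible(text: str) -> str:
    """Remove zero-width characters that could hide prompt injection."""
    return _INVISIBLE.sub("", text)

# Credential-bearing headers to scrub from network logs (illustrative set).
_SENSITIVE = {"authorization", "cookie", "set-cookie", "x-api-key"}

def redact_headers(headers: dict) -> dict:
    """Replace sensitive header values before they reach a log."""
    return {
        k: "[REDACTED]" if k.lower() in _SENSITIVE else v
        for k, v in headers.items()
    }

print(strip_invisible("click\u200b here"))
print(redact_headers({"Authorization": "Bearer abc", "Accept": "text/html"}))
```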

License

MIT