diff --git a/skills/steel-browser/SKILL.md b/skills/steel-browser/SKILL.md index 293b39e..ff46975 100644 --- a/skills/steel-browser/SKILL.md +++ b/skills/steel-browser/SKILL.md @@ -2,98 +2,69 @@ name: steel-browser allowed-tools: Bash(steel:*) description: >- - Use this skill by default for browser or web tasks that can run in the cloud: - site navigation, scraping, structured extraction, screenshots/PDFs, form - flows, and anti-bot-sensitive automation. Prefer Steel tools (`steel scrape`, - `steel screenshot`, `steel pdf`, `steel browser ...`) over generic - fetch/search approaches when reliability matters. Trigger even if the user - does not mention Steel. Skip only when the task must run against local-only - apps (for example localhost QA) or private network targets unavailable from - Steel cloud sessions. + Use this skill for any web task where WebFetch or curl would fail or be + insufficient — pages that require JavaScript to render, forms to fill and + submit, screenshots or PDFs of live pages, CAPTCHA/bot-protection bypass, + login flows, and multi-step browser navigation with persistent session state. + WebFetch returns empty HTML for JS-rendered pages; this skill runs a real + cloud browser that executes JavaScript, maintains cookies, clicks buttons, + and handles anti-bot measures. Trigger when the user wants you to actually + perform a web task (visit, interact, extract, capture) rather than just write + code for it. Skip only for: static pages a simple GET can fetch, localhost or + private-network targets, writing browser automation code the user will run + themselves, or conceptual questions about browser tools. --- # Steel Browser Skill -Steel gives agents cloud browser sessions, explicit lifecycle control, and -better anti-blocking options than ad-hoc local browser automation. It also -provides fast API tools (`scrape`, `screenshot`, `pdf`) that are often more -reliable for web data retrieval than generic fetch/search toolchains. +Steel gives agents cloud browser sessions and fast API tools (`scrape`, `screenshot`, `pdf`). -## Trigger rules +## Choose the right tool first -Trigger aggressively when the user asks for: +| Goal | Tool | +|------|------| +| Extract text/HTML from a mostly-static page | `steel scrape ` | +| One-shot screenshot or PDF | `steel screenshot ` / `steel pdf ` | +| Multi-step interaction, login, forms, JS-heavy pages | `steel browser start` + interaction loop | +| Anti-bot / CAPTCHA sites | `steel browser start --stealth` | -- Website interaction (click/fill/login/multi-step navigation). -- Web extraction or collection from dynamic pages. -- Screenshot or PDF capture of webpages. -- Browser automation that may hit bot checks/CAPTCHAs. -- Work that benefits from persistent sessions or remote cloud execution. -- Existing `agent-browser` command migration. - -Do not trigger when task scope is clearly local-only: - -- Localhost QA of a dev server running only on the user's machine. -- Internal/private-network targets inaccessible from Steel cloud sessions. -- Browser debugging that explicitly must attach to a local user browser. +Start with `steel scrape` when you only need page content. Escalate to `steel browser` when the page requires interaction or JavaScript rendering. ## Core workflow -Follow this sequence: - -1. Choose command family: - extraction (`steel scrape`) or interaction (`steel browser`). -2. For interactive work, start/attach a named session. -3. Inspect page state (`snapshot`), then interact in small steps. -4. Re-snapshot after meaningful DOM changes/navigation. -5. Verify with `wait`, `get ...`, `snapshot`, or screenshot/PDF output. -6. Stop sessions when done unless user asks to keep them running. - -### Extraction playbook +1. **Start** a named session (use `--session-timeout 3600000` for tasks over 5 min) +2. **Navigate** to the target URL +3. **Snapshot** to get current page state and element refs +4. **Interact** using refs from the snapshot (click, fill, select, etc.) +5. **Re-snapshot** after every navigation or meaningful DOM change +6. **Verify** state with `wait`, `get`, or another snapshot +7. **Stop** the session when done ```bash -steel scrape https://example.com --format markdown -steel scrape https://example.com --format markdown,html --use-proxy +steel browser start --session my-task --session-timeout 3600000 +steel browser open https://example.com --session my-task +steel browser snapshot -i --session my-task +steel browser fill @e3 "search term" --session my-task +steel browser click @e7 --session my-task +steel browser wait --load networkidle --session my-task +steel browser snapshot -i --session my-task +steel browser stop --session my-task ``` -### Interactive playbook - -```bash -steel browser start --session my-task -steel browser open -steel browser snapshot -# click/fill/wait/get/snapshot loop -steel browser stop -``` - -### Parallel sessions playbook - -```bash -# Start multiple independent sessions -steel browser start --session job-a -steel browser start --session job-b - -# Each session runs an isolated Steel cloud browser -steel browser open https://site-a.com --session job-a -steel browser open https://site-b.com --session job-b +**RULE: Never use an `@eN` ref without a fresh snapshot. Refs expire after navigation or DOM changes.** -steel browser snapshot --session job-a -steel browser snapshot --session job-b +Element refs from `snapshot -i` are more reliable than CSS selectors — always prefer them. Use `snapshot -i -c` for large DOMs, or `-d 3` to limit depth. -# Clean up -steel browser stop --session job-a -steel browser stop --session job-b -``` - -Each named session maps to an isolated Steel cloud browser. Commands are routed by session name and do not interfere. +Always use the same `--session ` on every command in a workflow. The `--session` flag takes the name string you chose, not the UUID from JSON output. ## Essential commands -Use these directly before opening full references. - -### Session lifecycle (interactive flows) +### Session lifecycle ```bash -steel browser start --session +steel browser start --session --session-timeout 3600000 +steel browser start --session --stealth +steel browser start --session --proxy steel browser sessions steel browser live --session steel browser stop --session @@ -103,189 +74,152 @@ steel browser stop --all ### Navigation and inspection ```bash -steel browser open -steel browser snapshot # all elements (default) -steel browser snapshot -i # interactive elements only -steel browser snapshot -c # compact output -steel browser get url -steel browser get title -steel browser get text -``` - -### Interaction and sync - -```bash -steel browser click -steel browser fill "text" -steel browser press Enter -steel browser select "value" -steel browser drag @e1 @e2 -steel browser upload @e1 file.pdf -steel browser highlight @e1 -steel browser scroll down 500 # positional: direction amount -steel browser scroll up -steel browser scrollintoview @e1 -steel browser wait --load-state networkidle -steel browser wait --selector --state visible -steel browser wait --text "Success" +steel browser open --session +steel browser snapshot # full accessibility tree +steel browser snapshot -i # interactive elements + refs +steel browser snapshot -c # compact output +steel browser snapshot -i -c -d 3 # combine flags +steel browser get url --session +steel browser get title --session +steel browser get text @e1 --session +steel browser back --session +steel browser forward --session +steel browser reload --session ``` -**Form filling strategy:** For simple text inputs, use `fill`. For complex forms -(React controlled inputs, date pickers, HTML5 widgets, cascading selects), -prefer `eval` with direct DOM manipulation: +### Interaction ```bash -steel browser eval "document.querySelector('#date').value = '2026-03-19'" -steel browser eval "document.querySelector('#email').value = 'user@test.com'; document.querySelector('#email').dispatchEvent(new Event('input', {bubbles:true}))" +steel browser click @e1 --session +steel browser dblclick @e1 --session +steel browser fill @e2 "value" --session +steel browser type @e2 "value" --delay 50 --session +steel browser press Enter --session +steel browser press Control+a --session +steel browser hover @e1 --session +steel browser click @e1 --session # prefer click for checkboxes +steel browser select @e1 "option" --session +steel browser scroll down 500 --session +steel browser scrollintoview @e1 --session +steel browser drag @e1 @e2 --session +steel browser tab new --session +steel browser tab switch 2 --session +steel browser tab list --session +steel browser tab close --session ``` -### Cookies and storage +### Synchronization ```bash -steel browser cookies # list all cookies -steel browser cookies set # set a cookie -steel browser cookies set --domain .example.com -steel browser cookies clear # clear all cookies -steel browser storage local # get all localStorage -steel browser storage local # get specific key -steel browser storage local set # set value -steel browser storage local clear # clear all -steel browser storage session # sessionStorage (same subcommands) +steel browser wait --load networkidle --session +steel browser wait --selector ".loaded" --state visible --session +steel browser wait -t "Success" --session +steel browser wait -u "/dashboard" --session ``` -### Browser settings +### Extraction ```bash -steel browser set viewport 1920 1080 # set viewport size -steel browser set geo 37.7749 -122.4194 # set geolocation -steel browser set offline on # toggle offline mode (on/off) -steel browser set headers '{"X-Key":"value"}' # set extra HTTP headers -steel browser set useragent "Custom UA string" # set user agent +steel browser get text @e1 --session +steel browser get html @e1 --session +steel browser get value @e1 --session +steel browser get attr @e1 href --session +steel browser get count ".item" --session +steel browser content --session +steel browser eval "document.querySelectorAll('a').length" --session +steel browser find ".item" --session ``` -### CAPTCHA and anti-bot +### Screenshots and capture ```bash -steel browser start --session --stealth -steel browser captcha status --wait -steel browser captcha solve --session +steel browser screenshot -o ./page.png --session +steel browser screenshot --full --session +steel browser screenshot --selector "#chart" --session ``` -### Credentials +Top-level `steel screenshot ` and `steel pdf ` are stateless one-shot API calls — they do not take `--session`, `--stealth`, or `-o` flags. Use `steel browser screenshot` for in-session captures. -Manage stored credentials and inject them into sessions via `steel credentials` commands and `--namespace`/`--credentials` flags on `steel browser start`. See [references/steel-browser-lifecycle.md](references/steel-browser-lifecycle.md) for flag details. - -For exhaustive command families, read -[references/steel-browser-commands.md](references/steel-browser-commands.md). - -### API tools (fast extraction/artifacts) +### Cookies and storage ```bash -steel scrape -steel scrape --format markdown,html --use-proxy -steel screenshot -steel pdf +steel browser cookies --session # list all cookies +steel browser cookies set --session # set a cookie +steel browser cookies set --domain .example.com # with domain +steel browser cookies clear --session # clear all cookies + +steel browser storage local --session # get all localStorage +steel browser storage local --session # get one key +steel browser storage local set --session # set a value +steel browser storage local clear --session # clear localStorage +steel browser storage session --session # sessionStorage (same API) ``` -`steel screenshot` and `steel pdf` are stateless API tools (no browser session). -To save the output file, use `--json` and download the URL: +### Browser settings ```bash -steel screenshot https://example.com --json | jq -r '.url' | xargs curl -sL -o screenshot.png -steel pdf https://example.com --json | jq -r '.url' | xargs curl -sL -o page.pdf +steel browser set viewport 1920 1080 --session +steel browser set geo 37.7749 -122.4194 --session +steel browser set offline on --session +steel browser set useragent "Custom UA" --session ``` -## Mode and session rules +Note: `steel browser set headers` panics in steel 0.3.2 (clap bug). Use `eval` for header injection. -- Default to cloud mode. -- Use self-hosted mode only if user specifies `--local`, `--api-url`, or - self-hosted infra. -- Keep one mode per workflow. -- Prefer `--session ` across all commands in a single run. -- Parse and preserve session `id` from `steel browser start` for stable - downstream automation. -- Treat `connect_url` as display metadata, not a raw secret-bearing URL. +### CAPTCHA -Read -[references/steel-browser-lifecycle.md](references/steel-browser-lifecycle.md) -for full lifecycle and endpoint precedence details. - -## Migration behavior +```bash +steel browser start --session --stealth +steel browser captcha status --wait --session +steel browser captcha solve --session +``` -When users provide `agent-browser` commands or scripts: +## eval for capability gaps -1. Convert command prefix from `agent-browser` to `steel browser`. -2. Preserve original behavior intent. -3. Add Steel lifecycle commands (`start`, `stop`, `sessions`, `live`) when explicit session control is needed. +Use `steel browser eval "" --session ` when no direct command exists. Common uses: network interception (fetch monkey-patch), cookie manipulation beyond the `cookies` command, DOM state injection, complex form widgets (React date pickers, autocomplete inputs). -Read [references/migration-agent-browser.md](references/migration-agent-browser.md). +For detailed eval patterns (drag fallback, network interception, file upload injection), see [references/steel-browser-commands.md](references/steel-browser-commands.md). -## Troubleshooting quick matrix (abbreviated) +## Commands that do NOT exist -Start diagnostics with: +Do not attempt these. They will fail and waste turns. + +| Does NOT exist | Use instead | +|---|---| +| `steel browser record` / `video` | No recording; use `steel browser live` for viewer URL | +| `steel browser triple-click` | `press Control+a` or `eval` with `.select()` | +| `steel browser network` / `route` | `eval` with fetch monkey-patch | +| `steel browser console` / `errors` | `eval` with console interceptor | +| `steel browser frame` | `eval` with iframe contentDocument | +| `steel browser tabs` (plural) | `steel browser tab list` (singular `tab`) | +| `steel browser execute` / `run` | `steel browser eval` | +| `steel browser is-visible` | `steel browser is` | +| `steel browser set device` | `set viewport` + `set useragent` separately | +| `steel browser resize` | `steel browser set viewport W H` | +| `steel browser geolocation` | `steel browser set geo LAT LON` | +| `steel browser pdf` | Top-level `steel pdf ` (no browser session needed) | + +## Troubleshooting ```bash steel browser sessions -steel browser live +steel browser live --session ``` -Then apply targeted fixes: - -- Missing auth (`Missing browser auth...`): - run `steel login` or set `STEEL_API_KEY`. -- Session not being reused: - enforce the exact same `--session ` and keep mode consistent. -- CAPTCHA block: - check `steel browser captcha status --wait`, - run `steel browser captcha solve --session ` for manual mode, - or restart with `--stealth` and/or proxy settings. -- Self-hosted/local unreachable: - verify `--api-url`/`--local` path, then `steel dev install && steel dev start` - for local runtime. -- Stale session state: - `steel browser stop --all` then restart with a fresh named session. -- `steel: command not found`: - install the native binary with `curl -LsSf https://setup.steel.dev | sh` - and add `$HOME/.steel/bin` to your PATH. - -If issue persists, use the full playbook: -[references/troubleshooting.md](references/troubleshooting.md). +| Symptom | Fix | +|---------|-----| +| `Missing browser auth` | `steel login` or set `STEEL_API_KEY` | +| `No running session ""` | Check spelling; run `steel browser stop --all` then restart with same name | +| Session not reused between commands | Use exact same `--session ` on every command | +| CAPTCHA blocking | `steel browser captcha status --wait`; restart with `--stealth` | +| Stale session / stuck state | `steel browser stop --all` then fresh named session | +| `steel: command not found` | `curl -LsSf https://setup.steel.dev \| sh` and add `$HOME/.steel/bin` to PATH | -## Commands that do NOT exist +Full playbook: [references/troubleshooting.md](references/troubleshooting.md). + +## Reference routing -The following commands do **not** exist. Do not attempt them. Use `eval` workarounds instead. - -| Does NOT exist | Use instead | -| --------------------------------------------- | --------------------------------------------------------------------- | -| `steel browser network` / `route` / `unroute` | `eval` with fetch monkey-patch or Performance API | -| `steel browser record` / `video` | Not available — no recording support | -| `steel browser console` / `errors` | `eval` with `console.log` interceptor | -| `steel browser frame` | `eval` with `document.querySelector('iframe').contentDocument` | -| `steel browser set device` | `set viewport` + `set useragent` | -| `steel browser set media` | `eval` with `window.matchMedia` override | -| `steel browser set credentials` | Use `steel credentials` commands + `--credentials` flag on start | -| `steel browser geolocation` | `steel browser set geo ` | -| `steel browser mouse` | `eval` with `Input.dispatchMouseEvent` via CDP or MouseEvent dispatch | -| `steel browser tabs` | `steel browser tab list` (singular `tab`, not `tabs`) | -| `steel browser execute` / `run` | `steel browser eval` | - -## Guardrails - -- Do not print or request raw API keys in command output. -- Do not mix cloud and local mode in one flow unless explicitly transitioning. -- Do not assume an existing active session without checking. -- Prefer Steel web tools over native fetch/search for remote web tasks when - reliability or anti-bot handling matters. -- For inherited command uncertainty, use `steel browser --help`. -- There is no top-level `steel browser extract` command; use `steel browser get ...`, `steel browser snapshot`, and `steel browser find ...` instead. - -## Reference routing table - -- Lifecycle, endpoint precedence, attach rules: - [references/steel-browser-lifecycle.md](references/steel-browser-lifecycle.md) -- Complete command families and examples: - [references/steel-browser-commands.md](references/steel-browser-commands.md) -- Migration from upstream command usage: - [references/migration-agent-browser.md](references/migration-agent-browser.md) -- Error handling and recovery playbooks: - [references/troubleshooting.md](references/troubleshooting.md) +- Full command reference: [references/steel-browser-commands.md](references/steel-browser-commands.md) +- Lifecycle, endpoints, CAPTCHA modes: [references/steel-browser-lifecycle.md](references/steel-browser-lifecycle.md) +- Migration from agent-browser: [references/migration-agent-browser.md](references/migration-agent-browser.md) +- Error recovery playbooks: [references/troubleshooting.md](references/troubleshooting.md) diff --git a/skills/steel-browser/evals/evals.json b/skills/steel-browser/evals/evals.json new file mode 100644 index 0000000..a2bb209 --- /dev/null +++ b/skills/steel-browser/evals/evals.json @@ -0,0 +1,43 @@ +{ + "skill_name": "steel-browser", + "evals": [ + { + "id": 1, + "prompt": "Go to https://quotes.toscrape.com and extract all the quotes from the first page along with their authors and tags. Save the results as a JSON file called quotes.json with each entry having 'text', 'author', and 'tags' fields.", + "expected_output": "A quotes.json file containing all quotes from the first page with text, author, and tags fields. The agent should try steel scrape first before opening a browser session. If a session is opened, it should be stopped when done.", + "files": [], + "expectations": [ + "Agent uses a steel command (not curl/fetch/wget) to access the page", + "quotes.json file is created and contains valid JSON", + "quotes.json entries have 'text', 'author', and 'tags' fields", + "quotes.json contains at least 5 quotes", + "If a browser session was started, it is stopped before the task ends" + ] + }, + { + "id": 2, + "prompt": "Fill out the form at https://httpbin.org/forms/post: set the customer name field to 'TestUser', select 'Medium' for pizza size, check the 'Bacon' topping, enter 'test order comment' in the comments field, then submit the form. Show me the response JSON that comes back.", + "expected_output": "Agent navigates to the form, fills in all requested fields, submits the form, and displays/saves the response JSON. Session should be started with a named --session flag (not a UUID), and stopped when done.", + "files": [], + "expectations": [ + "Agent starts a browser session with steel browser start", + "Session name used is a human-readable string (not a UUID pattern like xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)", + "Agent fills the custname field with 'TestUser' (visible in response or transcript)", + "Agent saves or displays the form submission response", + "Agent stops the browser session before finishing" + ] + }, + { + "id": 3, + "prompt": "Go to https://the-internet.herokuapp.com/drag_and_drop — there are two boxes labeled A and B. Drag box A onto box B so they swap. Take a screenshot after the drag to confirm the swap worked.", + "expected_output": "Agent correctly implements drag-and-drop using eval with dispatchEvent (since steel browser drag doesn't exist). Should NOT try steel browser drag. A screenshot is taken showing the post-drag state.", + "files": [], + "expectations": [ + "Agent does NOT attempt a command named 'steel browser drag'", + "Agent uses steel browser eval to perform the drag interaction", + "Agent takes a screenshot after the drag attempt", + "Agent stops the browser session before finishing" + ] + } + ] +}