feat(browser): full-Chrome fallback integration + parallel sub-agent spawns #31
raroque wants to merge 4 commits into
…spawns

Adds a "browser" integration that sub-agents can use when no native Composio toolkit covers the task. Wraps the agent-browser CLI as 10 MCP tools (open, snapshot, click, fill, press, get_text, get_url, wait, screenshot, close) with a single shared Chrome session pinned to a dedicated boop profile dir.

Defaults to the user's real Chrome (auto-detected on macOS/Linux, overridable via BOOP_BROWSER_EXECUTABLE) instead of agent-browser's bundled Chrome for Testing, because Cloudflare/Reddit fingerprint CfT trivially. Headed/headless is a runtime toggle persisted to settings.browser_headed (UI in the Settings tab).

Dispatcher and execution-agent prompts gain a tool-selection priority block that makes "browser" a fallback only: for gmail/calendar/etc. the agent sticks with the native toolkit. Dispatcher also gets a "fan out" block telling it that parallel spawn_agent calls in one turn run concurrently.

Debug UI: BrowserSection card in Settings shows install status, an in-card "Show Chrome window" toggle, an "Install Chrome for Testing" button (one-time fallback download), and a per-site "Open & sign in" helper that pops a real Chrome window using the boop profile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
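The executable lookup order described above (override, then real Chrome, then Chrome for Testing) can be sketched as a small pure function. This is a minimal sketch, not the PR's code: `resolveChromeExecutable` and the injected `exists` callback are hypothetical names, and the candidate paths are assumed standard install locations. Returning `null` stands for "let agent-browser use its bundled Chrome for Testing".

```typescript
// Hypothetical sketch of the executable lookup order; not boop's actual implementation.
// Candidate paths are assumed standard install locations.
const CHROME_CANDIDATES: Record<string, string[]> = {
  darwin: ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
  linux: ["/usr/bin/google-chrome", "/usr/bin/google-chrome-stable", "/usr/bin/chromium"],
};

function resolveChromeExecutable(
  env: Record<string, string | undefined>,
  platform: string,
  exists: (path: string) => boolean,
): string | null {
  // 1. An explicit override always wins.
  const override = env.BOOP_BROWSER_EXECUTABLE;
  if (override) return override;
  // 2. Otherwise take the first real Chrome found for this platform.
  for (const candidate of CHROME_CANDIDATES[platform] ?? []) {
    if (exists(candidate)) return candidate;
  }
  // 3. null = fall back to agent-browser's bundled Chrome for Testing.
  return null;
}
```

In a Node server this would be called as `resolveChromeExecutable(process.env, process.platform, existsSync)` with `existsSync` from `node:fs`. Injecting `exists` keeps the lookup testable without touching the filesystem.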
Greptile Summary

This PR adds a real-Chrome browser integration (10 MCP tools via agent-browser CLI with CDP stealth patching), a pause-and-resume flow for login walls, parallel sub-agent dispatch in the dispatcher, and a cookie-import UI for lifting session cookies from the user's daily Chrome profile. Two P1 issues in the new
Confidence Score: 3/5

Safe to merge for macOS-only deployments; two P1 regressions affect Linux users.

Two P1 findings, both in the new browser subsystem: the stealth-launcher Chrome fallback path and the macOS-only DAILY_CHROME_DIR break the integration on Linux. The rest of the PR is well-structured and correct.

server/browser/stealth-launcher.ts (fallback binary), server/browser/cookies.ts (DAILY_CHROME_DIR)

Important Files Changed
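One way to address the DAILY_CHROME_DIR finding is to derive the directory from the platform instead of hard-coding the macOS path. A minimal sketch, assuming the function name `dailyChromeDir` (hypothetical) and Chrome's default user-data locations; Chromium on Linux uses `~/.config/chromium` instead, which this sketch does not cover.

```typescript
// Hypothetical platform-aware replacement for a macOS-only DAILY_CHROME_DIR constant.
// Paths are Google Chrome's default user-data directories on each OS.
function dailyChromeDir(platform: string, home: string): string | null {
  switch (platform) {
    case "darwin":
      return `${home}/Library/Application Support/Google/Chrome`;
    case "linux":
      return `${home}/.config/google-chrome`;
    default:
      // Unsupported platform: callers should hide the cookie-import UI instead of failing.
      return null;
  }
}
```

Returning `null` rather than throwing lets the debug UI degrade to "not available on this platform" instead of breaking the whole browser section.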
Sequence Diagram

```mermaid
sequenceDiagram
    participant D as Dispatcher
    participant E as Execution Agent
    participant P as pause-tools MCP
    participant S as stealth-launcher
    participant AB as agent-browser CLI
    participant C as Chrome CDP :9222
    D->>E: spawnExecutionAgent
    E->>S: ensureStealthChrome()
    S->>C: spawn binary (CHROME_PATH ?? macOS fallback)
    S->>C: attachWebSocket + STEALTH_SCRIPT on every new doc
    E->>AB: agent-browser --session boop --cdp 9222 open url
    AB->>C: drives tab via CDP
    AB-->>E: stdout result
    E-->>D: SpawnResult completed
    Note over E,P: If login wall hit
    E->>P: pause_for_user(message, resume_task)
    P->>D: sendImessage + persist pendingContinuation
    P-->>E: pausedFlag.paused = true
    E-->>D: SpawnResult paused
    D-->>D: dispatcherSilent = true
    Note over D: Next user message
    D->>D: read pendingContinuation from Convex
    D->>E: spawnExecutionAgent resume_task
```
- Drop browser_close tool (P1). With one shared --session boop daemon, a parallel browser-using sub-agent that finished first would close the session out from under any concurrently running agent. The server now owns the daemon lifecycle; agents can't close it.
- Update execution-agent prompt accordingly: no longer instructs the agent to call browser_close at end-of-task.
- Replace the misleading "wait briefly to surface launch errors" comment in /browser/login with an accurate description of the fire-and-forget behavior.
- Move the runtime-config import in server/browser/config.ts to the top of the file alongside the other imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
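To illustrate why dropping `browser_close` is enough: if the tool layer can only build command lines for the remaining tools, no agent can tear down the shared `--session boop` daemon, and only server code that invokes `close` directly can. This is a minimal sketch. `buildArgv` and `SUBCOMMANDS` are hypothetical, the `get text` / `get url` spellings are assumptions about the agent-browser CLI, and `--cdp 9222` mirrors the sequence diagram in the Greptile summary.

```typescript
// Hypothetical tool-to-CLI mapping; subcommand spellings are assumptions.
// There is deliberately no "close" entry: the server owns the daemon lifecycle.
type BrowserTool =
  | "open" | "snapshot" | "click" | "fill" | "press"
  | "get_text" | "get_url" | "wait" | "screenshot";

const SUBCOMMANDS: Record<BrowserTool, string[]> = {
  open: ["open"],
  snapshot: ["snapshot"],
  click: ["click"],
  fill: ["fill"],
  press: ["press"],
  get_text: ["get", "text"],
  get_url: ["get", "url"],
  wait: ["wait"],
  screenshot: ["screenshot"],
};

const SESSION = "boop"; // one session shared by every sub-agent
const CDP_PORT = 9222;  // attach to the already-running stealth Chrome

function buildArgv(tool: BrowserTool, args: string[]): string[] {
  return ["--session", SESSION, "--cdp", String(CDP_PORT), ...SUBCOMMANDS[tool], ...args];
}
```

Because `BrowserTool` has no `close` member, a close request from an agent fails at type-check time rather than at runtime.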
```ts
const hint =
  r.stderr.includes("ENOENT") || r.exitCode === null
    ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
    : "";
```
Misleading install hint fires on timeout, not just missing binary
exitCode === null is true in two distinct cases: (a) agent-browser is not found (ENOENT), and (b) execa kills the process because the 30 s TIMEOUT_MS expired. Case (b) is a routine occurrence for browser_wait whenever the agent passes a duration ≥ 30 000 ms (e.g. "45000" for a slow load), or for any browser command on a sluggish page. In that scenario the agent sees:
```text
[browser error] exit=null
Command timed out after 30000ms
Is agent-browser installed? Run `npx agent-browser install` once on this machine.
```
The hint is flat-out wrong; agent-browser is installed. The agent will either waste a turn trying to install it or give up on the task with a confusing diagnosis. Reserve the install hint for the ENOENT case only:
```diff
 const hint =
-  r.stderr.includes("ENOENT") || r.exitCode === null
+  r.stderr.includes("ENOENT")
     ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
     : "";
```
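If the wrapper has access to a timeout flag, the two failure modes can be told apart explicitly instead of inferring them from `exitCode === null`. The sketch below assumes a result object with `stderr`, `exitCode` and `timedOut` fields. execa results do expose `timedOut`, but whether boop's `r` carries it is an assumption. The name `failureHint` and the timeout wording are hypothetical.

```typescript
// Hypothetical result shape, modeled loosely on execa's result fields.
interface CliResult {
  stderr: string;
  exitCode: number | null;
  timedOut: boolean;
}

function failureHint(r: CliResult, timeoutMs: number): string {
  // Missing binary: the only case where the install hint is correct.
  if (r.stderr.includes("ENOENT")) {
    return "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine.";
  }
  // Killed by the wrapper's timeout: say so instead of blaming the install.
  if (r.timedOut) {
    return `\n\nThe command exceeded the ${timeoutMs} ms tool timeout; the page may still be loading.`;
  }
  // Any other failure: stderr already carries the useful detail.
  return "";
}
```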
The underlying cause is also worth noting: TIMEOUT_MS = 30_000 is shared by browser_wait, but browser_wait accepts arbitrary millisecond durations with no stated upper bound, so any wait of 30 000 ms or more will always time out. Consider either documenting a <30 s cap in the tool description or giving browser_wait a higher per-call timeout (e.g., 90 s).
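The per-call timeout option can be sketched as a small policy function. This sketch assumes that the first argument to `browser_wait` is either a millisecond count or something non-numeric, such as a selector. The name `timeoutFor` and the 5 s headroom are hypothetical. The 90 s ceiling is the reviewer's example.

```typescript
// Hypothetical per-call timeout policy for the browser tools.
const TIMEOUT_MS = 30_000;     // existing shared default
const WAIT_MARGIN_MS = 5_000;  // assumed headroom for CLI startup and IPC
const WAIT_CAP_MS = 90_000;    // ceiling suggested in the review

function timeoutFor(tool: string, args: string[]): number {
  if (tool !== "browser_wait") return TIMEOUT_MS;
  const ms = Number(args[0]);
  // Non-numeric targets (e.g. a selector) keep the default budget.
  if (!Number.isFinite(ms) || ms < 0) return TIMEOUT_MS;
  // Full wait plus headroom, never below the default or above the cap.
  return Math.min(Math.max(TIMEOUT_MS, ms + WAIT_MARGIN_MS), WAIT_CAP_MS);
}
```

With the cap in place, a wait longer than 85 s still times out. The tool description should therefore state that limit so the agent doesn't request longer waits.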
Adds a pause_for_user MCP tool exposed to execution agents. When a sub-agent
hits something that needs the human (login wall, OAuth/2FA, captcha, file
pick), it sends a friendly message to the user, persists a continuation,
and ends its turn cleanly. The dispatcher stays silent for that turn since
the message already went out.
On the user's next message, the dispatcher reads the pending continuation
and decides:
- Reply looks like readiness ("done", "logged in", "ok", or just pushing
forward like "what's the balance?") → spawn a fresh sub-agent with the
saved resume_task. Browser session/cookies persist across the spawn so
the resumed agent picks up where the first left off.
- Reply cancels or changes topic → call clear_pending_continuation and
proceed normally.
New schema: pendingContinuations table (one row per conversation, keyed by
conversationId). New convex/pendingContinuations.ts {get,set,clear}
mutations.
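The continuation lifecycle described above (one row per conversation, set on pause, read on the next user turn, cleared on cancel) can be sketched with an in-memory map standing in for the Convex table. All names here are hypothetical. In boop the resume-or-proceed decision is made by the dispatcher model from its prompt, so `userIsReady` only stands in for that judgment. The sketch also assumes the row is cleared on resume, which the commit message does not state.

```typescript
// Hypothetical in-memory stand-in for the Convex pendingContinuations table.
interface PendingContinuation {
  conversationId: string;
  resumeTask: string; // instructions handed to the fresh sub-agent
}

class ContinuationStore {
  private rows = new Map<string, PendingContinuation>();
  // Upsert keyed by conversationId: at most one pending continuation per conversation.
  set(row: PendingContinuation): void {
    this.rows.set(row.conversationId, row);
  }
  get(conversationId: string): PendingContinuation | undefined {
    return this.rows.get(conversationId);
  }
  clear(conversationId: string): void {
    this.rows.delete(conversationId);
  }
}

type NextStep = { kind: "resume"; resumeTask: string } | { kind: "proceed" };

// `userIsReady` stands in for the dispatcher model's reading of the user's reply.
function handleUserTurn(store: ContinuationStore, conversationId: string, userIsReady: boolean): NextStep {
  const pending = store.get(conversationId);
  if (!pending) return { kind: "proceed" };
  // Assumption: the row is consumed whether we resume or the user moves on.
  store.clear(conversationId);
  return userIsReady ? { kind: "resume", resumeTask: pending.resumeTask } : { kind: "proceed" };
}
```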
Execution-agent prompt updated to describe the pause flow and the specific
trigger conditions (login wall, OAuth, captcha, 2FA, etc.). Dispatcher
prompt gets a {{PENDING_CONTINUATION}} block telling it how to handle the
next user turn when a continuation is live.
Schema enum executionAgents.status gains "paused" so the agents table
distinguishes paused agents from completed/failed/cancelled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atcher anaphora fix

- Stealth launcher spawns Chrome on a fixed CDP port and patches navigator.webdriver, languages, plugins, chrome.runtime, and Function.prototype.toString via Page.addScriptToEvaluateOnNewDocument before any site script runs. agent-browser attaches via --cdp instead of launching its own Chrome.
- Cookie import scans the user's daily Chrome profile (Default, Profile 1, ...) for Google/LinkedIn/X/Reddit/GitHub session cookies, snapshot-reads them safely while Chrome holds the file, copies them into boop's profile, and verifies via a CDP probe that distinguishes logged_in / needs_challenge / not_logged_in.
- Convex cookieImports table tracks per-service imports with identity + verification state so the debug UI can show "Active as user@example.com · 4m ago".
- New debug UI section under Settings → Browser surfaces logged-in sessions per daily profile with one-click Import/Refresh.
- Dispatcher: bumped recent-history limit 10 → 30 and added a "Resolving references" block so anaphoric requests like "forward her the flight details" resolve from earlier conversation context (or ask) instead of defaulting to "most recent X".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
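The stealth step relies on the CDP command `Page.addScriptToEvaluateOnNewDocument`, which evaluates the given `source` in every frame when it is created, before that frame's own scripts run. The sketch below only builds the command message. Its script is a deliberately reduced, hypothetical stand-in for the PR's STEALTH_SCRIPT: it covers `webdriver` and `languages` and omits the other patches. The `stealthCommand` name is also hypothetical.

```typescript
// Hypothetical minimal stealth payload; the PR's STEALTH_SCRIPT also patches plugins,
// chrome.runtime and Function.prototype.toString, which this sketch omits.
const STEALTH_SOURCE = [
  'Object.defineProperty(Navigator.prototype, "webdriver", { get: () => undefined });',
  'Object.defineProperty(Navigator.prototype, "languages", { get: () => ["en-US", "en"] });',
].join("\n");

interface CdpCommand {
  id: number;
  method: string;
  params: { source: string };
}

// Builds the CDP request; Chrome runs `source` in each new frame before the page's scripts.
function stealthCommand(id: number, source: string = STEALTH_SOURCE): CdpCommand {
  return { id, method: "Page.addScriptToEvaluateOnNewDocument", params: { source } };
}
```

Over a page target's DevTools WebSocket this would be sent as `ws.send(JSON.stringify(stealthCommand(1)))`, where `ws` is an assumed WebSocket client. The script only affects documents created after registration, so it has to be installed before the first navigation.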
Summary
- …`open`, `snapshot`, `click`, `fill`, `press`, `get_text`, `get_url`, `wait`, `screenshot`, `close`), exposed to sub-agents only.
- …`/Applications/Google Chrome.app/...` on macOS and `/usr/bin/google-chrome[-stable]`/`chromium` on Linux. Falls back to agent-browser's Chrome for Testing if neither is present. Overridable via `BOOP_BROWSER_EXECUTABLE`.
- …`--headless=new` is fingerprinted instantly by Cloudflare/Reddit/etc., so we drop it via the `AGENT_BROWSER_HEADED=1` env. Toggle is now a per-runtime setting (`settings.browser_headed`) with a switch inside the Settings tab's "Full browser use" card.
- …`browser` a fallback for sites without a native Composio toolkit. If a Gmail/Slack/etc. task lands, the agent uses that toolkit, not the browser.
- …`spawn_agent` tool_use blocks in one turn for independent sub-tasks. The runtime already supports concurrent agents (`server/execution-agent.ts:10`); this just unlocks the model side. `send_ack` covers multiple parallel spawns now (no more triple-acks).
- …`server/browser/config.ts` for profile dir / session / executable / env, so tools, routes, and the login script all stay in sync.

Notable design choices
- …`--session boop` across all sub-agents. agent-browser launches one Chrome per (`--session`, profile dir), and Chrome enforces one process per profile dir via `SingletonLock`, so per-agent sessions on a shared profile collide. v0 trades parallel-browser isolation for reliability; commands queue through the daemon.
- …`~/.boop/agent-browser-profile`. Separate from the user's daily Chrome profile, so there is no lock contention with their everyday browsing.
- …`browser` for tasks a native toolkit covers.

Tradeoffs / known limits
Test plan
- `npm run typecheck` clean
- `npm run dev` boots; server log shows `[browser] using real Chrome at ...` and `[browser] registered ...`
- …`browser_open` + `browser_snapshot`, returns real content (verified pre-PR)
- …`["gmail"]`, NOT `["browser"]` (verify in agent log)
- …`spawn_agent` calls in one turn (parallel verified by overlapping `agent_spawned` events)
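The last check could be automated rather than eyeballed. Overlap means one agent was spawned before another had finished. A sketch, assuming every agent also logs a completion event: only the `agent_spawned` name comes from the PR, while `agent_finished`, the event fields and `ranConcurrently` are hypothetical.

```typescript
// Hypothetical event shape: only "agent_spawned" appears in the PR.
interface AgentEvent {
  type: "agent_spawned" | "agent_finished";
  agentId: string;
  at: number; // epoch ms
}

// True if at least two agents' [spawned, finished] intervals overlap in time.
function ranConcurrently(events: AgentEvent[]): boolean {
  const spans = new Map<string, { start?: number; end?: number }>();
  for (const e of events) {
    const span = spans.get(e.agentId) ?? {};
    if (e.type === "agent_spawned") span.start = e.at;
    else span.end = e.at;
    spans.set(e.agentId, span);
  }
  const complete: { start: number; end: number }[] = [];
  spans.forEach((s) => {
    if (s.start !== undefined && s.end !== undefined) complete.push({ start: s.start, end: s.end });
  });
  complete.sort((a, b) => a.start - b.start);
  let latestEnd = -Infinity;
  for (const s of complete) {
    if (s.start < latestEnd) return true; // started before an earlier agent finished
    latestEnd = Math.max(latestEnd, s.end);
  }
  return false;
}
```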