
feat(browser): full-Chrome fallback integration + parallel sub-agent spawns #31

Open
raroque wants to merge 4 commits into main from boop-browser-integration

Conversation

@raroque
Owner

@raroque raroque commented Apr 29, 2026

Summary

  • New browser integration: 10 MCP tools wrapping the agent-browser CLI (open, snapshot, click, fill, press, get_text, get_url, wait, screenshot, close), exposed to sub-agents only.
  • Real Chrome by default — auto-detects /Applications/Google Chrome.app/... on macOS and /usr/bin/google-chrome[-stable]/chromium on Linux. Falls back to agent-browser's Chrome for Testing if neither is present. Overridable via BOOP_BROWSER_EXECUTABLE.
  • Headed by default — Chrome for Testing's --headless=new is fingerprinted instantly by Cloudflare/Reddit/etc., so we drop it via the AGENT_BROWSER_HEADED=1 env. Toggle is now a per-runtime setting (settings.browser_headed) with a switch inside the Settings tab's "Full browser use" card.
  • Dispatcher prompt updated:
    • "Tool selection priority" block makes browser a fallback for sites without a native Composio toolkit. If a Gmail/Slack/etc. task lands, the agent uses that toolkit, not the browser.
    • "Parallel spawning" block tells the dispatcher it can emit multiple spawn_agent tool_use blocks in one turn for independent sub-tasks. The runtime already supports concurrent agents (server/execution-agent.ts:10); this just unlocks the model side.
    • One send_ack covers multiple parallel spawns now (no more triple-acks).
  • Debug UI: BrowserSection card in Settings (moved out of Connections) — Chrome status indicator, "Show Chrome window" toggle, "Install Chrome for Testing" one-time button, and a "Log in to a site" helper that opens Chrome under the boop profile so cookies persist for future agent runs.
  • New centralized server/browser/config.ts for profile dir / session / executable / env so tools, routes, and the login script all stay in sync.
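
The executable lookup order described above (explicit override, then the first detected system Chrome, then Chrome for Testing) could be sketched roughly as follows. This is only an illustration: `resolveChromeExecutable`, its parameters, and the candidate list passed in the usage lines are hypothetical, and the sketch assumes the override is trusted without an existence check. The actual logic lives in server/browser/config.ts and may differ.

```typescript
// Hypothetical helper, not the code from server/browser/config.ts.
type Exists = (path: string) => boolean;

function resolveChromeExecutable(
  env: Record<string, string | undefined>,
  candidates: readonly string[],
  exists: Exists,
): string | null {
  // 1. An explicit override wins (assumption: it is not checked on disk).
  const override = env["BOOP_BROWSER_EXECUTABLE"];
  if (override !== undefined && override.length > 0) return override;

  // 2. Otherwise take the first candidate path that exists.
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }

  // 3. null tells the caller to fall back to Chrome for Testing.
  return null;
}

// Usage with an injected "filesystem" so the sketch runs anywhere.
const linuxCandidates = ["/usr/bin/google-chrome", "/usr/bin/google-chrome-stable"];
const onDisk = new Set(["/usr/bin/google-chrome-stable"]);
const detected = resolveChromeExecutable({}, linuxCandidates, (p) => onDisk.has(p));
```

Passing `exists` in as a parameter keeps the function pure; in the server it would presumably be `existsSync` from `node:fs`.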

Notable design choices

  • Single shared --session boop across all sub-agents. agent-browser launches one Chrome per (--session, profile dir), and Chrome enforces one process per profile dir via SingletonLock — so per-agent sessions on a shared profile collide. v0 trades parallel-browser-isolation for reliability; commands queue through the daemon.
  • Dedicated profile dir at ~/.boop/agent-browser-profile. Separate from the user's daily Chrome profile, so no lock contention with their everyday browsing.
  • Browser is fallback-only. Reinforced in three places: integration registration description, dispatcher prompt, execution-agent prompt. The dispatcher is gated against passing browser for tasks a native toolkit covers.
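
To illustrate what "commands queue through the daemon" means for sub-agents sharing one session, here is a minimal promise-chain queue. It is a sketch of the queuing behavior only; `SerialQueue` is a hypothetical name and this is not how agent-browser's daemon is implemented.

```typescript
// Hypothetical illustration of serialized access to one shared browser session.
class SerialQueue {
  // Tail of the chain; it never rejects, so later tasks always run.
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow failures on the chain itself; callers still see them via `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const tick = (): Promise<void> => Promise.resolve();
const queue = new SerialQueue();
const order: string[] = [];

const allDone = Promise.all([
  // Agent A's command is slow (two async hops) but was queued first.
  queue.run(async () => { await tick(); await tick(); order.push("A:open"); }),
  // Agent B's command is fast, yet it still waits for A's to finish.
  queue.run(async () => { order.push("B:open"); }),
  queue.run(async () => { order.push("A:snapshot"); }),
]);
```

Commands run strictly in submission order, so two agents never drive the shared Chrome at the same instant. The cost is the one named above: browser work is not parallel across agents.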

Tradeoffs / known limits

  • Some sites (X.com is the worst offender) detect any CDP-controlled Chrome regardless of binary or headed mode. For those, the Composio toolkit is the right answer; the fallback exists for the long tail of sites with no native integration.
  • A real Chrome window pops up while sub-agents browse. This is intentional (much lower fingerprint), but visually noisy. If it bites, the headed toggle goes off in one click.

Test plan

  • npm run typecheck clean
  • npm run dev boots; server log shows [browser] using real Chrome at ... and [browser] registered ...
  • Settings tab shows "Full browser use" card with status, in-card headed toggle, install button, login input
  • Click "Open & sign in" with a URL — Chrome window opens under the boop profile
  • Send the dispatcher: "open hacker news /newest and tell me the top story" — sub-agent uses browser_open + browser_snapshot, returns real content (verified pre-PR)
  • Send a Gmail task — dispatcher passes ["gmail"], NOT ["browser"] (verify in agent log)
  • Send: "check gmail unreads AND summarize today's calendar" — dispatcher fans out two spawn_agent calls in one turn (parallel verified by overlapping agent_spawned events)
  • Toggle "Show Chrome window" off in Settings → next agent run launches Chrome headlessly within ~30s

…spawns

Adds a "browser" integration that sub-agents can use when no native Composio
toolkit covers the task. Wraps the agent-browser CLI as 10 MCP tools (open,
snapshot, click, fill, press, get_text, get_url, wait, screenshot, close)
with a single shared Chrome session pinned to a dedicated boop profile dir.

Defaults to the user's real Chrome (auto-detected on macOS/Linux, overridable
via BOOP_BROWSER_EXECUTABLE) instead of agent-browser's bundled Chrome for
Testing — Cloudflare/Reddit fingerprint CfT trivially. Headed/headless is a
runtime toggle persisted to settings.browser_headed (UI in the Settings tab).

Dispatcher and execution-agent prompts gain a tool-selection priority block
making "browser" a fallback only — for gmail/calendar/etc. the agent sticks
with the native toolkit. Dispatcher also gets a "fan out" block telling it
parallel spawn_agent calls in one turn run concurrently.
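
As a rough picture of the fan-out, the runtime side of "several spawn_agent calls in one turn" amounts to starting every spawn before awaiting any of them. The sketch below uses a stand-in `fakeSpawnAgent` and a hypothetical `SpawnCall` shape; the real spawn function in server/execution-agent.ts is not shown in this PR text.

```typescript
// Hypothetical shapes; only the spawn_agent / agent_spawned names come from the PR.
interface SpawnCall {
  name: string;
  integrations: string[];
  task: string;
}

const events: string[] = [];

// Stand-in for the real sub-agent spawn: logs events around one async hop.
async function fakeSpawnAgent(call: SpawnCall): Promise<string> {
  events.push(`agent_spawned:${call.name}`);
  await Promise.resolve(); // placeholder for the agent's actual work
  events.push(`agent_done:${call.name}`);
  return `${call.name} ok`;
}

// Start every spawn first, then wait for all of them together.
function runSpawnsInParallel(calls: SpawnCall[]): Promise<string[]> {
  return Promise.all(calls.map((call) => fakeSpawnAgent(call)));
}

const fanOut = runSpawnsInParallel([
  { name: "gmail", integrations: ["gmail"], task: "check unreads" },
  { name: "calendar", integrations: ["googlecalendar"], task: "summarize today" },
]);
```

Both `agent_spawned` events are recorded before either agent finishes, which is the overlap the test plan looks for in the agent log.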

Debug UI: BrowserSection card in Settings shows install status, an in-card
"Show Chrome window" toggle, "Install Chrome for Testing" button (one-time
fallback download), and a per-site "Open & sign in" helper that pops a
real Chrome window using the boop profile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR adds a real-Chrome browser integration (10 MCP tools via agent-browser CLI with CDP stealth patching), a pause-and-resume flow for login walls, parallel sub-agent dispatch in the dispatcher, and a cookie-import UI for lifting session cookies from the user's daily Chrome profile.

Two P1 issues in the new server/browser/ layer:

  • spawnChrome hardcodes /Applications/Google Chrome.app/... as the fallback when no system Chrome is detected. On Linux with only Chrome for Testing installed, Chrome never starts, every browser_* call times out, and the agent sees a misleading "install agent-browser" hint.
  • DAILY_CHROME_DIR is macOS-only (~/Library/Application Support/Google/Chrome). On Linux, listDailyProfiles silently returns [], making cookie import entirely non-functional with no user-facing error.

Confidence Score: 3/5

Safe to merge for macOS-only deployments; two P1 regressions affect Linux users.

Two P1 findings both in the new browser subsystem: the stealth-launcher Chrome fallback path and the macOS-only DAILY_CHROME_DIR break the integration on Linux. The rest of the PR is well-structured and correct.

server/browser/stealth-launcher.ts (fallback binary), server/browser/cookies.ts (DAILY_CHROME_DIR)

Important Files Changed

| Filename | Overview |
| --- | --- |
| server/browser/stealth-launcher.ts | New file: launches a real Chrome via CDP, injects stealth patches on every new document, and manages lifecycle. P1: hardcoded macOS fallback binary in spawnChrome breaks the integration on Linux when only Chrome for Testing is installed. |
| server/browser/cookies.ts | New file: snapshots and queries the user's daily Chrome profile Cookies SQLite DB, then copies selected rows into the boop profile. P1: DAILY_CHROME_DIR is hardcoded to the macOS path; cookie import silently returns no profiles on Linux. |
| server/browser/tools.ts | New file: exposes 9 browser_* MCP tools wrapping the agent-browser CLI. No browser_close tool is registered (correct: lifecycle is server-managed). |
| server/browser/config.ts | New file: centralises profile dir, session name, CDP port, and getBrowserEnv(). Clean; detects real Chrome on macOS and Linux, falls back to null (issue surfaces in stealth-launcher). |
| server/browser-routes.ts | New file: Express routes for browser status, install, cookie profile/scan/import, and the login helper. Properly guards cookie import against concurrent agents and validates URLs for the login endpoint. |
| server/pause-tools.ts | New file: pause_for_user MCP tool that sends a user message, persists a continuation, and sets pausedFlag.paused. Clean implementation. |
| server/interaction-agent.ts | Updated dispatcher: adds parallel spawn instructions, pending-continuation injection, and clear_pending_continuation tool. |
| server/execution-agent.ts | Adds pause_for_user MCP server and promotes status to include "paused". Clean signal path via pausedFlag. |
| convex/schema.ts | Adds pendingContinuations and cookieImports tables plus "paused" status literal. Consistent with new mutation/query files. |
| server/index.ts | Mounts browser router and registers signal handlers to stop Chrome on server shutdown. |

Sequence Diagram

sequenceDiagram
    participant D as Dispatcher
    participant E as Execution Agent
    participant P as pause-tools MCP
    participant S as stealth-launcher
    participant AB as agent-browser CLI
    participant C as Chrome CDP :9222

    D->>E: spawnExecutionAgent
    E->>S: ensureStealthChrome()
    S->>C: spawn binary (CHROME_PATH ?? macOS fallback)
    S->>C: attachWebSocket + STEALTH_SCRIPT on every new doc
    E->>AB: agent-browser --session boop --cdp 9222 open url
    AB->>C: drives tab via CDP
    AB-->>E: stdout result
    E-->>D: SpawnResult completed

    Note over E,P: If login wall hit
    E->>P: pause_for_user(message, resume_task)
    P->>D: sendImessage + persist pendingContinuation
    P-->>E: pausedFlag.paused = true
    E-->>D: SpawnResult paused
    D-->>D: dispatcherSilent = true

    Note over D: Next user message
    D->>D: read pendingContinuation from Convex
    D->>E: spawnExecutionAgent resume_task

Comments Outside Diff (2)

  1. server/browser/stealth-launcher.ts, line 2856-2858 (link)

    P1 Hardcoded macOS fallback breaks Chrome for Testing on Linux

    When CHROME_PATH is null (no system Chrome found), spawnChrome falls back to the macOS binary path regardless of platform. On a Linux system where only Chrome for Testing is installed via agent-browser install, spawn() receives a non-existent path, Chrome never starts, and waitForCdpEndpoint times out after 15 s. Every browser_* tool call then returns exitCode: null, which triggers the "Is agent-browser installed?" hint — pointing users at a completely wrong fix.

    The log in tools.ts says "falling back to Chrome for Testing" but the stealth launcher has no logic to actually locate Chrome for Testing's binary.

    You'll need to either resolve Chrome for Testing's installed binary path (e.g., by running agent-browser doctor and parsing the output) or skip the stealth-launcher entirely on the fallback path and let agent-browser manage its own Chrome lifecycle.
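
    The second option (skip the stealth launcher on the fallback path) might look something like the sketch below. `LaunchPlan` and `planLaunch` are hypothetical names used only for illustration; the sketch assumes CHROME_PATH is `string | null`, as described above.

```typescript
// Hypothetical sketch: decide who owns the Chrome lifecycle.
type LaunchPlan =
  | { mode: "stealth"; binary: string }   // spawn a real Chrome ourselves, attach via CDP
  | { mode: "agent-browser-managed" };    // let agent-browser launch Chrome for Testing

function planLaunch(chromePath: string | null): LaunchPlan {
  // No system Chrome detected: never guess a platform-specific path.
  return chromePath === null
    ? { mode: "agent-browser-managed" }
    : { mode: "stealth", binary: chromePath };
}

const fallbackPlan = planLaunch(null);
const stealthPlan = planLaunch("/usr/bin/google-chrome");
```

    With a plan like this, spawn() is only ever called with a path that detection actually found, so the 15 s CDP timeout and the misleading install hint would not be reached on the fallback path.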


  2. server/browser/cookies.ts, line 2305-2312 (link)

    P1 DAILY_CHROME_DIR is macOS-only — cookie import silently returns nothing on Linux

    The constant is hardcoded to ~/Library/Application Support/Google/Chrome, which is the macOS path. On Linux, Chrome stores its profiles under ~/.config/google-chrome (or ~/.config/chromium). Since listDailyProfiles starts with if (!existsSync(DAILY_CHROME_DIR)) return [], the feature silently produces zero profiles on Linux with no feedback to the user.

    The PR description mentions Linux support for the browser integration, so users installing on Linux who try to import cookies will see an empty profile list with no explanation.
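
    A platform-aware lookup could be sketched as below, using only the paths named in this comment. `dailyChromeDirs` is a hypothetical helper; the `platform` argument is assumed to carry Node's `process.platform` values ("darwin", "linux"), and `home` the user's home directory.

```typescript
// Hypothetical replacement for the single DAILY_CHROME_DIR constant.
function dailyChromeDirs(platform: string, home: string): string[] {
  switch (platform) {
    case "darwin":
      return [`${home}/Library/Application Support/Google/Chrome`];
    case "linux":
      // Chrome first, then Chromium.
      return [`${home}/.config/google-chrome`, `${home}/.config/chromium`];
    default:
      // Unsupported platform: the caller should report this instead of
      // silently returning an empty profile list.
      return [];
  }
}

const macDirs = dailyChromeDirs("darwin", "/Users/alice");
const linuxDirs = dailyChromeDirs("linux", "/home/alice");
```

    An empty result can then be surfaced as "cookie import is not supported on this platform" rather than an unexplained empty list.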


Reviews (4): Last reviewed commit: "feat(browser): stealth Chrome + cookie i..."

Comment thread server/execution-agent.ts Outdated
Comment thread server/browser-routes.ts Outdated
Comment thread server/browser/config.ts Outdated
- Drop browser_close tool (P1). With one shared --session boop daemon, a
  parallel browser-using sub-agent that finished first would close the
  session out from under any concurrently running agent. The server now
  owns the daemon lifecycle; agents can't close it.
- Update execution-agent prompt accordingly: no longer instructs the
  agent to call browser_close at end-of-task.
- Replace the misleading "wait briefly to surface launch errors" comment
  in /browser/login with an accurate description of the fire-and-forget
  behavior.
- Move the runtime-config import in server/browser/config.ts to the
  top of the file alongside the other imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread server/browser/tools.ts
Comment on lines +44 to +47
const hint =
r.stderr.includes("ENOENT") || r.exitCode === null
? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
: "";
Contributor

P1 Misleading install hint fires on timeout, not just missing binary

exitCode === null is true in two distinct cases: (a) agent-browser is not found (ENOENT), and (b) execa kills the process because the 30 s TIMEOUT_MS expired. Case (b) is a routine occurrence for browser_wait whenever the agent passes a duration ≥ 30 000 ms (e.g. "45000" for a slow load), or for any browser command on a sluggish page. In that scenario the agent sees:

[browser error] exit=null
Command timed out after 30000ms

Is agent-browser installed? Run `npx agent-browser install` once on this machine.

The hint is flat-out wrong; agent-browser is installed. The agent will either waste a turn trying to install it or give up on the task with a confusing diagnosis. Reserve the install hint for the ENOENT case only:

Suggested change
- const hint =
-   r.stderr.includes("ENOENT") || r.exitCode === null
-     ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
-     : "";
+ const hint =
+   r.stderr.includes("ENOENT")
+     ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
+     : "";

The underlying cause is also worth noting: TIMEOUT_MS = 30_000 is shared by browser_wait, but browser_wait accepts arbitrary millisecond durations with no stated upper bound. Any target ≥ 30 000 will always time out. Consider either documenting a <30 s cap in the tool description, or giving browser_wait a higher per-call timeout (e.g., 90 s).
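
One way to give browser_wait its own budget is a small per-call timeout helper. This is a sketch under assumptions: `timeoutFor`, the 5 s headroom, and treating 90 s as a hard ceiling are illustrative choices, not code from tools.ts.

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;  // the shared TIMEOUT_MS from the PR
const WAIT_HEADROOM_MS = 5_000;     // assumption: slack on top of the requested wait
const MAX_WAIT_TIMEOUT_MS = 90_000; // assumption: the 90 s ceiling suggested above

// Hypothetical helper: pick the process timeout for one tool call.
function timeoutFor(tool: string, requestedWaitMs?: number): number {
  if (tool !== "browser_wait" || requestedWaitMs === undefined) {
    return DEFAULT_TIMEOUT_MS;
  }
  // Never go below the default, never exceed the ceiling.
  const wanted = Math.max(DEFAULT_TIMEOUT_MS, requestedWaitMs + WAIT_HEADROOM_MS);
  return Math.min(wanted, MAX_WAIT_TIMEOUT_MS);
}

const openTimeout = timeoutFor("browser_open");            // 30000
const slowWaitTimeout = timeoutFor("browser_wait", 45_000); // 50000
```

Waits longer than the ceiling would still be cut off, so the tool description should state the cap either way.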


Adds a pause_for_user MCP tool exposed to execution agents. When a sub-agent
hits something that needs the human (login wall, OAuth/2FA, captcha, file
pick), it sends a friendly message to the user, persists a continuation,
and ends its turn cleanly. The dispatcher stays silent for that turn since
the message already went out.

On the user's next message, the dispatcher reads the pending continuation
and decides:
  - Reply looks like readiness ("done", "logged in", "ok", or just pushing
    forward like "what's the balance?") → spawn a fresh sub-agent with the
    saved resume_task. Browser session/cookies persist across the spawn so
    the resumed agent picks up where the first left off.
  - Reply cancels or changes topic → call clear_pending_continuation and
    proceed normally.

New schema: pendingContinuations table (one row per conversation, keyed by
conversationId). New convex/pendingContinuations.ts {get,set,clear}
mutations.
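
For orientation, the one-row-per-conversation contract behind {get,set,clear} can be modeled in memory as below. This is a stand-in for illustration only: it is not Convex code, and every field other than conversationId and the resume task is an assumption.

```typescript
// Hypothetical row shape; the real table is defined in convex/schema.ts.
interface PendingContinuation {
  conversationId: string;
  resumeTask: string;
  createdAt: number; // assumption: a timestamp is stored
}

// In-memory stand-in for the pendingContinuations table.
class PendingContinuationStore {
  private rows = new Map<string, PendingContinuation>();

  get(conversationId: string): PendingContinuation | null {
    return this.rows.get(conversationId) ?? null;
  }

  // set() overwrites, which keeps the table at one row per conversation.
  set(row: PendingContinuation): void {
    this.rows.set(row.conversationId, row);
  }

  clear(conversationId: string): void {
    this.rows.delete(conversationId);
  }
}

const store = new PendingContinuationStore();
store.set({ conversationId: "conv-1", resumeTask: "log in, then read the balance", createdAt: 1 });
store.set({ conversationId: "conv-1", resumeTask: "read the balance", createdAt: 2 });
const live = store.get("conv-1");
```

On the next user turn the dispatcher either spawns a sub-agent with `live.resumeTask` or calls clear, matching the two branches listed above.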

Execution-agent prompt updated to describe the pause flow and the specific
trigger conditions (login wall, OAuth, captcha, 2FA, etc.). Dispatcher
prompt gets a {{PENDING_CONTINUATION}} block telling it how to handle the
next user turn when a continuation is live.

Schema enum executionAgents.status gains "paused" so the agents table
distinguishes paused agents from completed/failed/cancelled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread server/interaction-agent.ts
…atcher anaphora fix

- Stealth launcher spawns Chrome on a fixed CDP port and patches navigator.webdriver,
  languages, plugins, chrome.runtime, and Function.prototype.toString via
  Page.addScriptToEvaluateOnNewDocument before any site script runs. agent-browser
  attaches via --cdp instead of launching its own Chrome.
- Cookie import scans the user's daily Chrome profile (Default, Profile 1, ...) for
  Google/LinkedIn/X/Reddit/GitHub session cookies, snapshot-reads them safely while
  Chrome holds the file, copies into boop's profile, and verifies via a CDP probe
  that distinguishes logged_in / needs_challenge / not_logged_in.
- Convex cookieImports table tracks per-service imports with identity + verification
  state so the debug UI can show "Active as user@example.com · 4m ago".
- New debug UI section under Settings → Browser surfaces logged-in sessions per
  daily profile with one-click Import/Refresh.
- Dispatcher: bumped recent-history limit 10 → 30 and added a "Resolving references"
  block so anaphoric requests like "forward her the flight details" resolve from
  earlier conversation context (or ask) instead of defaulting to "most recent X".
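
As a rough illustration of the stealth injection, the CDP message registering a script for every new document has the shape below. Page.addScriptToEvaluateOnNewDocument and its `source` parameter are part of the Chrome DevTools Protocol; the one-line patch in `EXAMPLE_STEALTH_SOURCE` and the helper name `buildStealthCommand` are assumptions, and the PR's actual STEALTH_SCRIPT covers more properties.

```typescript
// Assumption: a single illustrative patch, not the PR's full STEALTH_SCRIPT.
const EXAMPLE_STEALTH_SOURCE =
  'Object.defineProperty(Navigator.prototype, "webdriver", { get: () => undefined });';

interface CdpCommand {
  id: number;
  method: string;
  params: { source: string };
}

// Hypothetical helper: build the JSON message sent over the CDP WebSocket.
function buildStealthCommand(id: number, source: string): CdpCommand {
  return {
    id,
    method: "Page.addScriptToEvaluateOnNewDocument",
    params: { source },
  };
}

const stealthCommand = buildStealthCommand(1, EXAMPLE_STEALTH_SOURCE);
const wirePayload = JSON.stringify(stealthCommand);
```

Because the script is registered before navigation, it runs ahead of any page script in each new document, which is what lets the patched values be in place when a site inspects them.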

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>