feat(browser): full-Chrome fallback integration + parallel sub-agent spawns by raroque · Pull Request #31 · raroque/boop-agent

raroque · 2026-04-29T04:34:29Z

Summary

New browser integration: 10 MCP tools wrapping the agent-browser CLI (open, snapshot, click, fill, press, get_text, get_url, wait, screenshot, close), exposed to sub-agents only.
Real Chrome by default — auto-detects /Applications/Google Chrome.app/... on macOS and /usr/bin/google-chrome[-stable]/chromium on Linux. Falls back to agent-browser's Chrome for Testing if neither is present. Overridable via BOOP_BROWSER_EXECUTABLE.
Headed by default — Chrome for Testing's --headless=new is fingerprinted instantly by Cloudflare/Reddit/etc., so we drop it via the AGENT_BROWSER_HEADED=1 env. Toggle is now a per-runtime setting (settings.browser_headed) with a switch inside the Settings tab's "Full browser use" card.
Dispatcher prompt updated:
- "Tool selection priority" block makes browser a fallback for sites without a native Composio toolkit. If a Gmail/Slack/etc. task lands, the agent uses that toolkit, not the browser.
- "Parallel spawning" block tells the dispatcher it can emit multiple spawn_agent tool_use blocks in one turn for independent sub-tasks. The runtime already supports concurrent agents (server/execution-agent.ts:10); this just unlocks the model side.
- One send_ack covers multiple parallel spawns now (no more triple-acks).
Debug UI: BrowserSection card in Settings (moved out of Connections) — Chrome status indicator, "Show Chrome window" toggle, "Install Chrome for Testing" one-time button, and a "Log in to a site" helper that opens Chrome under the boop profile so cookies persist for future agent runs.
New centralized server/browser/config.ts for profile dir / session / executable / env so tools, routes, and the login script all stay in sync.
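
The executable resolution described in the summary (env override first, then per-platform auto-detection, then `null` so the caller can fall back to Chrome for Testing) can be sketched as below. This is a minimal illustration, not the code in `server/browser/config.ts`: the function name, the injected `exists` callback, and the `macCandidates` parameter are assumptions. The macOS path is truncated in the description, so it is left for the caller to supply, and the three Linux paths are one reading of `/usr/bin/google-chrome[-stable]/chromium`.

```typescript
// Hypothetical sketch: only BOOP_BROWSER_EXECUTABLE comes from the PR text.
type Env = Record<string, string | undefined>;

// Assumed expansion of "/usr/bin/google-chrome[-stable]/chromium".
const LINUX_CANDIDATES: string[] = [
  "/usr/bin/google-chrome",
  "/usr/bin/google-chrome-stable",
  "/usr/bin/chromium",
];

function resolveChromeExecutable(
  env: Env,
  platform: string,
  exists: (path: string) => boolean,
  macCandidates: string[] = [], // macOS path is elided in the PR, so not guessed here
): string | null {
  // 1. An explicit override wins (assumed to be trusted without an existence check).
  const override = env.BOOP_BROWSER_EXECUTABLE;
  if (override) return override;

  // 2. Otherwise probe the known install locations for this platform.
  const candidates =
    platform === "darwin" ? macCandidates :
    platform === "linux" ? LINUX_CANDIDATES :
    [];
  const found = candidates.find((p) => exists(p));

  // 3. null means "no real Chrome found"; the caller decides on the fallback.
  return found ?? null;
}
```

In a Node server the callback would typically be `existsSync` from `node:fs`, with `process.env` and `process.platform` as the other arguments.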

Notable design choices

Single shared --session boop across all sub-agents. agent-browser launches one Chrome per (--session, profile dir), and Chrome enforces one process per profile dir via SingletonLock — so per-agent sessions on a shared profile collide. v0 trades parallel-browser-isolation for reliability; commands queue through the daemon.
Dedicated profile dir at ~/.boop/agent-browser-profile. Separate from the user's daily Chrome profile, so no lock contention with their everyday browsing.
Browser is fallback-only. Reinforced in three places: integration registration description, dispatcher prompt, execution-agent prompt. The dispatcher is gated against passing browser for tasks a native toolkit covers.
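
The PR does not show how the dispatcher gate is implemented (it may live entirely in the prompts). As one possible reading of "fallback-only", a code-level gate could drop `browser` whenever a native toolkit already covers the task. The function name and the `nativeToolkits` set below are hypothetical.

```typescript
// Hypothetical gate: "browser" survives only when no native toolkit was requested.
function gateIntegrations(
  requested: string[],
  nativeToolkits: ReadonlySet<string>,
): string[] {
  // Keep the requested integrations that are known native toolkits.
  const native = requested.filter(
    (name) => name !== "browser" && nativeToolkits.has(name),
  );
  // A native toolkit covers the task, so the browser fallback is dropped.
  if (native.length > 0) return native;
  // Otherwise only the browser fallback (if requested) remains.
  return requested.filter((name) => name === "browser");
}
```

With `nativeToolkits = new Set(["gmail", "slack"])`, a request for `["gmail", "browser"]` is reduced to `["gmail"]`, while `["browser"]` alone passes through unchanged.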

Tradeoffs / known limits

Some sites (X.com is the worst offender) detect any CDP-controlled Chrome regardless of binary or headed mode. For those, the Composio toolkit is the right answer; the fallback exists for the long tail of sites with no native integration.
A real Chrome window pops up while sub-agents browse. This is intentional (much lower fingerprint), but visually noisy. If it bites, the headed toggle goes off in one click.

Test plan

npm run typecheck clean
npm run dev boots; server log shows [browser] using real Chrome at ... and [browser] registered ...
Settings tab shows "Full browser use" card with status, in-card headed toggle, install button, login input
Click "Open & sign in" with a URL — Chrome window opens under the boop profile
Send the dispatcher: "open hacker news /newest and tell me the top story" — sub-agent uses browser_open + browser_snapshot, returns real content (verified pre-PR)
Send a Gmail task — dispatcher passes ["gmail"], NOT ["browser"] (verify in agent log)
Send: "check gmail unreads AND summarize today's calendar" — dispatcher fans out two spawn_agent calls in one turn (parallel verified by overlapping agent_spawned events)
Toggle "Show Chrome window" off in Settings → next agent run launches Chrome headlessly within ~30s
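
The "overlapping agent_spawned events" check in the test plan can be automated with a small interval-overlap test. This is a sketch under assumptions: the `AgentRun` shape and its timestamp fields are hypothetical and would need to be mapped from whatever the agent log actually records.

```typescript
// Hypothetical record of one sub-agent run, with timestamps in milliseconds.
interface AgentRun {
  agentId: string;
  startedAt: number;
  finishedAt: number;
}

// Returns true if at least two runs were active at the same time.
function hasOverlap(runs: AgentRun[]): boolean {
  const sorted = [...runs].sort((a, b) => a.startedAt - b.startedAt);
  let latestEnd = -Infinity;
  for (const run of sorted) {
    // A run that starts before the latest earlier run finished overlaps it.
    if (run.startedAt < latestEnd) return true;
    latestEnd = Math.max(latestEnd, run.finishedAt);
  }
  return false;
}
```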

…spawns

Adds a "browser" integration that sub-agents can use when no native Composio toolkit covers the task. Wraps the agent-browser CLI as 10 MCP tools (open, snapshot, click, fill, press, get_text, get_url, wait, screenshot, close) with a single shared Chrome session pinned to a dedicated boop profile dir.

Defaults to the user's real Chrome (auto-detected on macOS/Linux, overridable via BOOP_BROWSER_EXECUTABLE) instead of agent-browser's bundled Chrome for Testing; Cloudflare/Reddit fingerprint CfT trivially. Headed/headless is a runtime toggle persisted to settings.browser_headed (UI in the Settings tab).

Dispatcher and execution-agent prompts gain a tool-selection priority block making "browser" a fallback only; for gmail/calendar/etc. the agent sticks with the native toolkit. Dispatcher also gets a "fan out" block telling it parallel spawn_agent calls in one turn run concurrently.

Debug UI: BrowserSection card in Settings shows install status, an in-card "Show Chrome window" toggle, "Install Chrome for Testing" button (one-time fallback download), and a per-site "Open & sign in" helper that pops a real Chrome window using the boop profile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-04-29T04:37:39Z

Greptile Summary

This PR adds a real-Chrome browser integration (10 MCP tools via agent-browser CLI with CDP stealth patching), a pause-and-resume flow for login walls, parallel sub-agent dispatch in the dispatcher, and a cookie-import UI for lifting session cookies from the user's daily Chrome profile.

Two P1 issues in the new server/browser/ layer:

spawnChrome hardcodes /Applications/Google Chrome.app/... as the fallback when no system Chrome is detected. On Linux with only Chrome for Testing installed, Chrome never starts, every browser_* call times out, and the agent sees a misleading "install agent-browser" hint.
DAILY_CHROME_DIR is macOS-only (~/Library/Application Support/Google/Chrome). On Linux, listDailyProfiles silently returns [], making cookie import entirely non-functional with no user-facing error.

Confidence Score: 3/5

Safe to merge for macOS-only deployments; two P1 regressions affect Linux users.

Two P1 findings both in the new browser subsystem: the stealth-launcher Chrome fallback path and the macOS-only DAILY_CHROME_DIR break the integration on Linux. The rest of the PR is well-structured and correct.

server/browser/stealth-launcher.ts (fallback binary), server/browser/cookies.ts (DAILY_CHROME_DIR)

Important Files Changed

Filename	Overview
server/browser/stealth-launcher.ts	New file — launches a real Chrome via CDP, injects stealth patches on every new document, and manages lifecycle. P1: hardcoded macOS fallback binary in `spawnChrome` breaks the integration on Linux when only Chrome for Testing is installed.
server/browser/cookies.ts	New file — snapshots and queries the user's daily Chrome profile Cookies SQLite DB, then copies selected rows into the boop profile. P1: `DAILY_CHROME_DIR` is hardcoded to the macOS path; cookie import silently returns no profiles on Linux.
server/browser/tools.ts	New file — exposes 9 browser_* MCP tools wrapping the agent-browser CLI. No `browser_close` tool is registered (correct — lifecycle is server-managed).
server/browser/config.ts	New file — centralises profile dir, session name, CDP port, and `getBrowserEnv()`. Clean; detects real Chrome on macOS and Linux, falls back to null (issue surfaces in stealth-launcher).
server/browser-routes.ts	New file — Express routes for browser status, install, cookie profile/scan/import, and the login helper. Properly guards cookie import against concurrent agents and validates URLs for the login endpoint.
server/pause-tools.ts	New file — `pause_for_user` MCP tool that sends a user message, persists a continuation, and sets `pausedFlag.paused`. Clean implementation.
server/interaction-agent.ts	Updated dispatcher: adds parallel spawn instructions, pending-continuation injection, and `clear_pending_continuation` tool.
server/execution-agent.ts	Adds `pause_for_user` MCP server and promotes `status` to include `"paused"`. Clean signal path via `pausedFlag`.
convex/schema.ts	Adds `pendingContinuations` and `cookieImports` tables plus `"paused"` status literal. Consistent with new mutation/query files.
server/index.ts	Mounts browser router and registers signal handlers to stop Chrome on server shutdown.

Sequence Diagram

sequenceDiagram
    participant D as Dispatcher
    participant E as Execution Agent
    participant P as pause-tools MCP
    participant S as stealth-launcher
    participant AB as agent-browser CLI
    participant C as Chrome CDP :9222

    D->>E: spawnExecutionAgent
    E->>S: ensureStealthChrome()
    S->>C: spawn binary (CHROME_PATH ?? macOS fallback)
    S->>C: attachWebSocket + STEALTH_SCRIPT on every new doc
    E->>AB: agent-browser --session boop --cdp 9222 open url
    AB->>C: drives tab via CDP
    AB-->>E: stdout result
    E-->>D: SpawnResult completed

    Note over E,P: If login wall hit
    E->>P: pause_for_user(message, resume_task)
    P->>D: sendImessage + persist pendingContinuation
    P-->>E: pausedFlag.paused = true
    E-->>D: SpawnResult paused
    D-->>D: dispatcherSilent = true

    Note over D: Next user message
    D->>D: read pendingContinuation from Convex
    D->>E: spawnExecutionAgent resume_task

Comments Outside Diff (2)

server/browser/stealth-launcher.ts, line 2856-2858

Hardcoded macOS fallback breaks Chrome for Testing on Linux

When CHROME_PATH is null (no system Chrome found), spawnChrome falls back to the macOS binary path regardless of platform. On a Linux system where only Chrome for Testing is installed via agent-browser install, spawn() receives a non-existent path, Chrome never starts, and waitForCdpEndpoint times out after 15 s. Every browser_* tool call then returns exitCode: null, which triggers the "Is agent-browser installed?" hint — pointing users at a completely wrong fix.

The log in tools.ts says "falling back to Chrome for Testing" but the stealth launcher has no logic to actually locate Chrome for Testing's binary.

You'll need to either resolve Chrome for Testing's installed binary path (e.g., by running agent-browser doctor and parsing the output) or skip the stealth-launcher entirely on the fallback path and let agent-browser manage its own Chrome lifecycle.
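
The second option above (skip the stealth launcher when no system Chrome is found) could look roughly like this. The `LaunchPlan` type and `chooseLaunchPlan` name are assumptions for illustration; the real `spawnChrome` signature is not shown in this thread.

```typescript
// Hypothetical launch decision that replaces the hardcoded macOS fallback.
type LaunchPlan =
  | { mode: "stealth"; executable: string } // spawn real Chrome and attach over CDP
  | { mode: "agent-browser-managed" };      // let agent-browser start its own Chrome

function chooseLaunchPlan(chromePath: string | null): LaunchPlan {
  if (chromePath !== null) {
    return { mode: "stealth", executable: chromePath };
  }
  // No system Chrome was detected: do not guess a platform-specific path.
  return { mode: "agent-browser-managed" };
}
```

Callers would then pass `--cdp` to agent-browser only in `stealth` mode, so the Chrome for Testing fallback keeps working on Linux.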

server/browser/cookies.ts, line 2305-2312

DAILY_CHROME_DIR is macOS-only — cookie import silently returns nothing on Linux

The constant is hardcoded to ~/Library/Application Support/Google/Chrome, which is the macOS path. On Linux, Chrome stores its profiles under ~/.config/google-chrome (or ~/.config/chromium). Since listDailyProfiles starts with if (!existsSync(DAILY_CHROME_DIR)) return [], the feature silently produces zero profiles on Linux with no feedback to the user.

The PR description mentions Linux support for the browser integration, so users installing on Linux who try to import cookies will see an empty profile list with no explanation.
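
A platform-aware lookup that also reports why nothing was found might be sketched as follows. The directory paths are the ones named in this comment; the function names, the `ProfileScan` result type, and the injected `exists` callback are hypothetical.

```typescript
// Candidate daily-Chrome profile roots per platform, as named in the review.
function dailyChromeDirCandidates(platform: string, home: string): string[] {
  switch (platform) {
    case "darwin":
      return [`${home}/Library/Application Support/Google/Chrome`];
    case "linux":
      return [`${home}/.config/google-chrome`, `${home}/.config/chromium`];
    default:
      return [];
  }
}

// Hypothetical result type: an explicit reason replaces the silent empty list.
type ProfileScan =
  | { ok: true; dir: string }
  | { ok: false; reason: string };

function findDailyChromeDir(
  platform: string,
  home: string,
  exists: (path: string) => boolean,
): ProfileScan {
  const candidates = dailyChromeDirCandidates(platform, home);
  if (candidates.length === 0) {
    return { ok: false, reason: `cookie import is not supported on ${platform}` };
  }
  const dir = candidates.find((p) => exists(p));
  if (dir === undefined) {
    return { ok: false, reason: `no Chrome profile directory found (checked: ${candidates.join(", ")})` };
  }
  return { ok: true, dir };
}
```

Returning a reason lets the Settings UI show an explanation instead of an empty profile list.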

Reviews (4): Last reviewed commit: "feat(browser): stealth Chrome + cookie i..."

- Drop browser_close tool (P1). With one shared --session boop daemon, a parallel browser-using sub-agent that finished first would close the session out from under any concurrently running agent. The server now owns the daemon lifecycle; agents can't close it.
- Update execution-agent prompt accordingly: no longer instructs the agent to call browser_close at end-of-task.
- Replace the misleading "wait briefly to surface launch errors" comment in /browser/login with an accurate description of the fire-and-forget behavior.
- Move the runtime-config import in server/browser/config.ts to the top of the file alongside the other imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-04-29T04:53:01Z

+  const hint =
+    r.stderr.includes("ENOENT") || r.exitCode === null
+      ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
+      : "";


Misleading install hint fires on timeout, not just missing binary

exitCode === null is true in two distinct cases: (a) agent-browser is not found (ENOENT), and (b) execa kills the process because the 30 s TIMEOUT_MS expired. Case (b) is a routine occurrence for browser_wait whenever the agent passes a duration ≥ 30 000 ms (e.g. "45000" for a slow load), or for any browser command on a sluggish page. In that scenario the agent sees:

[browser error] exit=null Command timed out after 30000ms Is agent-browser installed? Run `npx agent-browser install` once on this machine.

The hint is flat-out wrong; agent-browser is installed. The agent will either waste a turn trying to install it or give up on the task with a confusing diagnosis. Reserve the install hint for the ENOENT case only:

Suggested change

-  const hint =
-    r.stderr.includes("ENOENT") || r.exitCode === null
-      ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
-      : "";
+  const hint =
+    r.stderr.includes("ENOENT")
+      ? "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine."
+      : "";

The underlying cause is also worth noting: TIMEOUT_MS = 30_000 is shared by browser_wait, but browser_wait accepts arbitrary millisecond durations with no stated upper bound. Any target of 30 000 ms or more will always time out silently. Consider either documenting a <30 s cap in the tool description, or giving browser_wait a higher per-call timeout (e.g., 90 s).
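
Both suggestions can be combined in a small sketch: a per-call timeout that grows with the requested wait, and a hint that separates "binary missing" from "command timed out". Everything here beyond `TIMEOUT_MS = 30_000`, the 90 s figure, and the install hint text is an assumption, including the 5 s margin, the function names, and the `timedOut` field (modelled on the flag execa reports on its results).

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;  // shared default, as in the PR
const MAX_WAIT_TIMEOUT_MS = 90_000; // cap suggested in the review
const WAIT_MARGIN_MS = 5_000;       // hypothetical headroom on top of the wait

// Picks the process timeout for one tool call.
function timeoutFor(tool: string, requestedWaitMs?: number): number {
  if (tool !== "browser_wait" || requestedWaitMs === undefined) {
    return DEFAULT_TIMEOUT_MS;
  }
  const wanted = Math.max(DEFAULT_TIMEOUT_MS, requestedWaitMs + WAIT_MARGIN_MS);
  return Math.min(wanted, MAX_WAIT_TIMEOUT_MS);
}

// Assumed shape of the command result.
interface RunResult {
  stderr: string;
  exitCode: number | null;
  timedOut: boolean;
}

// The install hint is reserved for ENOENT; timeouts get their own message.
function hintFor(r: RunResult): string {
  if (r.stderr.includes("ENOENT")) {
    return "\n\nIs agent-browser installed? Run `npx agent-browser install` once on this machine.";
  }
  if (r.timedOut) {
    return "\n\nThe command timed out before finishing. Consider a shorter wait or a higher timeout.";
  }
  return "";
}
```

Under these assumptions a `browser_wait` of 45000 ms gets a 50 s timeout, and requests above the cap would still need to be rejected or documented in the tool description.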

Adds a pause_for_user MCP tool exposed to execution agents. When a sub-agent hits something that needs the human (login wall, OAuth/2FA, captcha, file pick), it sends a friendly message to the user, persists a continuation, and ends its turn cleanly. The dispatcher stays silent for that turn since the message already went out.

On the user's next message, the dispatcher reads the pending continuation and decides:
- Reply looks like readiness ("done", "logged in", "ok", or just pushing forward like "what's the balance?") → spawn a fresh sub-agent with the saved resume_task. Browser session/cookies persist across the spawn so the resumed agent picks up where the first left off.
- Reply cancels or changes topic → call clear_pending_continuation and proceed normally.

New schema: pendingContinuations table (one row per conversation, keyed by conversationId). New convex/pendingContinuations.ts {get,set,clear} mutations.

Execution-agent prompt updated to describe the pause flow and the specific trigger conditions (login wall, OAuth, captcha, 2FA, etc.). Dispatcher prompt gets a {{PENDING_CONTINUATION}} block telling it how to handle the next user turn when a continuation is live.

Schema enum executionAgents.status gains "paused" so the agents table distinguishes paused agents from completed/failed/cancelled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
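
The "one row per conversation, keyed by conversationId" contract with get/set/clear can be illustrated with an in-memory stand-in. This is not the Convex implementation in `convex/pendingContinuations.ts`; the class, the field names other than `conversationId`, and the overwrite-on-set behaviour are assumptions.

```typescript
// Hypothetical row shape; only conversationId is named in the commit message.
interface PendingContinuation {
  conversationId: string;
  resumeTask: string; // task handed to the fresh sub-agent on resume
  createdAt: number;  // epoch milliseconds
}

// In-memory stand-in for the pendingContinuations table.
class ContinuationStore {
  private rows = new Map<string, PendingContinuation>();

  // Returns the live continuation for a conversation, or null if none.
  get(conversationId: string): PendingContinuation | null {
    return this.rows.get(conversationId) ?? null;
  }

  // One row per conversation: a new pause replaces any earlier one.
  set(row: PendingContinuation): void {
    this.rows.set(row.conversationId, row);
  }

  // Called when the user cancels or changes topic, or after a resume.
  clear(conversationId: string): void {
    this.rows.delete(conversationId);
  }
}
```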

…atcher anaphora fix

- Stealth launcher spawns Chrome on a fixed CDP port and patches navigator.webdriver, languages, plugins, chrome.runtime, and Function.prototype.toString via Page.addScriptToEvaluateOnNewDocument before any site script runs. agent-browser attaches via --cdp instead of launching its own Chrome.
- Cookie import scans the user's daily Chrome profile (Default, Profile 1, ...) for Google/LinkedIn/X/Reddit/GitHub session cookies, snapshot-reads them safely while Chrome holds the file, copies into boop's profile, and verifies via a CDP probe that distinguishes logged_in / needs_challenge / not_logged_in.
- Convex cookieImports table tracks per-service imports with identity + verification state so the debug UI can show "Active as user@example.com · 4m ago".
- New debug UI section under Settings → Browser surfaces logged-in sessions per daily profile with one-click Import/Refresh.
- Dispatcher: bumped recent-history limit 10 → 30 and added a "Resolving references" block so anaphoric requests like "forward her the flight details" resolve from earlier conversation context (or ask) instead of defaulting to "most recent X".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
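
For readers unfamiliar with the CDP mechanism named above, the message that registers a script to run before any page script has roughly this shape. `Page.addScriptToEvaluateOnNewDocument` and its `source` parameter are part of the Chrome DevTools Protocol; the helper name and the one-line patch are illustrative only and cover just `navigator.webdriver`, not the full set of patches the PR's STEALTH_SCRIPT applies.

```typescript
// Illustrative patch: hides navigator.webdriver. The real script patches more.
const WEBDRIVER_PATCH =
  'Object.defineProperty(Navigator.prototype, "webdriver", { get: () => undefined });';

// Builds the JSON text to send over the CDP WebSocket for one page target.
function buildStealthCommand(id: number, source: string): string {
  return JSON.stringify({
    id, // caller-chosen request id, echoed back in the response
    method: "Page.addScriptToEvaluateOnNewDocument",
    params: { source },
  });
}
```

A launcher would send `buildStealthCommand(1, WEBDRIVER_PATCH)` on the page's CDP socket before the first navigation, so the patch is in place when site scripts run.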

greptile-apps Bot reviewed Apr 29, 2026

Comment thread server/execution-agent.ts Outdated

Comment thread server/browser-routes.ts Outdated

Comment thread server/browser/config.ts Outdated

greptile-apps Bot reviewed Apr 29, 2026

Comment thread server/interaction-agent.ts

raroque wants to merge 4 commits into main from boop-browser-integration
