
Commit dc0bae8

garrytan and claude authored
fix: sidebar agent uses real tab URL instead of stale Playwright URL (v0.12.6.0) (garrytan#544)
* fix: sidebar agent uses extension's activeTabUrl instead of stale Playwright URL

  When the user navigates manually in headed Chrome, Playwright's page.url() stays on the old page. The sidebar agent was using this stale URL in its system prompt, causing it to navigate to the wrong page (e.g., Hacker News instead of the user's current page).

  The Chrome extension now captures the active tab URL via chrome.tabs.query() and sends it as activeTabUrl in the /sidebar-command POST body. The server prefers this over Playwright's URL. The URL is sanitized (http/https only, control chars stripped, 2048 char limit) to prevent prompt injection.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: connect-chrome pre-flight cleanup + improved onboarding docs

  Adds Step 0 pre-flight cleanup that kills stale browse servers and cleans Chromium profile locks before connecting. Improves the onboarding flow with clearer instructions for finding the extension, opening the Side Panel, and troubleshooting connection issues. Fixes Mode check from cdp to headed.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar agent test suite (layers 1-2)

  Layer 1 (unit): 18 tests for URL sanitization in sidebar-utils.ts: http/https pass, chrome:// rejected, javascript: rejected, control chars stripped, truncation.

  Layer 2 (integration): 13 tests for server HTTP endpoints: auth, sidebar-command queue writes, activeTabUrl override/fallback, event relay to chat buffer, message queuing, queue overflow (429), chat clear, agent kill.

  Source changes for testability:
  - Extract sanitizeExtensionUrl() to browse/src/sidebar-utils.ts
  - Add BROWSE_HEADLESS_SKIP env var to skip browser launch in HTTP-only tests
  - Add SIDEBAR_QUEUE_PATH env var to both server.ts and sidebar-agent.ts
  - Add SIDEBAR_AGENT_TIMEOUT env var to sidebar-agent.ts
  - Sync package.json version to match VERSION (0.12.2.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar agent round-trip tests with mock claude (layer 3)

  Starts server + sidebar-agent together with a mock claude binary (shell script outputting canned stream-json). Verifies the full queue-based message flow:
  - Full round-trip: POST /sidebar-command → queue → agent → mock claude → events → chat
  - Claude crash recovery: mock exits 1, agent_error appears, status returns to idle
  - Sequential queue drain: two rapid messages both process in order

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar agent E2E tests with real Claude (layer 4)

  Two E2E tests that exercise the full sidebar agent flow with real Claude:
  - sidebar-navigate: POST /sidebar-command asking Claude to describe a fixture page, verify it responds with page content through the chat buffer
  - sidebar-url-accuracy: POST with activeTabUrl differing from Playwright URL, verify the queue prompt uses the extension URL (the core bug fix)

  Both registered as periodic tier (~$0.80 total, non-deterministic).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sidebar E2E tests — sequential execution + eval collector fix

  Both tests now pass:
  - sidebar-url-accuracy: deterministic queue file check (no Claude needed)
  - sidebar-navigate: real Claude responds through sidebar agent queue

  Fixed: testIfSelected (sequential, not concurrent) to avoid queue file conflicts. Added cost_usd field for eval collector compatibility.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: kill stale sidebar-agent processes before starting new one

  Each /connect-chrome starts a new sidebar-agent subprocess with unref() but never kills the previous one. Old agents accumulate as zombies with stale auth tokens. When they pick up queue entries, their event relay fails (401), so the server never receives agent_done and marks the agent as "hung". The user sees the sidebar freeze.

  Fix: pkill any existing sidebar-agent.ts processes before spawning.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.12.6.0)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add P1 TODO for sidebar Write tool + error visibility

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
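The sanitization and fallback rules described in the commit message (http/https only, control characters stripped, 2048-character limit, extension URL preferred over Playwright's) can be illustrated with a minimal sketch. This is not the `sanitizeExtensionUrl()` that the commit extracts into `browse/src/sidebar-utils.ts`, whose body is not shown on this page; the names `sanitizeTabUrlSketch` and `pickPromptUrl` are hypothetical, and details such as returning the normalized `href` are assumptions.

```typescript
// Hypothetical sketch: NOT the actual sanitizeExtensionUrl() from browse/src/sidebar-utils.ts.
const MAX_URL_LENGTH = 2048; // limit stated in the commit message

// Returns a cleaned http/https URL, or null if the input should be ignored.
function sanitizeTabUrlSketch(raw: unknown): string | null {
  if (typeof raw !== "string") return null;

  // Strip ASCII control characters (U+0000-U+001F and U+007F), newlines included,
  // so the value cannot span multiple lines once it is placed in a prompt.
  const stripped = raw.replace(/[\u0000-\u001f\u007f]/g, "");

  let parsed: URL;
  try {
    parsed = new URL(stripped);
  } catch {
    return null; // not an absolute URL
  }

  // Allow only http and https; chrome://, javascript:, file: and others are rejected.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;

  // Truncate rather than reject overly long URLs (assumed behavior).
  return parsed.href.slice(0, MAX_URL_LENGTH);
}

// Prefer the extension-reported URL; fall back to Playwright's URL
// when the extension value is missing or rejected.
function pickPromptUrl(activeTabUrl: unknown, playwrightUrl: string): string {
  return sanitizeTabUrlSketch(activeTabUrl) ?? playwrightUrl;
}
```

Returning `null` for rejected input keeps the fallback to a single `??` expression, so a malformed or hostile `activeTabUrl` simply leaves the server on Playwright's URL.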
1 parent 3d52382 commit dc0bae8

16 files changed

Lines changed: 1408 additions & 167 deletions

File tree

.agents/skills/gstack-connect-chrome/SKILL.md

Lines changed: 110 additions & 44 deletions
Original file line number | Diff line number | Diff line change
@@ -342,72 +342,126 @@ If `NEEDS_SETUP`:
342342
2. Run: `cd <SKILL_DIR> && ./setup`
343343
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
344344

345+
## Step 0: Pre-flight cleanup
346+
347+
Before connecting, kill any stale browse servers and clean up lock files that
348+
may have persisted from a crash. This prevents "already connected" false
349+
positives and Chromium profile lock conflicts.
350+
351+
```bash
352+
# Kill any existing browse server
353+
if [ -f "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" ]; then
354+
_OLD_PID=$(cat "$(git rev-parse --show-toplevel)/.gstack/browse.json" 2>/dev/null | grep -o '"pid":[0-9]*' | grep -o '[0-9]*')
355+
[ -n "$_OLD_PID" ] && kill "$_OLD_PID" 2>/dev/null || true
356+
sleep 1
357+
[ -n "$_OLD_PID" ] && kill -9 "$_OLD_PID" 2>/dev/null || true
358+
rm -f "$(git rev-parse --show-toplevel)/.gstack/browse.json"
359+
fi
360+
# Clean Chromium profile locks (can persist after crashes)
361+
_PROFILE_DIR="$HOME/.gstack/chromium-profile"
362+
for _LF in SingletonLock SingletonSocket SingletonCookie; do
363+
rm -f "$_PROFILE_DIR/$_LF" 2>/dev/null || true
364+
done
365+
echo "Pre-flight cleanup done"
366+
```
367+
345368
## Step 1: Connect
346369

347370
```bash
348371
$B connect
349372
```
350373

351-
This launches your system Chrome via Playwright with:
352-
- A visible window (headed mode, not headless)
353-
- The gstack Chrome extension pre-loaded
354-
- A green shimmer line + "gstack" pill so you know which window is controlled
374+
This launches Playwright's bundled Chromium in headed mode with:
375+
- A visible window you can watch (not your regular Chrome — it stays untouched)
376+
- The gstack Chrome extension auto-loaded via `launchPersistentContext`
377+
- A golden shimmer line at the top of every page so you know which window is controlled
378+
- A sidebar agent process for chat commands
355379

356-
If Chrome is already running, the server restarts in headed mode with a fresh
357-
Chrome instance. Your regular Chrome stays untouched.
380+
The `connect` command auto-discovers the extension from the gstack install
381+
directory. It always uses port **34567** so the extension can auto-connect.
358382

359-
After connecting, print the output to the user.
383+
After connecting, print the full output to the user. Confirm you see
384+
`Mode: headed` in the output.
385+
386+
If the output shows an error or the mode is not `headed`, run `$B status` and
387+
share the output with the user before proceeding.
360388

361389
## Step 2: Verify
362390

363391
```bash
364392
$B status
365393
```
366394

367-
Confirm the output shows `Mode: cdp`. Print the port number — the user may need
368-
it for the Side Panel.
395+
Confirm the output shows `Mode: headed`. Read the port from the state file:
396+
397+
```bash
398+
cat "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" 2>/dev/null | grep -o '"port":[0-9]*' | grep -o '[0-9]*'
399+
```
400+
401+
The port should be **34567**. If it's different, note it — the user may need it
402+
for the Side Panel.
403+
404+
Also find the extension path so you can help the user if they need to load it manually:
405+
406+
```bash
407+
_EXT_PATH=""
408+
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
409+
[ -n "$_ROOT" ] && [ -f "$_ROOT/.agents/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$_ROOT/.agents/skills/gstack/extension"
410+
[ -z "$_EXT_PATH" ] && [ -f "$HOME/.agents/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$HOME/.agents/skills/gstack/extension"
411+
echo "EXTENSION_PATH: ${_EXT_PATH:-NOT FOUND}"
412+
```
369413

370414
## Step 3: Guide the user to the Side Panel
371415

372416
Use AskUserQuestion:
373417

374-
> Chrome is launched with gstack control. You should see a green shimmer line at the
375-
> top of the Chrome window and a small "gstack" pill in the bottom-right corner.
376-
>
377-
> The Side Panel extension is pre-loaded. To open it:
378-
> 1. Look for the **puzzle piece icon** (Extensions) in Chrome's toolbar
379-
> 2. Click it → find **gstack browse** → click the **pin icon** to pin it
380-
> 3. Click the **gstack icon** in the toolbar
381-
> 4. Click **Open Side Panel**
418+
> Chrome is launched with gstack control. You should see Playwright's Chromium
419+
> (not your regular Chrome) with a golden shimmer line at the top of the page.
382420
>
383-
> The Side Panel shows a live feed of every browse command in real time.
421+
> The Side Panel extension should be auto-loaded. To open it:
422+
> 1. Look for the **puzzle piece icon** (Extensions) in the toolbar — it may
423+
> already show the gstack icon if the extension loaded successfully
424+
> 2. Click the **puzzle piece** → find **gstack browse** → click the **pin icon**
425+
> 3. Click the pinned **gstack icon** in the toolbar
426+
> 4. The Side Panel should open on the right showing a live activity feed
384427
>
385-
> **Port:** The browse server is on port {PORT} — the extension auto-detects it
386-
> if you're using the Playwright-controlled Chrome. If the badge stays gray, click
387-
> the gstack icon and enter port {PORT} manually.
428+
> **Port:** 34567 (auto-detected — the extension connects automatically in the
429+
> Playwright-controlled Chrome).
388430
389431
Options:
390432
- A) I can see the Side Panel — let's go!
391433
- B) I can see Chrome but can't find the extension
392434
- C) Something went wrong
393435

394436
If B: Tell the user:
395-
> The extension should be auto-loaded, but Chrome sometimes doesn't show it
396-
> immediately. Try:
437+
438+
> The extension is loaded into Playwright's Chromium at launch time, but
439+
> sometimes it doesn't appear immediately. Try these steps:
440+
>
397441
> 1. Type `chrome://extensions` in the address bar
398-
> 2. Look for "gstack browse" — it should be listed and enabled
399-
> 3. If not listed, click "Load unpacked" → navigate to the extension folder
400-
> (press Cmd+Shift+G in the file picker, paste this path):
401-
> `{EXTENSION_PATH}`
442+
> 2. Look for **"gstack browse"** — it should be listed and enabled
443+
> 3. If it's there but not pinned, go back to any page, click the puzzle piece
444+
> icon, and pin it
445+
> 4. If it's NOT listed at all, click **"Load unpacked"** and navigate to:
446+
> - Press **Cmd+Shift+G** in the file picker dialog
447+
> - Paste this path: `{EXTENSION_PATH}` (use the path from Step 2)
448+
> - Click **Select**
449+
>
450+
> After loading, pin it and click the icon to open the Side Panel.
402451
>
403-
> Then pin it from the puzzle piece icon and open the Side Panel.
452+
> If the Side Panel badge stays gray (disconnected), click the gstack icon
453+
> and enter port **34567** manually.
454+
455+
If C:
404456

405-
If C: Run `$B status` and show the output. Check if the server is healthy.
457+
1. Run `$B status` and show the output
458+
2. If the server is not healthy, re-run Step 0 cleanup + Step 1 connect
459+
3. If the server IS healthy but the browser isn't visible, try `$B focus`
460+
4. If that fails, ask the user what they see (error message, blank screen, etc.)
406461

407462
## Step 4: Demo
408463

409-
After the user confirms the Side Panel is working, run a quick demo so they
410-
can see the activity feed in action:
464+
After the user confirms the Side Panel is working, run a quick demo:
411465

412466
```bash
413467
$B goto https://news.ycombinator.com
@@ -420,16 +474,17 @@ $B snapshot -i
420474
```
421475

422476
Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot`
423-
commands appear in the activity feed. Every command Claude runs will show up here
477+
commands appear in the activity feed. Every command Claude runs shows up here
424478
in real time."
425479

426480
## Step 5: Sidebar chat
427481

428482
After the activity feed demo, tell the user about the sidebar chat:
429483

430484
> The Side Panel also has a **chat tab**. Try typing a message like "take a
431-
> snapshot and describe this page." A child Claude instance will execute your
432-
> request in the browser — you'll see the commands appear in the activity feed.
485+
> snapshot and describe this page." A sidebar agent (a child Claude instance)
486+
> executes your request in the browser — you'll see the commands appear in
487+
> the activity feed as they happen.
433488
>
434489
> The sidebar agent can navigate pages, click buttons, fill forms, and read
435490
> content. Each task gets up to 5 minutes. It runs in an isolated session, so
@@ -439,17 +494,28 @@ After the activity feed demo, tell the user about the sidebar chat:
439494

440495
Tell the user:
441496

442-
> You're all set! Chrome is under Claude's control with the Side Panel showing
443-
> live activity and a chat sidebar for direct commands. Here's what you can do:
497+
> You're all set! Here's what you can do with the connected Chrome:
498+
>
499+
> **Watch Claude work in real time:**
500+
> - Run any gstack skill (`/qa`, `/design-review`, `/benchmark`) and watch
501+
> every action happen in the visible Chrome window + Side Panel feed
502+
> - No cookie import needed — the Playwright browser shares its own session
503+
>
504+
> **Control the browser directly:**
505+
> - **Sidebar chat** — type natural language in the Side Panel and the sidebar
506+
> agent executes it (e.g., "fill in the login form and submit")
507+
> - **Browse commands**: `$B goto <url>`, `$B click <sel>`, `$B fill <sel> <val>`,
508+
> `$B snapshot -i` — all visible in Chrome + Side Panel
509+
>
510+
> **Window management:**
511+
> - `$B focus` — bring Chrome to the foreground anytime
512+
> - `$B disconnect` — close headed Chrome and return to headless mode
444513
>
445-
> - **Chat in the sidebar** — type natural language instructions and Claude
446-
> executes them in the browser
447-
> - **Run any browse command**: `$B goto`, `$B click`, `$B snapshot` — and
448-
> watch it happen in Chrome + the Side Panel
449-
> - **Use /qa or /design-review** — they'll run in the visible Chrome window
450-
> instead of headless. No cookie import needed.
451-
> - **`$B focus`** — bring Chrome to the foreground anytime
452-
> - **`$B disconnect`** — return to headless mode when done
514+
> **What skills look like in headed mode:**
515+
> - `/qa` runs its full test suite in the visible browser — you see every page
516+
> load, every click, every assertion
517+
> - `/design-review` takes screenshots in the real browser — same pixels you see
518+
> - `/benchmark` measures performance in the headed browser
453519
454520
Then proceed with whatever the user asked to do. If they didn't specify a task,
455521
ask what they'd like to test or browse.

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -1,5 +1,20 @@
11
# Changelog
22

3+
## [0.12.6.0] - 2026-03-27 — Sidebar Knows What Page You're On
4+
5+
The Chrome sidebar agent used to navigate to the wrong page when you asked it to do something. If you'd manually browsed to a site, the sidebar would ignore that and go to whatever Playwright last saw (often Hacker News from the demo). Now it works.
6+
7+
### Fixed
8+
9+
- **Sidebar uses the real tab URL.** The Chrome extension now captures the actual page URL via `chrome.tabs.query()` and sends it to the server. Previously the sidebar agent used Playwright's stale `page.url()`, which didn't update when you navigated manually in headed mode.
10+
- **URL sanitization.** The extension-provided URL is validated (http/https only, control characters stripped, 2048 char limit) before being used in the Claude system prompt. Prevents prompt injection via crafted URLs.
11+
- **Stale sidebar agents killed on reconnect.** Each `/connect-chrome` now kills leftover sidebar-agent processes before starting a new one. Old agents had stale auth tokens and would silently fail, causing the sidebar to freeze.
12+
13+
### Added
14+
15+
- **Pre-flight cleanup for `/connect-chrome`.** Kills stale browse servers and cleans Chromium profile locks before connecting. Prevents "already connected" false positives after crashes.
16+
- **Sidebar agent test suite (36 tests).** Four layers: unit tests for URL sanitization, integration tests for server HTTP endpoints, mock-Claude round-trip tests, and E2E tests with real Claude. All free except layer 4.
17+
318
## [0.12.5.1] - 2026-03-27 — Eng Review Now Tells You What to Parallelize
419

520
`/plan-eng-review` automatically analyzes your plan for parallel execution opportunities. When your plan has independent workstreams, the review outputs a dependency table, parallel lanes, and execution order so you know exactly which tasks to split into separate git worktrees.

TODOS.md

Lines changed: 12 additions & 0 deletions
@@ -185,6 +185,18 @@ Sidebar agent writes structured messages to `.context/sidebar-inbox/`. Workspace
185185
**Priority:** P3
186186
**Depends on:** Headed mode (shipped)
187187

188+
### Sidebar agent needs Write tool + better error visibility
189+
190+
**What:** Two issues with the sidebar agent (`sidebar-agent.ts`): (1) `--allowedTools` is hardcoded to `Bash,Read,Glob,Grep`, missing `Write`. Claude can't create files (like CSVs) when asked. (2) When Claude errors or returns empty, the sidebar UI shows nothing, just a green dot. No error message, no "I tried but failed", nothing.
191+
192+
**Why:** Users ask "write this to a CSV" and the sidebar silently can't. Then they think it's broken. The UI needs to surface errors visibly, and Claude needs the tools to actually do what's asked.
193+
194+
**Context:** `sidebar-agent.ts:163` hardcodes `--allowedTools`. The event relay (`handleStreamEvent`) handles `agent_done` and `agent_error` but the extension's sidepanel.js may not be rendering error states. The sidebar should show "Error: ..." or "Claude finished but produced no output" instead of staying on the green dot forever.
195+
196+
**Effort:** S (human: ~2h / CC: ~10min)
197+
**Priority:** P1
198+
**Depends on:** None
199+
188200
### Chrome Web Store publishing
189201

190202
**What:** Publish the gstack browse Chrome extension to Chrome Web Store for easier install.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
0.12.5.1
1+
0.12.6.0

browse/src/cli.ts

Lines changed: 43 additions & 3 deletions
@@ -511,8 +511,27 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
511511
}
512512
}
513513

514-
// Clean up Chromium profile locks (can persist after crashes)
514+
// Kill orphaned Chromium processes that may still hold the profile lock.
515+
// The server PID is the Bun process; Chromium is a child that can outlive it
516+
// if the server is killed abruptly (SIGKILL, crash, manual rm of state file).
515517
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
518+
try {
519+
const singletonLock = path.join(profileDir, 'SingletonLock');
520+
const lockTarget = fs.readlinkSync(singletonLock); // e.g. "hostname-12345"
521+
const orphanPid = parseInt(lockTarget.split('-').pop() || '', 10);
522+
if (orphanPid && isProcessAlive(orphanPid)) {
523+
try { process.kill(orphanPid, 'SIGTERM'); } catch {}
524+
await new Promise(resolve => setTimeout(resolve, 1000));
525+
if (isProcessAlive(orphanPid)) {
526+
try { process.kill(orphanPid, 'SIGKILL'); } catch {}
527+
await new Promise(resolve => setTimeout(resolve, 500));
528+
}
529+
}
530+
} catch {
531+
// No lock symlink or not readable — nothing to kill
532+
}
533+
534+
// Clean up Chromium profile locks (can persist after crashes)
516535
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
517536
try { fs.unlinkSync(path.join(profileDir, lockFile)); } catch {}
518537
}
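The hunk above calls `isProcessAlive()` and parses a PID out of the `SingletonLock` symlink target, but the helper's body is not part of this diff. A minimal sketch of how such a check is commonly written in Node or Bun follows; `isProcessAliveSketch` and `pidFromLockTarget` are hypothetical names, and the `hostname-12345` target format is taken from the comment in the diff rather than verified independently.

```typescript
// Hypothetical helpers; the real isProcessAlive() in cli.ts is not shown in this diff.

// Signal 0 performs the existence and permission checks without delivering a signal.
function isProcessAliveSketch(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user;
    // ESRCH (or anything else) is treated as "not running".
    return (err as { code?: string }).code === "EPERM";
  }
}

// Extracts the trailing PID from a lock target such as "hostname-12345".
// Hostnames may contain hyphens, so only the last segment is parsed.
function pidFromLockTarget(target: string): number | null {
  const pid = parseInt(target.split("-").pop() ?? "", 10);
  return Number.isInteger(pid) && pid > 0 ? pid : null;
}
```

Returning `null` for an unparsable target makes the "nothing to kill" case explicit, where the diff instead relies on `NaN` being falsy in `if (orphanPid && ...)`.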
@@ -545,17 +564,38 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
545564
console.log(`Connected to real Chrome\n${status}`);
546565

547566
// Auto-start sidebar agent
548-
const agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
567+
// __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
568+
let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
569+
if (!fs.existsSync(agentScript)) {
570+
agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
571+
}
549572
try {
573+
if (!fs.existsSync(agentScript)) {
574+
throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
575+
}
550576
// Clear old agent queue
551577
const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
552578
try { fs.writeFileSync(agentQueue, ''); } catch {}
553579

580+
// Resolve browse binary path the same way — execPath-relative
581+
let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
582+
if (!fs.existsSync(browseBin)) {
583+
browseBin = process.execPath; // the compiled binary itself
584+
}
585+
586+
// Kill any existing sidebar-agent processes before starting a new one.
587+
// Old agents have stale auth tokens and will silently fail to relay events,
588+
// causing the server to mark the agent as "hung".
589+
try {
590+
const { spawnSync } = require('child_process');
591+
spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
592+
} catch {}
593+
554594
const agentProc = Bun.spawn(['bun', 'run', agentScript], {
555595
cwd: config.projectDir,
556596
env: {
557597
...process.env,
558-
BROWSE_BIN: path.resolve(__dirname, '..', 'dist', 'browse'),
598+
BROWSE_BIN: browseBin,
559599
BROWSE_STATE_FILE: config.stateFile,
560600
BROWSE_SERVER_PORT: String(newState.port),
561601
},
