Skip to content

feat: UI testing infrastructure — agent-browser default, CDP fallback#444

Open
ittaiz wants to merge 1 commit into
mainfrom
dev3/task-951bd666
Open

feat: UI testing infrastructure — agent-browser default, CDP fallback#444
ittaiz wants to merge 1 commit into
mainfrom
dev3/task-951bd666

Conversation

@ittaiz

@ittaiz ittaiz commented Apr 11, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Enable agents to verify UI changes via agent-browser (Playwright) connecting to the remote access server on a fixed port — no CEF bundling needed
  • Remote access server listens on DEV3_RPC_PORT (from port pool) with localhost auth bypass for automation
  • Native/CDP mode available as opt-in via .dev3_cdp sentinel file for terminal/PTY and WebKit-specific testing
  • window.__dev3 automation bridge (navigate + getState) works in both modes
  • Rewrote dev3-ui-control skill: agent-browser as default, agent-electrobun extracted to native-cdp.md

Test plan

  • dev3 dev-server start → check dev3 dev-server status → note DEV3_PORT0
  • Open http://localhost:PORT in a browser → app loads with working RPC (browser mode)
  • touch .dev3_cdp && dev3 dev-server restart → build bundles CEF (native mode)
  • rm .dev3_cdp && dev3 dev-server restart → returns to browser mode
  • Multiple concurrent task dev servers each get unique ports, no conflicts

🤖 Generated with Claude Code

@ittaiz ittaiz force-pushed the dev3/task-951bd666 branch 4 times, most recently from 764f3e2 to ed34645 Compare April 11, 2026 19:37
Enable agents to visually verify dev-3.0 UI changes by connecting
automation tools to the running app. Two modes are supported: browser
mode (default, fast) and native/CDP mode (opt-in, for terminal and
WebKit-specific testing).

Browser mode (agent-browser via remote access server):
The remote access server already served the full React UI to browsers
for remote access. Two changes make it usable for automated testing:
- Listen on DEV3_RPC_PORT (fixed port from the port pool) instead of
  a random port, so agent-browser can target the right task instance
- Bypass JWT auth for localhost connections when DEV3_RPC_PORT is set

Each concurrent task gets its own port, so multiple agents can test
different instances simultaneously without conflicts.

Native/CDP mode (agent-electrobun via CEF):
For terminal/PTY features, native window behavior, or WebKit rendering
bugs, a .dev3_cdp sentinel file switches the devScript to bundle CEF
with Chrome remote debugging enabled. This adds build time but gives
access to the actual Electrobun webview via CDP.

The mode toggle lives entirely in the project devScript — no changes
to the generic dev3 CLI or RPC handlers. Env vars can't cross the Unix
socket boundary between the agent shell and the tmux dev session, so
the devScript reads a file instead.

Automation bridge (window.__dev3):
Expose navigate() and getState() on window.__dev3 for programmatic
navigation and state inspection. Guarded by a Vite define flag
(__DEV3_AUTOMATION) — available in dev/staging, tree-shaken in prod.
Works identically in both agent-browser and agent-electrobun.

Skill docs (.claude/skills/dev3-ui-control/):
- SKILL.md: agent-browser as primary tool, decision matrix, __dev3
  bridge recipes, Playwright-native interaction patterns
- native-cdp.md: agent-electrobun setup, sentinel file toggle, React
  input workaround, CDP connection retry, keyboard event dispatch
- click-navigation.md: click-based navigation reference and gotchas

devScript now detects a stale CEF app bundle (build.json has
defaultRenderer:cef) when .dev3_cdp is absent and removes it before
building, preventing the CEF process from grabbing DEV3_PORT0 ahead
of the remote access server.
@ittaiz ittaiz force-pushed the dev3/task-951bd666 branch from ed34645 to 378cd1c Compare April 11, 2026 19:38

@h0x91b h0x91b left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this is Claude (AI assistant working on this branch).

The idea and the refactor toward agent-browser are great — removing the CEF bundle from dev builds is a solid win. A few things need to be addressed before merging; details in the inline comments.

Must-fix

  1. Security (critical): isLocalhostDevBypass checks the Host header, not the actual TCP address. The server binds to 0.0.0.0, so any LAN host can forge Host: localhost and bypass auth.
  2. Bug: The useEffect for window.__dev3 depends on state — it re-runs on every dispatch, and between cleanup and setup the bridge is undefined.
  3. Tests reimplement production logic — they don't catch regressions. The function should be exported and tested directly.
  4. devScript in .dev3/config.json — nested if/else/fi in a single line. Extract to a separate .sh file.

Happy to re-review once these are addressed.

Comment on lines +34 to +39
/** Allow unauthenticated localhost connections when DEV3_RPC_PORT is set (dev mode). */
function isLocalhostDevBypass(req: Request): boolean {
if (!process.env.DEV3_RPC_PORT) return false;
const host = new URL(req.url).hostname;
return host === "localhost" || host === "127.0.0.1" || host === "::1";
}

@h0x91b h0x91b Apr 15, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Security — Host header spoofing bypasses auth.

new URL(req.url).hostname returns whatever the client sent in the HTTP Host header, not the actual TCP remote address. The server binds to 0.0.0.0 (line 218), the port is known ($DEV3_PORT0, range 10000–20000). Any host on the LAN can hit http://<lan-ip>:<DEV3_RPC_PORT>/rpc with Host: localhost — and pass through without a token, gaining full RPC access (read projects/files, spawn processes).

Decision doc 033 says "Acceptable for local development" — but that only holds if the bypass is actually restricted to the loopback interface.

Fix options:

  1. Check the real socket address via server.requestIP(req):
function isLocalhostDevBypass(req: Request, server: Server): boolean {
    if (!process.env.DEV3_RPC_PORT) return false;
    const ip = server.requestIP(req);
    return ip?.address === "127.0.0.1" || ip?.address === "::1";
}

Requires threading server into isSessionAuthenticated / isLocalhostDevBypass.

  1. Or bind the server to 127.0.0.1 when DEV3_RPC_PORT is set — the problem disappears entirely, but LAN access from other devices is lost (for the agent use-case, localhost is sufficient).

Comment on lines +220 to +250
// Reimplements isLocalhostDevBypass logic from remote-access-server.ts
function isLocalhostDevBypass(hostname: string, hasRpcPort: boolean): boolean {
if (!hasRpcPort) return false;
return hostname === "localhost" || hostname === "127.0.0.1" || hostname === "::1";
}

describe("localhost dev auth bypass", () => {
it("bypasses auth for localhost when DEV3_RPC_PORT is set", () => {
expect(isLocalhostDevBypass("localhost", true)).toBe(true);
});

it("bypasses auth for 127.0.0.1 when DEV3_RPC_PORT is set", () => {
expect(isLocalhostDevBypass("127.0.0.1", true)).toBe(true);
});

it("bypasses auth for ::1 when DEV3_RPC_PORT is set", () => {
expect(isLocalhostDevBypass("::1", true)).toBe(true);
});

it("does NOT bypass auth for localhost when DEV3_RPC_PORT is unset", () => {
expect(isLocalhostDevBypass("localhost", false)).toBe(false);
});

it("does NOT bypass auth for remote hosts even when DEV3_RPC_PORT is set", () => {
expect(isLocalhostDevBypass("192.168.1.100", true)).toBe(false);
});

it("does NOT bypass auth for remote hosts when DEV3_RPC_PORT is unset", () => {
expect(isLocalhostDevBypass("10.0.0.5", false)).toBe(false);
});
});

@h0x91b h0x91b Apr 15, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Tests reimplement production logic — they don't catch regressions.

The comment honestly says "Reimplements isLocalhostDevBypass logic from remote-access-server.ts" and copies the function with a different signature (hostname, hasRpcPort). The tests verify their own copy, not the real code — if someone breaks isLocalhostDevBypass in the production module tomorrow, these tests stay green.

Especially important given the security issue flagged in remote-access-server.ts — the tests pass and give a false sense of safety.

Fix: export isLocalhostDevBypass from remote-access-server.ts (like serveStatic is already exported) and call the real function. Same applies to resolveSafePath and MIME_TYPES blocks above — reimplementation instead of import.

Comment thread src/mainview/App.tsx
Comment on lines +388 to +399
useEffect(() => {
if (!(globalThis as any).__DEV3_AUTOMATION) return;
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(window as any).__dev3 = {
navigate: (route: Route) => navigate(route),
getState: () => state,
};
return () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
delete (window as any).__dev3;
};
}, [navigate, state]);

@h0x91b h0x91b Apr 15, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Bug: useEffect recreates window.__dev3 on every state change.

state changes on every reducer dispatch (dozens/hundreds of times per session). Each time:

  1. Cleanup — delete window.__dev3
  2. Setup — window.__dev3 = { ... }

Between (1) and (2) there is a micro-window where an agent's eval() will get undefined and crash. Decision 032 says "cheap (object creation)" — but the race between cleanup and setup is not mentioned.

Fix via ref:

const stateRef = useRef(state);
stateRef.current = state;

useEffect(() => {
    if (!(globalThis as any).__DEV3_AUTOMATION) return;
    (window as any).__dev3 = {
        navigate,
        getState: () => stateRef.current,
    };
    return () => { delete (window as any).__dev3; };
}, [navigate]);

Also navigate: (route: Route) => navigate(route) can be simplified to just navigate — the wrapper is unnecessary.

Comment thread .dev3/config.json
{
"setupScript": "echo setup\nbun install",
"devScript": "bun run dev",
"devScript": "if [ -f .dev3_cdp ]; then DEV3_CDP_PORT=${DEV3_PORT0} bun run dev; else if [ -f build/dev-macos-arm64/dev-3.0-dev.app/Contents/Resources/build.json ] && grep -q '\"defaultRenderer\":\"cef\"' build/dev-macos-arm64/dev-3.0-dev.app/Contents/Resources/build.json; then echo '[dev3] Stale CEF build detected without .dev3_cdp — removing app bundle' && rm -rf build/dev-macos-arm64/dev-3.0-dev.app; fi && DEV3_RPC_PORT=${DEV3_PORT0:-19191} bun run dev; fi",

@h0x91b h0x91b Apr 15, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Extract into a separate .sh file.

Nested if/then/if/then/.../fi && .../fi in a single JSON string is very hard to read and dangerous to edit — one missing semicolon and the dev server won't start.

Suggestion: create scripts/dev-server.sh:

#!/usr/bin/env bash
set -euo pipefail

if [ -f .dev3_cdp ]; then
    DEV3_CDP_PORT="${DEV3_PORT0}" bun run dev
else
    BUILD_JSON="build/dev-macos-arm64/dev-3.0-dev.app/Contents/Resources/build.json"
    if [ -f "$BUILD_JSON" ] && grep -q '"defaultRenderer":"cef"' "$BUILD_JSON"; then
        echo '[dev3] Stale CEF build detected without .dev3_cdp — removing app bundle'
        rm -rf build/dev-macos-arm64/dev-3.0-dev.app
    fi
    DEV3_RPC_PORT="${DEV3_PORT0:-19191}" bun run dev
fi

Then in config.json simply: "devScript": "bash scripts/dev-server.sh". Decision 033 says "mode switching belongs in the devScript" — agreed, but the devScript doesn't have to be a one-liner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants