feat: UI testing infrastructure — agent-browser default, CDP fallback#444
feat: UI testing infrastructure — agent-browser default, CDP fallback#444ittaiz wants to merge 1 commit into
Conversation
764f3e2 to
ed34645
Compare
Enable agents to visually verify dev-3.0 UI changes by connecting automation tools to the running app. Two modes are supported: browser mode (default, fast) and native/CDP mode (opt-in, for terminal and WebKit-specific testing). Browser mode (agent-browser via remote access server): The remote access server already served the full React UI to browsers for remote access. Two changes make it usable for automated testing: - Listen on DEV3_RPC_PORT (fixed port from the port pool) instead of a random port, so agent-browser can target the right task instance - Bypass JWT auth for localhost connections when DEV3_RPC_PORT is set Each concurrent task gets its own port, so multiple agents can test different instances simultaneously without conflicts. Native/CDP mode (agent-electrobun via CEF): For terminal/PTY features, native window behavior, or WebKit rendering bugs, a .dev3_cdp sentinel file switches the devScript to bundle CEF with Chrome remote debugging enabled. This adds build time but gives access to the actual Electrobun webview via CDP. The mode toggle lives entirely in the project devScript — no changes to the generic dev3 CLI or RPC handlers. Env vars can't cross the Unix socket boundary between the agent shell and the tmux dev session, so the devScript reads a file instead. Automation bridge (window.__dev3): Expose navigate() and getState() on window.__dev3 for programmatic navigation and state inspection. Guarded by a Vite define flag (__DEV3_AUTOMATION) — available in dev/staging, tree-shaken in prod. Works identically in both agent-browser and agent-electrobun. Skill docs (.claude/skills/dev3-ui-control/): - SKILL.md: agent-browser as primary tool, decision matrix, __dev3 bridge recipes, Playwright-native interaction patterns - native-cdp.md: agent-electrobun setup, sentinel file toggle, React input workaround, CDP connection retry, keyboard event dispatch - click-navigation.md: click-based navigation reference and gotchas devScript now detects a stale CEF app bundle (build.json has defaultRenderer:cef) when .dev3_cdp is absent and removes it before building, preventing the CEF process from grabbing DEV3_PORT0 ahead of the remote access server.
ed34645 to
378cd1c
Compare
There was a problem hiding this comment.
Hi, this is Claude (AI assistant working on this branch).
The idea and the refactor toward agent-browser are great — removing the CEF bundle from dev builds is a solid win. A few things need to be addressed before merging; details in the inline comments.
Must-fix
- Security (critical):
isLocalhostDevBypasschecks theHostheader, not the actual TCP address. The server binds to0.0.0.0, so any LAN host can forgeHost: localhostand bypass auth. - Bug: The
useEffectforwindow.__dev3depends onstate— it re-runs on every dispatch, and between cleanup and setup the bridge isundefined. - Tests reimplement production logic — they don't catch regressions. The function should be exported and tested directly.
- devScript in
.dev3/config.json— nestedif/else/fiin a single line. Extract to a separate.shfile.
Happy to re-review once these are addressed.
| /** Allow unauthenticated localhost connections when DEV3_RPC_PORT is set (dev mode). */ | ||
| function isLocalhostDevBypass(req: Request): boolean { | ||
| if (!process.env.DEV3_RPC_PORT) return false; | ||
| const host = new URL(req.url).hostname; | ||
| return host === "localhost" || host === "127.0.0.1" || host === "::1"; | ||
| } |
There was a problem hiding this comment.
🔴 Security — Host header spoofing bypasses auth.
new URL(req.url).hostname returns whatever the client sent in the HTTP Host header, not the actual TCP remote address. The server binds to 0.0.0.0 (line 218), the port is known ($DEV3_PORT0, range 10000–20000). Any host on the LAN can hit http://<lan-ip>:<DEV3_RPC_PORT>/rpc with Host: localhost — and pass through without a token, gaining full RPC access (read projects/files, spawn processes).
Decision doc 033 says "Acceptable for local development" — but that only holds if the bypass is actually restricted to the loopback interface.
Fix options:
- Check the real socket address via
server.requestIP(req):
function isLocalhostDevBypass(req: Request, server: Server): boolean {
if (!process.env.DEV3_RPC_PORT) return false;
const ip = server.requestIP(req);
return ip?.address === "127.0.0.1" || ip?.address === "::1";
}Requires threading server into isSessionAuthenticated / isLocalhostDevBypass.
- Or bind the server to
127.0.0.1whenDEV3_RPC_PORTis set — the problem disappears entirely, but LAN access from other devices is lost (for the agent use-case, localhost is sufficient).
| // Reimplements isLocalhostDevBypass logic from remote-access-server.ts | ||
| function isLocalhostDevBypass(hostname: string, hasRpcPort: boolean): boolean { | ||
| if (!hasRpcPort) return false; | ||
| return hostname === "localhost" || hostname === "127.0.0.1" || hostname === "::1"; | ||
| } | ||
|
|
||
| describe("localhost dev auth bypass", () => { | ||
| it("bypasses auth for localhost when DEV3_RPC_PORT is set", () => { | ||
| expect(isLocalhostDevBypass("localhost", true)).toBe(true); | ||
| }); | ||
|
|
||
| it("bypasses auth for 127.0.0.1 when DEV3_RPC_PORT is set", () => { | ||
| expect(isLocalhostDevBypass("127.0.0.1", true)).toBe(true); | ||
| }); | ||
|
|
||
| it("bypasses auth for ::1 when DEV3_RPC_PORT is set", () => { | ||
| expect(isLocalhostDevBypass("::1", true)).toBe(true); | ||
| }); | ||
|
|
||
| it("does NOT bypass auth for localhost when DEV3_RPC_PORT is unset", () => { | ||
| expect(isLocalhostDevBypass("localhost", false)).toBe(false); | ||
| }); | ||
|
|
||
| it("does NOT bypass auth for remote hosts even when DEV3_RPC_PORT is set", () => { | ||
| expect(isLocalhostDevBypass("192.168.1.100", true)).toBe(false); | ||
| }); | ||
|
|
||
| it("does NOT bypass auth for remote hosts when DEV3_RPC_PORT is unset", () => { | ||
| expect(isLocalhostDevBypass("10.0.0.5", false)).toBe(false); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
🔴 Tests reimplement production logic — they don't catch regressions.
The comment honestly says "Reimplements isLocalhostDevBypass logic from remote-access-server.ts" and copies the function with a different signature (hostname, hasRpcPort). The tests verify their own copy, not the real code — if someone breaks isLocalhostDevBypass in the production module tomorrow, these tests stay green.
Especially important given the security issue flagged in remote-access-server.ts — the tests pass and give a false sense of safety.
Fix: export isLocalhostDevBypass from remote-access-server.ts (like serveStatic is already exported) and call the real function. Same applies to resolveSafePath and MIME_TYPES blocks above — reimplementation instead of import.
| useEffect(() => { | ||
| if (!(globalThis as any).__DEV3_AUTOMATION) return; | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| (window as any).__dev3 = { | ||
| navigate: (route: Route) => navigate(route), | ||
| getState: () => state, | ||
| }; | ||
| return () => { | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| delete (window as any).__dev3; | ||
| }; | ||
| }, [navigate, state]); |
There was a problem hiding this comment.
🟠 Bug: useEffect recreates window.__dev3 on every state change.
state changes on every reducer dispatch (dozens/hundreds of times per session). Each time:
- Cleanup —
delete window.__dev3 - Setup —
window.__dev3 = { ... }
Between (1) and (2) there is a micro-window where an agent's eval() will get undefined and crash. Decision 032 says "cheap (object creation)" — but the race between cleanup and setup is not mentioned.
Fix via ref:
const stateRef = useRef(state);
stateRef.current = state;
useEffect(() => {
if (!(globalThis as any).__DEV3_AUTOMATION) return;
(window as any).__dev3 = {
navigate,
getState: () => stateRef.current,
};
return () => { delete (window as any).__dev3; };
}, [navigate]);Also navigate: (route: Route) => navigate(route) can be simplified to just navigate — the wrapper is unnecessary.
| { | ||
| "setupScript": "echo setup\nbun install", | ||
| "devScript": "bun run dev", | ||
| "devScript": "if [ -f .dev3_cdp ]; then DEV3_CDP_PORT=${DEV3_PORT0} bun run dev; else if [ -f build/dev-macos-arm64/dev-3.0-dev.app/Contents/Resources/build.json ] && grep -q '\"defaultRenderer\":\"cef\"' build/dev-macos-arm64/dev-3.0-dev.app/Contents/Resources/build.json; then echo '[dev3] Stale CEF build detected without .dev3_cdp — removing app bundle' && rm -rf build/dev-macos-arm64/dev-3.0-dev.app; fi && DEV3_RPC_PORT=${DEV3_PORT0:-19191} bun run dev; fi", |
There was a problem hiding this comment.
🟡 Extract into a separate .sh file.
Nested if/then/if/then/.../fi && .../fi in a single JSON string is very hard to read and dangerous to edit — one missing semicolon and the dev server won't start.
Suggestion: create scripts/dev-server.sh:
#!/usr/bin/env bash
set -euo pipefail
if [ -f .dev3_cdp ]; then
DEV3_CDP_PORT="${DEV3_PORT0}" bun run dev
else
BUILD_JSON="build/dev-macos-arm64/dev-3.0-dev.app/Contents/Resources/build.json"
if [ -f "$BUILD_JSON" ] && grep -q '"defaultRenderer":"cef"' "$BUILD_JSON"; then
echo '[dev3] Stale CEF build detected without .dev3_cdp — removing app bundle'
rm -rf build/dev-macos-arm64/dev-3.0-dev.app
fi
DEV3_RPC_PORT="${DEV3_PORT0:-19191}" bun run dev
fiThen in config.json simply: "devScript": "bash scripts/dev-server.sh". Decision 033 says "mode switching belongs in the devScript" — agreed, but the devScript doesn't have to be a one-liner.
Summary
DEV3_RPC_PORT(from port pool) with localhost auth bypass for automation.dev3_cdpsentinel file for terminal/PTY and WebKit-specific testingwindow.__dev3automation bridge (navigate + getState) works in both modesTest plan
dev3 dev-server start→ checkdev3 dev-server status→ noteDEV3_PORT0http://localhost:PORTin a browser → app loads with working RPC (browser mode)touch .dev3_cdp && dev3 dev-server restart→ build bundles CEF (native mode)rm .dev3_cdp && dev3 dev-server restart→ returns to browser mode🤖 Generated with Claude Code