diff --git a/agent-logs/2026-04-30T004933-0500-recoverable-stale-transport.md b/agent-logs/2026-04-30T004933-0500-recoverable-stale-transport.md new file mode 100644 index 0000000..6908733 --- /dev/null +++ b/agent-logs/2026-04-30T004933-0500-recoverable-stale-transport.md @@ -0,0 +1,58 @@ +# Recoverable Stale Transport Cleanup + +## Goal + +Explain and fix why long-running Browser MCP agents could still see `Transport closed` days after the v0.6.0 upgrade, despite the earlier stale-daemon mismatch being gone. + +## Assumptions Checked + +- Current v0.6.0 is not the old mixed-version daemon issue. +- The relevant failure path is a long-lived initialized MCP transport whose session registry entry was soft-reaped and later removed. +- Cleanup must still reclaim genuinely abandoned pre-initialize daemon ingress sockets. + +## Files Changed + +- `local-mcp-bun/src/mcp_protocol_session.ts` +- `local-mcp-bun/src/daemon_ingress_server.ts` +- `local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts` +- `local-mcp-bun/README.md` +- `specs/local-multiplexed-browser-mcp-spec.md` +- `specs/local-multiplexed-browser-mcp-changelog.md` + +## Decisions + +- Added an explicit protocol-session predicate for recoverable initialized state. +- Daemon ingress now skips stale unbound socket closure when that connection still has recoverable initialized state. +- The cleanup policy remains bounded: stale unbound sockets without initialized recoverable state still close on timeout. +- Spec and README now distinguish recoverable initialized MCP transports from pre-initialize proxy leaks. + +## Validation + +- `bun test local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts` -> 9 pass. +- `bun test local-mcp-bun/tests/unit/mcp_protocol_session.test.ts` -> 13 pass. +- `bun run --cwd local-mcp-bun lint:spec` -> pass. +- `bun run --cwd local-mcp-bun lint:docs` -> pass. +- `bun run --cwd local-mcp-bun lint:compliance` -> pass. +- `git diff --check` -> pass. + +## Git State + +- Current branch head before finalization: `cdcf22d chore(release): bump version to 0.6.0`. +- Existing unrelated untracked files were left untouched. +- PR branch: `codex/recoverable-stale-transport`. + +## Remaining Risk + +- This is local deterministic coverage of the daemon/protocol lifecycle. A live long-running external harness should no longer need a session restart after its stale resource entry is removed, but that exact harness behavior depends on it keeping the underlying stdio/WebSocket process alive. + +## Version Bump And Linux Build + +- Updated release/runtime version surfaces from `0.6.0` to `0.6.1`: + - `local-mcp-bun/package.json` + - `local-mcp-bun/chrome-extension/manifest.json` + - `local-mcp-bun/src/mcp_protocol_session.ts` + - `local-mcp-bun/src/bridge_transport.ts` +- `bun run --cwd local-mcp-bun release:check-version -- --version v0.6.1` -> pass. +- `bun run --cwd local-mcp-bun package:release -- --mode=server --version v0.6.1 --clean` -> produced `local-mcp-bun/dist/release/v0.6.1/chrome-browser-mcp-v0.6.1-linux-x64`. +- Packaged binary smoke initialize with `BRIDGE_MODE=in_memory MCP_DAEMON_MODE=direct` returned `serverInfo.version="0.6.1"`. +- SHA256: `315650bd6d27cab2feec227ef128974786c3de16262d3cb1f6c5ae398d2cfee3`. diff --git a/local-mcp-bun/README.md b/local-mcp-bun/README.md index bc292d5..71f256b 100644 --- a/local-mcp-bun/README.md +++ b/local-mcp-bun/README.md @@ -91,7 +91,8 @@ You only need to set bridge env vars when overriding defaults (for example custo - The runtime treats staleness as lack of valid MCP traffic for the configured number of minutes. - Automatic cleanup soft-reaps stale browser resources by cancelling queued waiters, releasing owned locks, and detaching owned tabs when possible. -- Live MCP transports stay open after stale cleanup so long-running Codex or agent threads can recover on their next tool call; idle unbound ingress sockets may still be closed on the same timeout. +- Live MCP transports stay open after stale cleanup so long-running Codex or agent threads can recover on their next tool call, even after their resource-reaped registry entry is later removed. +- Idle unbound ingress sockets may still be closed on the same timeout only when they lack recoverable initialized MCP state, so pre-initialize proxy leaks are reclaimed without killing recoverable long-running agents. - A resource-reaped session that never revives may be removed from the registry after another full timeout window; initialized live transports can still rebind on their next request. - The extension popup exposes a compact `Auto-cleanup` control with `Off` or `m`, plus `Run Cleanup Now`. diff --git a/local-mcp-bun/chrome-extension/manifest.json b/local-mcp-bun/chrome-extension/manifest.json index a54beb2..86cca4c 100644 --- a/local-mcp-bun/chrome-extension/manifest.json +++ b/local-mcp-bun/chrome-extension/manifest.json @@ -1,7 +1,7 @@ { "manifest_version": 3, "name": "Browser Use for AI Agents", - "version": "0.6.0", + "version": "0.6.1", "description": "A local-only extension bridge for multiplexed browser MCP. Modified by [KnotFalse]", "permissions": [ "tabs", diff --git a/local-mcp-bun/package.json b/local-mcp-bun/package.json index d69dff6..7cbe7f7 100644 --- a/local-mcp-bun/package.json +++ b/local-mcp-bun/package.json @@ -1,6 +1,6 @@ { "name": "@TWP-Technologies/Browser-MCP", - "version": "0.6.0", + "version": "0.6.1", "private": true, "type": "module", "description": "Local-only multiplexed Browser MCP server built with Bun", diff --git a/local-mcp-bun/src/bridge_transport.ts b/local-mcp-bun/src/bridge_transport.ts index 3572555..e10be20 100644 --- a/local-mcp-bun/src/bridge_transport.ts +++ b/local-mcp-bun/src/bridge_transport.ts @@ -1083,7 +1083,7 @@ export class in_memory_bridge_transport implements bridge_transport { { id: "local-mcp", name: "Local MCP", - version: "0.6.0", + version: "0.6.1", description: "Synthetic local MCP extension", enabled: true, install_type: "development", diff --git a/local-mcp-bun/src/daemon_ingress_server.ts b/local-mcp-bun/src/daemon_ingress_server.ts index 0e20ef4..7d794da 100644 --- a/local-mcp-bun/src/daemon_ingress_server.ts +++ b/local-mcp-bun/src/daemon_ingress_server.ts @@ -419,6 +419,11 @@ export class daemon_ingress_server { continue; } + const session = this.sessions_by_connection_id.get(connection_id); + if (session?.has_recoverable_initialized_state()) { + continue; + } + const socket = this.sockets_by_connection_id.get(connection_id); socket?.close(1000, "stale_connection_timeout"); } diff --git a/local-mcp-bun/src/mcp_protocol_session.ts b/local-mcp-bun/src/mcp_protocol_session.ts index f040abe..754790a 100644 --- a/local-mcp-bun/src/mcp_protocol_session.ts +++ b/local-mcp-bun/src/mcp_protocol_session.ts @@ -15,7 +15,7 @@ const local_protocol_version = "local-mcp-bun-v2"; export const server_info = { name: "local-mcp", - version: "0.6.0", + version: "0.6.1", } as const; interface mcp_protocol_session_options { @@ -105,6 +105,10 @@ export class mcp_protocol_session { } } + public has_recoverable_initialized_state(): boolean { + return !this.closed && this.initialized_auth_mode !== null; + } + private async dispatch_notification(request: json_rpc_request): Promise { if (request.method === "notifications/initialized" || request.method === "initialized") { this.touch_or_rebind_initialized_session_for_notification(); diff --git a/local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts b/local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts index d15b6e8..1fb77e9 100644 --- a/local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts +++ b/local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts @@ -90,6 +90,35 @@ async function wait_for_socket_open(socket: WebSocket): Promise { }); } +async function wait_for_socket_close(socket: WebSocket): Promise { + if (socket.readyState === WebSocket.CLOSED) { + return; + } + + await new Promise((resolve_promise, reject_promise) => { + const on_close = (): void => { + clearTimeout(timeout_id); + socket.removeEventListener("close", on_close); + socket.removeEventListener("error", on_error); + resolve_promise(); + }; + const on_error = (): void => { + clearTimeout(timeout_id); + socket.removeEventListener("close", on_close); + socket.removeEventListener("error", on_error); + reject_promise(new Error("websocket close failed")); + }; + const timeout_id = setTimeout(() => { + socket.removeEventListener("close", on_close); + socket.removeEventListener("error", on_error); + reject_promise(new Error("timed out waiting for websocket close")); + }, 5_000); + + socket.addEventListener("close", on_close); + socket.addEventListener("error", on_error); + }); +} + async function wait_for_initialize_response(socket: WebSocket): Promise { return await new Promise((resolve_promise, reject_promise) => { const timeout_id = setTimeout(() => { @@ -491,6 +520,7 @@ test("daemon ingress lets live sockets rebind when their session was closed else expect(ingress_internals.last_activity_at_ms_by_connection_id.get(connection_id)).toBeGreaterThanOrEqual( before_cleanup_ms, ); + ingress_internals.last_activity_at_ms_by_connection_id.set(connection_id, Date.now() - 3 * 60 * 60 * 1000); await ingress_internals.run_cleanup_tick(); expect(close_event).toBeNull(); expect(socket.readyState).toBe(WebSocket.OPEN); @@ -528,3 +558,38 @@ test("daemon ingress lets live sockets rebind when their session was closed else rmSync(daemon_state_dir, { recursive: true, force: true }); } }); + +test("daemon ingress closes stale unbound sockets without recoverable initialized state", async () => { + const { runtime, ingress, daemon_port, daemon_state_dir } = await create_test_ingress(25); + + const socket = new WebSocket(`ws://127.0.0.1:${daemon_port}/mcp`); + + try { + await wait_for_socket_open(socket); + + const ingress_internals = ingress as unknown as { + last_activity_at_ms_by_connection_id: Map; + run_cleanup_tick: () => Promise; + }; + const connection_id = [...ingress_internals.last_activity_at_ms_by_connection_id.keys()][0]; + if (typeof connection_id !== "number") { + throw new Error("expected daemon connection"); + } + + ingress_internals.last_activity_at_ms_by_connection_id.set(connection_id, Date.now() - 3 * 60 * 60 * 1000); + await ingress_internals.run_cleanup_tick(); + await wait_for_socket_close(socket); + + expect(socket.readyState).toBe(WebSocket.CLOSED); + } finally { + try { + socket.close(); + } catch { + // ignore best-effort close + } + + await ingress.stop(); + await runtime.stop(); + rmSync(daemon_state_dir, { recursive: true, force: true }); + } +}); diff --git a/specs/local-multiplexed-browser-mcp-changelog.md b/specs/local-multiplexed-browser-mcp-changelog.md index df869e1..7ff7c18 100644 --- a/specs/local-multiplexed-browser-mcp-changelog.md +++ b/specs/local-multiplexed-browser-mcp-changelog.md @@ -1,5 +1,24 @@ # Local Multiplexed Browser MCP Spec Changelog +## 2026-04-30 + +### Recoverable Stale Cleanup Transport Close + +- Daemon ingress now preserves unbound idle sockets when their protocol session still has recoverable initialized state, allowing long-running Codex/agent transports to rebind even after the old resource-reaped registry entry expires. +- Stale unbound sockets without recoverable initialized state remain eligible for cleanup so pre-initialize proxy leaks are still reclaimed. +- Local release identifiers were updated to `0.6.1` for the package, extension manifest, MCP server info, and in-memory synthetic extension fixture. + +### Verification + +- `bun test local-mcp-bun/tests/integration/daemon_ingress_cleanup.test.ts` +- `bun test local-mcp-bun/tests/unit/mcp_protocol_session.test.ts` +- `bun run --cwd local-mcp-bun lint:spec` +- `bun run --cwd local-mcp-bun lint:docs` +- `bun run --cwd local-mcp-bun lint:compliance` +- `bun run --cwd local-mcp-bun release:check-version -- --version v0.6.1` +- `bun run --cwd local-mcp-bun package:release -- --mode=server --version v0.6.1 --clean` +- Packaged binary smoke initialize reported `serverInfo.version="0.6.1"`. + ## 2026-04-25 ### Per-Client Artifact Roots Review Follow-Up diff --git a/specs/local-multiplexed-browser-mcp-spec.md b/specs/local-multiplexed-browser-mcp-spec.md index ef5adbb..e8d55ae 100644 --- a/specs/local-multiplexed-browser-mcp-spec.md +++ b/specs/local-multiplexed-browser-mcp-spec.md @@ -45,7 +45,7 @@ - User A calls `detach_from_tab(tab_id=101)` -> app MUST detach debugger and release lock atomically -> tab becomes attachable for other clients. - User extension process restarts -> app MUST mark extension bridge unavailable, fail active debug commands with deterministic retryable errors, and MUST clear/repair stale lock ownership based on detach/disconnect reconciliation logic -> service recovers without manual state surgery. - User MCP client crashes while holding lock -> app MUST detect channel close, auto-detach when possible, and MUST release lock after timeout guard if detach callback is unavailable -> deadlock is prevented. -- User leaves an MCP client idle past the configured session timeout -> app MUST soft-reap stale browser resources, cancel queued lock waiters, detach owned tabs when possible, keep live MCP transports usable, MAY close unrevived resource-reaped registry entries after another timeout window, and close only unbound stale daemon ingress sockets -> browser deadlock is prevented without forcing long-running agent threads to restart. +- User leaves an MCP client idle past the configured session timeout -> app MUST soft-reap stale browser resources, cancel queued lock waiters, detach owned tabs when possible, keep live MCP transports usable, MAY close unrevived resource-reaped registry entries after another timeout window, and close only stale daemon ingress sockets that are unbound and lack recoverable initialized MCP state -> browser deadlock is prevented without forcing long-running agent threads to restart. - User changes cleanup timeout in the extension popup -> extension MUST persist the timeout in minutes locally and MUST sync the cleanup policy to the runtime immediately when connected or on the next bridge reconnect -> UI and service policy converge without adding a separate settings surface. - User provides malformed tool arguments -> app MUST return input-validation error with field-level diagnostics -> caller can correct request without inspecting logs. @@ -161,7 +161,7 @@ | FR-014 | Cross-Platform Test Matrix | The system MUST run required test suites on Linux, macOS, and Windows in CI for release eligibility, and MUST publish per-OS hard-gate evidence artifacts for auditability. | Critical | Integration (CI workflow checks + artifact upload), E2E (matrix run with required pass gates), Performance (runtime bounds per suite). | | FR-015 | Hard Quality Gate | The release process MUST block merge when unit, integration, e2e, fault-injection, and concurrency suites fail. | Critical | Integration (branch protection + workflow checks), E2E (intentional failing test blocks merge), Regression (gate remains enforced). | | FR-016 | Stale Session Auto-Cleanup | The system MUST support configurable session idle cleanup in minutes, MUST treat `0` as disabled, and MUST soft-reap stale browser resources by cancelling queued waiters, releasing locks, and detaching owned tabs when possible while keeping live MCP transports recoverable. Resource-reaped sessions MAY be closed from the registry after one additional timeout window if they never revive. | High | Unit (timeout parsing + stale selection + expired reaped selection), Integration (manual cleanup action + soft reap + expired reaped registry close), Regression (post-cleanup tool call works on same transport). | -| FR-017 | Stale Proxy Connection Shutdown | In shared-daemon mode, the system MUST NOT close bound live ingress sockets solely because their session was stale, and MAY close unbound idle ingress sockets using the same timeout so pre-initialize proxy leaks are reclaimed. | High | Integration (daemon ingress soft-reap keeps socket open), Regression (unbound stale socket closes). | +| FR-017 | Stale Proxy Connection Shutdown | In shared-daemon mode, the system MUST NOT close bound live ingress sockets solely because their session was stale, and MAY close unbound idle ingress sockets using the same timeout only when they lack recoverable initialized MCP state so pre-initialize proxy leaks are reclaimed. | High | Integration (daemon ingress soft-reap keeps socket open), Regression (unbound stale socket closes). | | FR-018 | Legal and Branding Compliance Automation | The system MUST enforce Apache-2.0 modification notice and forbidden branding/endpoint checks via an automated compliance scanner mapped to requirement IDs. | High | Unit (scanner rule fixtures pass/fail), Integration (`lint:compliance` gate), Regression (forbidden token insertion fails CI). | ## 7.0 Measurable Non-Functional Requirements (NFRs) (Critical for Architecture) @@ -240,7 +240,7 @@ - On `chrome.debugger.onDetach`, lock MUST be released for matching `tab_id` unless reattachment is already pending in same owner session. - On client disconnect, owned locks MUST enter `releasing` and complete detach within configured timeout. - On stale-session cleanup, queued waiters for that session MUST be cancelled before any released lock can transfer ownership. -- In daemon/proxy mode, stale-session cleanup MUST keep a bound live ingress socket open and recoverable; direct mode MUST likewise keep stdio alive. Unbound idle ingress sockets MAY be closed, and unrevived resource-reaped session entries MAY be closed after one additional timeout window. +- In daemon/proxy mode, stale-session cleanup MUST keep initialized live ingress sockets open and recoverable, including after their resource-reaped registry entry is removed; direct mode MUST likewise keep stdio alive. Unbound idle ingress sockets MAY be closed only when they lack recoverable initialized MCP state, and unrevived resource-reaped session entries MAY be closed after one additional timeout window. - On extension bridge loss, active operations MUST fail fast with retryable errors and lock reconciliation MUST run at reconnect. - Lock reconciliation SHOULD consult debugger target metadata (`getTargets` with `attached` indicator) before final cleanup.