Commit 206c084

feat(sparsekernel): own brokered browser popups
1 parent 6431fb4 commit 206c084

5 files changed

Lines changed: 169 additions & 32 deletions

docs/architecture/browser-broker.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ When `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CD
 
 Set `OPENCLAW_RUNTIME_BROWSER_BROKER=native` to let SparseKernel launch and supervise a local Chromium-compatible process pool by trust zone and profile. The native pool uses a loopback-only remote debugging endpoint, a runtime-owned browser profile directory, and pooled process refcounts; the leased CDP context is released first, then the browser process is stopped after the pool idle timeout. Use `OPENCLAW_SPARSEKERNEL_BROWSER_EXECUTABLE` when Chrome/Chromium is not discoverable on `PATH` or a common platform path. Headless mode is on by default. `OPENCLAW_SPARSEKERNEL_BROWSER_NO_SANDBOX=1` is an explicit opt-out and should only be used when the host environment cannot run Chromium's sandbox.
 
-Supported v0 actions (`status`, `doctor`, `profiles`, `tabs`, `open`, `navigate`, `focus`, `close`, `snapshot`, `console`, `screenshot`, `pdf`, direct file-input `upload`, `dialog`, and brokered `act`) operate against the leased CDP context. Brokered `act` covers the OpenClaw action contract for click, coordinate click, type, press, hover, scroll, drag, select, fill, resize, wait, evaluate, close, and batch using CDP input events plus bounded DOM evaluation. Selector-backed actions retry inside the leased page until their action timeout, and `wait --load networkidle` uses CDP Network events plus a quiet window rather than only checking `document.readyState`. Actions that can change page state are followed by a broker-side navigation check: same-target navigations are accepted only when the resulting URL stays inside the context's allowed-origin policy, while new tabs/windows are closed and rejected in v0 because they are unleased targets. Snapshots use a bounded CDP `Runtime.evaluate` DOM read, actions resolve refs from the latest brokered snapshot where needed, console output is captured from CDP runtime/log events, and screenshot/PDF output is captured as SparseKernel artifacts, read back through artifact access, and converted to existing tool result formats for compatibility. The context is retained for the active embedded run and released during broker cleanup, not opened and closed for every browser tool call.
+Supported v0 actions (`status`, `doctor`, `profiles`, `tabs`, `open`, `navigate`, `focus`, `close`, `snapshot`, `console`, `screenshot`, `pdf`, direct file-input `upload`, `dialog`, and brokered `act`) operate against the leased CDP context. Brokered `act` covers the OpenClaw action contract for click, coordinate click, type, press, hover, scroll, drag, select, fill, resize, wait, evaluate, close, and batch using CDP input events plus bounded DOM evaluation. Selector-backed actions retry inside the leased page until their action timeout, and `wait --load networkidle` uses CDP Network events plus a quiet window rather than only checking `document.readyState`. Actions that can change page state are followed by a broker-side navigation check: same-target navigations are accepted only when the resulting URL stays inside the context's allowed-origin policy, same-policy popups are attached as broker-owned targets, and disallowed popups are closed. Snapshots use a bounded CDP `Runtime.evaluate` DOM read, actions resolve refs from the latest brokered snapshot where needed, console output is captured from CDP runtime/log events, and screenshot/PDF output is captured as SparseKernel artifacts, read back through artifact access, and converted to existing tool result formats for compatibility. The context is retained for the active embedded run and released during broker cleanup, not opened and closed for every browser tool call.
 
 BrowserContext isolation is session isolation, not host isolation. Playwright route blocking is useful request control, not a hard security boundary.
 
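
The popup rule described in the updated paragraph can be sketched roughly as follows. This is an illustration only: `isUrlAllowedByOrigins` and `decidePopup` are hypothetical names, and the real `assertUrlAllowedByOrigins` helper called by the broker is not shown in this commit, so its matching rules may differ. The sketch assumes that an empty origin list means no policy is configured.

```typescript
// Hypothetical helper, not the broker's actual assertUrlAllowedByOrigins.
function isUrlAllowedByOrigins(url: string, allowedOrigins: string[]): boolean {
  // Assumption: an empty list means no allowed-origin policy is configured.
  if (allowedOrigins.length === 0) {
    return true;
  }
  let origin: string;
  try {
    origin = new URL(url).origin;
  } catch {
    // Unparsable URLs are treated as disallowed in this sketch.
    return false;
  }
  return allowedOrigins.some((allowed) => {
    try {
      return new URL(allowed).origin === origin;
    } catch {
      return false;
    }
  });
}

type PopupDecision = "attach" | "close";

// Mirrors the documented rule: same-policy popups are attached,
// disallowed popups (or popups with no URL under a policy) are closed.
function decidePopup(url: string | undefined, allowedOrigins: string[]): PopupDecision {
  if (!url) {
    return allowedOrigins.length > 0 ? "close" : "attach";
  }
  return isUrlAllowedByOrigins(url, allowedOrigins) ? "attach" : "close";
}
```

With `allowedOrigins` set to `["https://example.com"]`, a popup at `https://example.com/popup` would be attached, while one at `https://blocked.example/popup` would be closed.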

docs/architecture/local-agent-kernel.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ The browser broker model is:
 
 Important boundary: BrowserContext isolation is session isolation, not host isolation. Playwright route blocking and SSRF guards are useful controls, but they are not hard security boundaries.
 
-The broker applies configured trust-zone network policy to explicit allowed origins before allocating a context. This is an egress guard for brokered contexts, not a kernel or VM boundary. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` to make the OpenClaw browser tool acquire a real SparseKernel CDP context for the active run. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to use the existing OpenClaw browser control service as the managed process owner and let SparseKernel lease CDP contexts from its reported endpoint. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=native` to let SparseKernel launch and supervise a local Chromium-compatible process pool keyed by trust zone/profile, with process lifetime tied to brokered context leases and idle timeout. The runtime injects an internal browser proxy for supported navigation, tab, snapshot, console, screenshot, PDF, direct file-input upload, dialog, and action routes instead of exposing raw CDP to the agent. Brokered actions cover the OpenClaw action contract with CDP input events, bounded DOM evaluation, selector retry, CDP-backed network-idle waiting, and post-action navigation checks. Same-target action navigations must stay inside the context's allowed origins when a policy is configured; new tabs/windows are closed and rejected until the broker owns a multi-tab lease model. Screenshot and PDF outputs go through the artifact store.
+The broker applies configured trust-zone network policy to explicit allowed origins before allocating a context. This is an egress guard for brokered contexts, not a kernel or VM boundary. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` to make the OpenClaw browser tool acquire a real SparseKernel CDP context for the active run. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to use the existing OpenClaw browser control service as the managed process owner and let SparseKernel lease CDP contexts from its reported endpoint. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=native` to let SparseKernel launch and supervise a local Chromium-compatible process pool keyed by trust zone/profile, with process lifetime tied to brokered context leases and idle timeout. The runtime injects an internal browser proxy for supported navigation, tab, snapshot, console, screenshot, PDF, direct file-input upload, dialog, and action routes instead of exposing raw CDP to the agent. Brokered actions cover the OpenClaw action contract with CDP input events, bounded DOM evaluation, selector retry, CDP-backed network-idle waiting, and post-action navigation checks. Same-target action navigations must stay inside the context's allowed origins when a policy is configured; same-policy popups are attached as broker-owned targets and disallowed popups are closed. Screenshot and PDF outputs go through the artifact store.
 
 ## Sandbox broker
 
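
As a rough illustration of the three broker modes named in this paragraph, the sketch below parses the two environment variables into a small config object. `resolveBrokerConfig`, `BrokerConfig`, and the decision to throw when the CDP endpoint is missing are assumptions made for this example; the commit does not show how the runtime actually reads or validates these variables.

```typescript
type BrokerMode = "cdp" | "managed" | "native";
type BrokerConfig = { mode: BrokerMode; cdpEndpoint?: string };
type Env = Record<string, string | undefined>;

// Hypothetical parser; the real runtime's handling is not shown in this commit.
function resolveBrokerConfig(env: Env): BrokerConfig | undefined {
  const mode = env["OPENCLAW_RUNTIME_BROWSER_BROKER"]?.trim();
  if (mode !== "cdp" && mode !== "managed" && mode !== "native") {
    // Unset or unrecognized value: this sketch reports "no broker configured".
    return undefined;
  }
  if (mode === "cdp") {
    // The docs pair cdp mode with an explicit loopback endpoint.
    const cdpEndpoint = env["OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT"]?.trim();
    if (!cdpEndpoint) {
      // Assumption: a missing endpoint is treated as a configuration error.
      throw new Error("cdp broker mode requires OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT");
    }
    return { mode, cdpEndpoint };
  }
  return { mode };
}
```

For example, `resolveBrokerConfig({ OPENCLAW_RUNTIME_BROWSER_BROKER: "native" })` would yield a config whose `mode` is `"native"` and which carries no endpoint.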

packages/browser-broker/src/index.test.ts

Lines changed: 34 additions & 2 deletions
@@ -746,7 +746,7 @@ describe("@openclaw/sparsekernel-browser-broker", () => {
     expect(kernel.releasedContextIds).toEqual(["browser_ctx_1"]);
   });
 
-  it("closes new tabs opened by brokered actions instead of accepting unleased targets", async () => {
+  it("attaches same-policy tabs opened by brokered actions", async () => {
     const kernel = new FakeKernel();
     const transport = new FakeCdpTransport();
     const broker = new SparseKernelCdpBrowserBroker({
const broker = new SparseKernelCdpBrowserBroker({
@@ -771,7 +771,39 @@ describe("@openclaw/sparsekernel-browser-broker", () => {
         kind: "click",
         selector: "#popup",
       }),
-    ).rejects.toThrow(/opened a new tab\/window/);
+    ).resolves.toMatchObject({ ok: true, kind: "click" });
+    await expect(broker.listTabs(context.ledger_context.id)).resolves.toEqual([
+      expect.objectContaining({ targetId: "target-1" }),
+      expect.objectContaining({ targetId: "target-popup" }),
+    ]);
+  });
+
+  it("closes new tabs that violate the context policy", async () => {
+    const kernel = new FakeKernel();
+    const transport = new FakeCdpTransport();
+    const broker = new SparseKernelCdpBrowserBroker({
+      kernel,
+      fetchImpl: async () =>
+        Response.json({
+          webSocketDebuggerUrl: "ws://127.0.0.1/devtools/browser/test",
+        }),
+      transportFactory: async () => transport,
+    });
+
+    const context = await broker.acquireContext({
+      trust_zone_id: "public_web",
+      cdp_endpoint: "http://127.0.0.1:9222",
+      initial_url: "https://example.com/start",
+      allowed_origins: ["https://example.com"],
+    });
+    transport.queueActionNewTarget("https://blocked.example/popup", "target-popup");
+
+    await expect(
+      broker.actContext(context.ledger_context.id, {
+        kind: "click",
+        selector: "#popup",
+      }),
+    ).rejects.toThrow(/popup navigation blocked by allowed origins/);
     expect(transport.sent).toEqual(
       expect.arrayContaining([
         expect.objectContaining({

packages/browser-broker/src/index.ts

Lines changed: 90 additions & 15 deletions
@@ -286,6 +286,11 @@ type CdpEventMessage = {
   sessionId?: string;
 };
 
+type LiveBrowserPage = {
+  target_id: string;
+  page_session_id: string;
+};
+
 type LiveBrowserContext = MaterializedBrowserContext & {
   connection: CdpConnection;
   page_session_id: string;
@@ -296,6 +301,7 @@ type LiveBrowserContext = MaterializedBrowserContext & {
   network_request_ids: Set<string>;
   last_network_activity_at: number;
   allowed_origins: string[];
+  pages: Map<string, LiveBrowserPage>;
 };
 
 const NETWORK_IDLE_QUIET_MS = 500;
@@ -393,6 +399,7 @@ export class SparseKernelCdpBrowserBroker {
       network_request_ids: new Set(),
       last_network_activity_at: Date.now(),
       allowed_origins: allowedOrigins,
+      pages: new Map([[targetId, { target_id: targetId, page_session_id: sessionId }]]),
     };
     connection.onEvent((event) => {
       recordConsoleEvent(materialized, event);
@@ -556,8 +563,8 @@ export class SparseKernelCdpBrowserBroker {
   ): Promise<{ ok: true; targetId: string }> {
     const context = this.requireContext(contextId);
     const targetId = input.target_id?.trim();
-    if (targetId && targetId !== context.target_id) {
-      throw new Error(`SparseKernel CDP browser context does not own target: ${targetId}`);
+    if (targetId) {
+      this.activateTarget(context, targetId);
     }
     if (input.paths.length === 0) {
       throw new Error("SparseKernel browser upload requires at least one file path.");
@@ -607,7 +614,21 @@ export class SparseKernelCdpBrowserBroker {
   }
 
   async listTabs(contextId: string): Promise<SparseKernelBrowserTab[]> {
-    return [await this.describeContextTab(this.requireContext(contextId))];
+    const context = this.requireContext(contextId);
+    const tabs: SparseKernelBrowserTab[] = [];
+    for (const page of context.pages.values()) {
+      tabs.push(await this.describePageTab(context, page));
+    }
+    return tabs;
+  }
+
+  async focusContext(
+    contextId: string,
+    input: { target_id: string },
+  ): Promise<SparseKernelBrowserTab> {
+    const context = this.requireContext(contextId);
+    this.activateTarget(context, input.target_id);
+    return await this.describeContextTab(context);
   }
 
   async snapshotContext(
async snapshotContext(
@@ -670,8 +691,8 @@ export class SparseKernelCdpBrowserBroker {
     const context = this.requireContext(contextId);
     const targetId =
       "targetId" in request && typeof request.targetId === "string" ? request.targetId : undefined;
-    if (targetId && targetId !== context.target_id) {
-      throw new Error(`SparseKernel CDP browser context does not own target: ${targetId}`);
+    if (targetId) {
+      this.activateTarget(context, targetId);
     }
     switch (request.kind) {
       case "click":
@@ -898,7 +919,9 @@ export class SparseKernelCdpBrowserBroker {
     }
     this.contexts.delete(contextId);
     try {
-      await context.connection.command("Target.closeTarget", { targetId: context.target_id });
+      for (const targetId of context.pages.keys()) {
+        await context.connection.command("Target.closeTarget", { targetId }).catch(() => {});
+      }
       await context.connection.command("Target.disposeBrowserContext", {
         browserContextId: context.cdp_browser_context_id,
       });
@@ -919,6 +942,15 @@ export class SparseKernelCdpBrowserBroker {
     return context;
   }
 
+  private activateTarget(context: LiveBrowserContext, targetId: string): void {
+    const page = context.pages.get(targetId);
+    if (!page) {
+      throw new Error(`SparseKernel CDP browser context does not own target: ${targetId}`);
+    }
+    context.target_id = page.target_id;
+    context.page_session_id = page.page_session_id;
+  }
+
   private async navigate(
     context: LiveBrowserContext,
     url: string,
@@ -1057,12 +1089,22 @@ export class SparseKernelCdpBrowserBroker {
     observation: PostActionNavigationObservation | undefined,
   ): Promise<void> {
     if (observation?.kind === "new-target") {
-      await context.connection
-        .command("Target.closeTarget", { targetId: observation.targetId })
-        .catch(() => {});
-      throw new Error(
-        `SparseKernel browser action opened a new tab/window (${observation.targetId}); new targets are not accepted by the v0 broker.`,
-      );
+      try {
+        if (observation.url) {
+          assertUrlAllowedByOrigins(observation.url, context.allowed_origins, "popup navigation");
+        } else if (context.allowed_origins.length > 0) {
+          throw new Error(
+            `SparseKernel browser popup navigation blocked by allowed origins: target ${observation.targetId} did not report a URL`,
+          );
+        }
+      } catch (error) {
+        await context.connection
+          .command("Target.closeTarget", { targetId: observation.targetId })
+          .catch(() => {});
+        throw error;
+      }
+      await this.attachNewTarget(context, observation.targetId);
+      return;
     }
     if (observation?.kind === "same-target") {
       await context.connection
@@ -1089,7 +1131,40 @@ export class SparseKernelCdpBrowserBroker {
     return (await this.describeContextTab(context)).url;
   }
 
+  private async attachNewTarget(context: LiveBrowserContext, targetId: string): Promise<void> {
+    const existing = context.pages.get(targetId);
+    if (existing) {
+      this.activateTarget(context, targetId);
+      return;
+    }
+    const { sessionId } = await context.connection.command<{ sessionId: string }>(
+      "Target.attachToTarget",
+      {
+        flatten: true,
+        targetId,
+      },
+    );
+    await context.connection.command("Page.enable", {}, sessionId);
+    await context.connection.command("Runtime.enable", {}, sessionId);
+    await context.connection.command("Network.enable", {}, sessionId).catch(() => {});
+    await context.connection.command("Log.enable", {}, sessionId).catch(() => {});
+    context.pages.set(targetId, { target_id: targetId, page_session_id: sessionId });
+    context.target_id = targetId;
+    context.page_session_id = sessionId;
+    context.snapshot_refs = new Map();
+  }
+
   private async describeContextTab(context: LiveBrowserContext): Promise<SparseKernelBrowserTab> {
+    return await this.describePageTab(context, {
+      target_id: context.target_id,
+      page_session_id: context.page_session_id,
+    });
+  }
+
+  private async describePageTab(
+    context: LiveBrowserContext,
+    page: LiveBrowserPage,
+  ): Promise<SparseKernelBrowserTab> {
     let title: string | undefined;
     let url: string | undefined;
     try {
try {
@@ -1101,7 +1176,7 @@ export class SparseKernelCdpBrowserBroker {
           expression: "JSON.stringify({ title: document.title, url: location.href })",
           returnByValue: true,
         },
-        context.page_session_id,
+        page.page_session_id,
       );
       const raw = evaluated.result?.value;
       if (typeof raw === "string") {
@@ -1113,8 +1188,8 @@ export class SparseKernelCdpBrowserBroker {
       // Best-effort tab metadata; target id is the durable handle for the lease.
     }
     return {
-      targetId: context.target_id,
-      suggestedTargetId: context.target_id,
+      targetId: page.target_id,
+      suggestedTargetId: page.target_id,
       ...(title ? { title } : {}),
       ...(url ? { url } : {}),
       type: "page",
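
The ownership model this diff introduces (a per-context map of pages plus an "active" page that later commands target) can be illustrated in isolation. The sketch below is a simplified stand-in, not the broker's code: `PageRegistry` and `LivePage` are hypothetical names, and the CDP calls (`Target.attachToTarget`, `Page.enable`, and so on) are left out so that only the bookkeeping is shown.

```typescript
// Simplified stand-in for LiveBrowserPage from the diff.
type LivePage = { target_id: string; page_session_id: string };

// Hypothetical registry modelling context.pages plus the active target/session pair.
class PageRegistry {
  private readonly pages = new Map<string, LivePage>();
  active: LivePage;

  constructor(initial: LivePage) {
    this.pages.set(initial.target_id, initial);
    this.active = initial;
  }

  // Like activateTarget: only targets already owned by the context may become active.
  activate(targetId: string): void {
    const page = this.pages.get(targetId);
    if (!page) {
      throw new Error(`context does not own target: ${targetId}`);
    }
    this.active = page;
  }

  // Like attachNewTarget, minus the CDP session setup: register once, then activate.
  attach(page: LivePage): void {
    if (!this.pages.has(page.target_id)) {
      this.pages.set(page.target_id, page);
    }
    this.activate(page.target_id);
  }

  // Like listTabs: a Map iterates in insertion order, so the initial page comes first.
  list(): string[] {
    return Array.from(this.pages.keys());
  }
}
```

Because a `Map` preserves insertion order, listing after a popup is attached returns the original target first and the popup second, which matches the order the new test expects from `listTabs`.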
