blog: Computer-Use Agents Have No Tool Boundary by amavashev · Pull Request #650 · runcycles/cycles-docs

amavashev · 2026-05-15T16:48:56Z

Summary

New pillar post extending action authority to the click/type surface of screen-control agents — completing a sibling trilogy with the just-shipped memory-writes (PR #648) and merge-button (PR #649) posts.

Frames the central problem: when the agent's tool surface is `click` and `type`, RISK_POINTS by tool name doesn't work — every state-changing call is the same tool name.
Risk classification has to move from tool to (target, intent, context): URL pattern, site authority, DOM target (or coordinate region for pixel agents), action verb, modifier inputs, session history.
Introduces a target-intent risk schedule conditioned on (target, intent, context) and a fresh-screenshot cap as the screen-specific equivalent of cumulative-authority caps.
Stack-by-stack table for OpenAI ChatGPT agent / Anthropic Claude Computer Use / Browser-Use shows where the gate can sit in each.
Mirrors the PocketOS two-layer fix (browser-session scoping + agent-side runtime authority).
13 unique body cross-links + 8 Next Steps + 8 glossary auto-links
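
A minimal sketch of what a target-intent risk schedule could look like, assuming rows keyed on URL pattern, DOM target, and action verb, with one context flag drawn from session history. Every name here (`ProposedAction`, `SCHEDULE`, `risk_points`, `after_untrusted_page`), the example URLs, and the point values are hypothetical illustrations, not the schedule format used in the post or in Cycles:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ProposedAction:
    """One screen-control call, described by target, intent, and context."""
    tool: str                           # "click" or "type": the same for nearly every call
    url: str                            # target: page the action lands on
    dom_target: str                     # target: element selector (or a coordinate region)
    verb: str                           # intent: e.g. "submit", "delete", "navigate"
    after_untrusted_page: bool = False  # context: hypothetical session-history flag


# Hypothetical rows: (url glob, DOM target glob, verb, points). First match wins.
SCHEDULE = [
    ("https://bank.example/*", "*", "submit", 50),
    ("https://*/settings/*", "button.delete*", "delete", 40),
    ("https://*", "*", "submit", 10),
    ("*", "*", "navigate", 1),
]
DEFAULT_POINTS = 5  # assumed fallback when no row matches


def risk_points(action: ProposedAction) -> int:
    """Score an action by (target, intent, context) instead of by tool name."""
    base = DEFAULT_POINTS
    for url_glob, target_glob, verb, points in SCHEDULE:
        if (
            fnmatchcase(action.url, url_glob)
            and fnmatchcase(action.dom_target, target_glob)
            and action.verb == verb
        ):
            base = points
            break
    # Assumed context rule: double the score after visiting an untrusted page.
    return base * 2 if action.after_untrusted_page else base


pay = ProposedAction("click", "https://bank.example/transfer", "button#confirm", "submit")
nxt = ProposedAction("click", "https://news.example/article", "a.next", "navigate")
print(pay.tool == nxt.tool, risk_points(pay), risk_points(nxt))
```

Both actions share the tool name `click`, so a schedule keyed on tool name would score them identically; keyed on target and intent, the two scores differ.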

Author: Albert Mavashev
Date: 2026-05-19
Word count: ~3,450 body

Reviews

Internal cycles 1–3 (scorecard 9.3/10)
Glossary auto-linker applied 8 contextual links
Codex external review: round 1 REVISE-MINOR (7 findings, 5 applied / 2 pushed back), round 2 REVISE-MINOR (3 findings, all applied), round 3 SHIP

Codex verified upstream via GitHub/web connector:

OpenAI ChatGPT agent / Operator / CUA relationship (ChatGPT agent absorbed Operator's capabilities; CUA was the model behind Operator; not strictly accurate to say CUA "powers" ChatGPT agent)
Anthropic Computer Use launched October 2024 as API capability, customer-hosted (Linux Docker reference)
Claude Computer Use action set (`left_click`, `type`, `scroll`, `key`)
Anthropic's two confirmation mechanisms: developer-side guidance for sensitive actions, plus prompt-injection classifier
Browser-Use exposes DOM-aware indexed clickable elements (not pixel-based)

Three factual errors caught in cycle 1 and fixed before codex saw the post:

Anthropic Computer Use launch date corrected from "March 2026 macOS" to "October 2024 API capability, Linux Docker reference"
Operator sunset date narrowed from "August 2025" to "in 2025" (the specific date was disputable across secondary sources)
Browser-Use 89.1% WebVoyager number softened to "strong scores on the WebVoyager benchmark" since the citation URL didn't contain the specific number

Per-dimension scores

Dimension	Score
Factual accuracy	9.5
Credibility	9
Cross-links	9.5
SEO (title 41/51, desc 158/160)	9.5
Code accuracy	9
Structure & flow	9.5
Terminology	9.5
Tone & style	9.5

Overall: 9.4 / 10

Test plan

`npm run dev` and verify post renders at `/blog/computer-use-agents-have-no-tool-boundary`
Verify post appears on `/blog/` index sorted to top (date 2026-05-19)
Click through all internal links and confirm they resolve (sibling links to memory-writes and merge depend on PR #648, "blog: Agent Memory Writes Are Actions, Too", and PR #649, "blog: When Coding Agents Press Merge", being merged first)
Confirm date/author/tags/reading-time header renders above body
Confirm Prev/Next post navigation works
`npm run build` succeeds with no broken-link warnings

Dependencies and order

This post links to /blog/agent-memory-writes-are-actions-too (PR #648) and /blog/when-coding-agents-press-merge (PR #649). Merge order matters: PR #648 → PR #649 → this PR, so cross-links resolve on main as the trilogy lands.

Tags

Two new pillar tags introduced for the new surface: `computer-use` and `browser-agents`. Other tags (`action-authority`, `action-control`, `agents`, `governance`, `runtime-authority`, `security`, `RISK_POINTS`) match corpus convention.

New pillar post applying action authority to screen-control agents. Closes a corpus gap: the four siblings (outbound side effects, memory writes, merge buttons, and now clicks) all run through the same reserve-commit lifecycle, but each surface has its own feature vector for the rule body.

The post frames the central problem: when the agent's tool surface is `click` and `type`, RISK_POINTS by tool name doesn't work, because every state-changing call is the same tool. Risk classification has to move from tool to (target, intent, context). The schedule rows are URL pattern + site authority + DOM target + action verb + modifier inputs + session history. Introduces a fresh-screenshot cap as the screen-specific equivalent of cumulative-authority caps. Stack-by-stack table for OpenAI CUA / Anthropic Computer Use / Browser-Use shows where the gate can sit in each.

Internal cross-links to ai-agent-action-control, ai-agent-risk-assessment, agent-memory-writes-are-actions-too (sibling), when-coding-agents-press-merge (sibling), zero-trust-for-ai-agents, pocketos-aftermath, plus how-to and protocol references. External citations: OpenAI ChatGPT agent intro, Anthropic Computer Use announcement, Browser-Use repo.

Reviews: internal cycles 1-3 (scorecard 9.3/10), glossary linker added 8 contextual links. Three factual errors from cycle 1 fact-check fixed (Anthropic Computer Use launch date, Operator sunset date specificity, Browser-Use benchmark citation).

Two new tags introduced for the new surface: computer-use, browser-agents.

…boundary

Apply/skip tally: 5 applied, 2 pushed back.

Applied:
- Opener mismatch: Browser-Use is DOM/index-aware, not pixel-based. Rewrote the opener to use "a pixel-based computer-use agent" so the coordinate/screenshot failure mode fits the named agent class. The later sections covering Browser-Use as DOM-aware remain consistent.
- L31 CUA / ChatGPT agent claim: "now powers ChatGPT agent mode" was too strong; OpenAI describes ChatGPT agent as integrating Operator's capabilities, and the ChatGPT-agent model is compared against "o3-powered CUA," not described as powered by CUA. Reworded to "a member of the OpenAI computer-use lineage that now ships under ChatGPT agent", which is accurate and hedged.
- L31 anchor text: "recent extensions" was split across two weak link anchors. Reworded so the anchor text uses the post topics ("memory writes" and "merge buttons").
- L119 enforcement input: "action verb in the chain-of-thought" is not a reliable enforcement input, since reasoning isn't always exposed. Softened to "the agent's stated next-step text."
- L153 Anthropic confirmation behavior: my prose conflated two distinct Anthropic mechanisms: (a) documented developer guidance to ask for human confirmation on cookies/financial/ToS categories, and (b) a prompt-injection classifier that flags suspicious screenshots. Split into two adjacent layers with the right framing for each.
- L196 / L204 absolutes: "the only class" → "one of the few classes"; "the default for most teams" → "many deployments are closer to 'no'."

Skipped, with reason:
- Body cross-links inside bullets outside Next Steps (L133, L171, L172): all flagged links are glossary auto-link clarifiers inside bullets that describe patterns, not link dumps. Same defensible push-back used in the memory and merge posts.
- 2026-05-19 publish date: intentional to keep the trilogy sequence (memory 5/16, merge 5/18, computer-use 5/19).
Codex verified upstream: CUA / Operator / ChatGPT agent relationship, Claude Computer Use action set including left_click + type + scroll + key, Anthropic prompt-injection classifier and sensitive-action guidance, Browser-Use DOM-aware action vocabulary.

…boundary

Apply/skip tally: 3 applied, 0 pushed back.

Applied:
- L43 table row: "OpenAI CUA (powers ChatGPT agent)" was still overstating the CUA / ChatGPT agent relationship. Rewrote to "OpenAI's computer-use lineage — CUA, ChatGPT agent" so the row names the lineage without claiming one model powers the other.
- L184 stack-by-stack table: "OpenAI ChatGPT agent (CUA-backed)" carried the same overclaim. Rewrote to "OpenAI ChatGPT agent (Operator-derived web interaction)", which is accurate to OpenAI's own framing.
- L144 controls table: the row "Claude Computer Use prompt-injection classifier" still bundled developer-side sensitive-action guidance with the injection classifier. Split into two distinct rows: one for Anthropic's documented developer guidance on sensitive actions (cookies/financial/ToS, an agent-harness pattern), and one for the prompt-injection classifier (screenshot-flagging heuristic). Now matches the prose at L153.

…-pause-to-reserve

Apply/skip tally: 9 applied, 2 pushed back.

Applied:
- `response.function_call` → `response.function_call_arguments.*`: the OpenAI Realtime API uses function-call output items and the function_call_arguments streaming events; my original event name was not a real Realtime server event. Fixed in both the prose and the stack-by-stack table.
- 80-150 ms relay hop: removed the specific band attribution. The OpenAI page does not state it. Generic phrasing: "a forwarding hop sized to fit inside the conversation's latency budget."
- ElevenLabs row: clarified the $0.08-$0.24/min framing. Hosting is $0.08/min flat or $0.16/min burst; the $0.24 ceiling derives once LLM and telephony layer on at cost.
- Vapi row: labeled the $0.115-$0.42/min range as an estimate (it's derived from $0.05/min orchestration plus a BYOK provider stack at cost; the actual all-in depends on provider choices).
- 17-minute "$1.50-$8.00 model spend alone": tightened to "against the per-minute stack rates above" since the rates in the table mix all-in / provider / orchestration models.
- Provider-layer caps: softened from "OpenAI, Vapi, Retell AI, and ElevenLabs all expose per-call or per-session limits" to "to whatever degree each provider exposes them — typically through per-session budget headers, dashboard caps, or programmatic limits." Pricing pages don't uniformly establish hard caps.
- "Most production voice teams use this only..." for speculative commit: softened to "This pattern is usually safer on the slow-path tool layer."
- Description trimmed 162 → 152 chars: changed "—" to ":", "sit synchronously in the path" to "sync on the hot path."
- `reserve-commit` glossary link: pointed to /protocol/how-reserve-commit-works-in-cycles instead of /glossary#reservation (reserve-commit is a lifecycle term, not the reservation entry).

Skipped, with reason:
- Body cross-link count (11) above 5-8 pillar target: three of the eleven are the trilogy references in a single closing sentence that names the sibling extension series (memory-writes, merge, computer-use). They are coherent as a triple, not redundant.
- 2026-05-20 publish date: intentional sequence after the trilogy (5/16, 5/18, 5/19, 5/20).

Codex verified upstream: ElevenLabs/Vapi/Retell AI pricing pages, OpenAI Realtime API event surface (function_call_arguments.delta / .done are the actual streaming events), and the cycles-docs main-branch internal targets. Sibling links to memory-writes, merge, and computer-use treated as just-merged via PR #648-#650.
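
For readers unfamiliar with the event names in the first fix, here is a minimal sketch of how a harness might assemble a tool call from the `response.function_call_arguments.delta` / `.done` events before any gate runs. It assumes the events have already been decoded from JSON into plain dicts carrying `type`, `call_id`, `delta`, and `arguments` fields, as I understand the Realtime event shape; the helper name `collect_function_calls` and the sample events are hypothetical and do not come from the post:

```python
import json


def collect_function_calls(events):
    """Buffer streamed argument fragments per call_id and parse each completed call."""
    buffers = {}    # call_id -> list of argument fragments seen so far
    completed = {}  # call_id -> parsed arguments, filled when the .done event arrives
    for event in events:
        event_type = event.get("type")
        if event_type == "response.function_call_arguments.delta":
            buffers.setdefault(event["call_id"], []).append(event["delta"])
        elif event_type == "response.function_call_arguments.done":
            call_id = event["call_id"]
            # Prefer the full string on the .done event; fall back to joined deltas.
            raw = event.get("arguments") or "".join(buffers.get(call_id, []))
            completed[call_id] = json.loads(raw)
            buffers.pop(call_id, None)
    return completed


# Hand-written sample events, for illustration only.
sample_events = [
    {"type": "response.function_call_arguments.delta", "call_id": "c1", "delta": '{"amount"'},
    {"type": "response.function_call_arguments.delta", "call_id": "c1", "delta": ": 5}"},
    {"type": "response.function_call_arguments.done", "call_id": "c1", "arguments": '{"amount": 5}'},
]
calls = collect_function_calls(sample_events)
print(calls)
```

Under this assumption, a reserve step would fire once a call appears in `completed`, before the tool itself executes.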

Moved from 2026-05-19 to 2026-05-30 to match a weekly publishing cadence for the action-authority extension arc. Sequence now: memory 5/16, merge 5/23, computer-use 5/30.

amavashev added 3 commits May 15, 2026 12:33

This was referenced May 15, 2026

blog: Reserving Authority When You Can't Pause #651

Open

blog: What Four New Surfaces Taught Us #652

Open

blog: Rolling Out Action Authority on New Surfaces #653

Open

blog: reschedule computer-use-agents-have-no-tool-boundary to 2026-05-30

bd16837

amavashev wants to merge 4 commits into main from blog/computer-use-agents-have-no-tool-boundary