blog: Computer-Use Agents Have No Tool Boundary #650
Open
amavashev wants to merge 4 commits into
New pillar post applying action authority to screen-control agents. Closes a corpus gap: the four siblings (outbound side effects, memory writes, merge buttons, and now clicks) all run through the same reserve-commit lifecycle, but each surface has its own feature vector for the rule body.

The post frames the central problem: when the agent's tool surface is `click` and `type`, RISK_POINTS by tool name doesn't work, because every state-changing call is the same tool. Risk classification has to move from tool to (target, intent, context). The schedule rows are URL pattern + site authority + DOM target + action verb + modifier inputs + session history. Introduces a fresh-screenshot cap as the screen-specific equivalent of cumulative-authority caps. Stack-by-stack table for OpenAI CUA / Anthropic Computer Use / Browser-Use shows where the gate can sit in each.

Internal cross-links to ai-agent-action-control, ai-agent-risk-assessment, agent-memory-writes-are-actions-too (sibling), when-coding-agents-press-merge (sibling), zero-trust-for-ai-agents, pocketos-aftermath, plus how-to and protocol references. External citations: OpenAI ChatGPT agent intro, Anthropic Computer Use announcement, Browser-Use repo.

Reviews: internal cycles 1-3 (scorecard 9.3/10), glossary linker added 8 contextual links. Three factual errors from cycle 1 fact-check fixed (Anthropic Computer Use launch date, Operator sunset date specificity, Browser-Use benchmark citation).

Two new tags introduced for the new surface: computer-use, browser-agents.
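For reviewers who want a concrete picture of the shift from tool name to (target, intent, context), here is a minimal sketch. It is not the post's actual schedule: `ScreenAction`, `SCHEDULE`, `DEFAULT_POINTS`, the example hosts, and all point values are hypothetical, and it covers only the URL pattern, DOM target, and action verb columns (site authority, modifier inputs, and session history are left out).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ScreenAction:
    """One proposed screen action (hypothetical shape, for illustration)."""
    url: str          # page the agent is acting on
    dom_target: str   # label or accessible name of the element
    verb: str         # "click", "type", "scroll", ...


# Hypothetical schedule rows: (URL glob, target keyword, verb, risk points).
# The first matching row wins; the numbers are illustrative only.
SCHEDULE = [
    ("*://bank.example/*", "transfer", "click", 50),
    ("*://bank.example/*", "", "type", 10),
    ("*", "delete", "click", 20),
    ("*", "", "scroll", 0),
]
DEFAULT_POINTS = 5  # assumed fallback when no row matches


def risk_points(action: ScreenAction) -> int:
    """Score an action by where it lands and what it does, not by tool name."""
    target = action.dom_target.lower()
    for url_glob, keyword, verb, points in SCHEDULE:
        if (
            fnmatchcase(action.url, url_glob)
            and keyword in target      # an empty keyword matches any target
            and verb == action.verb
        ):
            return points
    return DEFAULT_POINTS


# Same tool ("click"), very different scores depending on target and site.
pay = ScreenAction("https://bank.example/pay", "Confirm transfer", "click")
read = ScreenAction("https://news.example/home", "Read more", "click")
print(risk_points(pay), risk_points(read))
```

The point of the sketch is only that two calls to the same `click` tool can resolve to different schedule rows; a real rule body would add the remaining feature columns named above.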
…boundary
Apply/skip tally: 5 applied, 2 pushed back.
Applied:
- Opener mismatch: Browser-Use is DOM/index-aware, not pixel-based.
Rewrote the opener to use "a pixel-based computer-use agent" so the
coordinate/screenshot failure mode fits the named agent class. The
later sections covering Browser-Use as DOM-aware remain consistent.
- L31 CUA / ChatGPT agent claim: "now powers ChatGPT agent mode" was
too strong; OpenAI describes ChatGPT agent as integrating Operator's
capabilities, and the ChatGPT-agent model is compared against
"o3-powered CUA," not described as powered by CUA. Reworded to "a
member of the OpenAI computer-use lineage that now ships under
ChatGPT agent" — accurate and hedged.
- L31 anchor text: "recent extensions" split across two weak link
anchors. Reworded so the anchor text uses the post topics ("memory
writes" and "merge buttons").
- L119 enforcement input: "action verb in the chain-of-thought" is not
a reliable enforcement input — reasoning isn't always exposed.
Softened to "the agent's stated next-step text."
- L153 Anthropic confirmation behavior: my prose conflated two
distinct Anthropic mechanisms — (a) documented developer guidance to
ask for human confirmation on cookies/financial/ToS categories, and
(b) a prompt-injection classifier that flags suspicious screenshots.
Split into two adjacent layers with the right framing for each.
- L196 / L204 absolutes: "the only class" → "one of the few classes";
"the default for most teams" → "many deployments are closer to 'no'."
Skipped, with reason:
- Body cross-links inside bullets outside Next Steps (L133, L171,
L172): all flagged links are glossary auto-link clarifiers inside
bullets that describe patterns, not link dumps. Same defensible
push-back used in the memory and merge posts.
- 2026-05-19 publish date: intentional to keep the trilogy sequence
(memory 5/16, merge 5/18, computer-use 5/19).
Codex verified upstream: CUA / Operator / ChatGPT agent relationship,
Claude Computer Use action set including left_click + type + scroll +
key, Anthropic prompt-injection classifier and sensitive-action
guidance, Browser-Use DOM-aware action vocabulary.
…boundary
Apply/skip tally: 3 applied, 0 pushed back.
Applied:
- L43 table row: "OpenAI CUA (powers ChatGPT agent)" was still overstating the CUA / ChatGPT agent relationship. Rewrote to "OpenAI's computer-use lineage — CUA, ChatGPT agent" so the row names the lineage without claiming one model powers the other.
- L184 stack-by-stack table: "OpenAI ChatGPT agent (CUA-backed)" carried the same overclaim. Rewrote to "OpenAI ChatGPT agent (Operator-derived web interaction)", which is accurate to OpenAI's own framing.
- L144 controls table: the row "Claude Computer Use prompt-injection classifier" still bundled developer-side sensitive-action guidance with the injection classifier. Split into two distinct rows: one for Anthropic's documented developer guidance on sensitive actions (cookies/financial/ToS, an agent-harness pattern), and one for the prompt-injection classifier (screenshot-flagging heuristic). Now matches the prose at L153.
amavashev added a commit that referenced this pull request on May 15, 2026
…-pause-to-reserve
Apply/skip tally: 9 applied, 2 pushed back.
Applied:
- `response.function_call` → `response.function_call_arguments.*`: the OpenAI Realtime API uses function-call output items and the function_call_arguments streaming events; my original event name was not a real Realtime server event. Fixed in both the prose and the stack-by-stack table.
- 80-150 ms relay hop: removed the specific band attribution. The OpenAI page does not state it. Generic phrasing: "a forwarding hop sized to fit inside the conversation's latency budget."
- ElevenLabs row: clarified the $0.08-$0.24/min framing. Hosting is $0.08/min flat or $0.16/min burst; the $0.24 ceiling derives once LLM and telephony layer on at cost.
- Vapi row: labeled the $0.115-$0.42/min range as an estimate (it's derived from $0.05/min orchestration plus a BYOK provider stack at cost; the actual all-in depends on provider choices).
- 17-minute "$1.50-$8.00 model spend alone": tightened to "against the per-minute stack rates above" since the rates in the table mix all-in / provider / orchestration models.
- Provider-layer caps: softened from "OpenAI, Vapi, Retell AI, and ElevenLabs all expose per-call or per-session limits" to "to whatever degree each provider exposes them — typically through per-session budget headers, dashboard caps, or programmatic limits." Pricing pages don't uniformly establish hard caps.
- "Most production voice teams use this only..." for speculative commit: softened to "This pattern is usually safer on the slow-path tool layer."
- Description trimmed 162 → 152 chars: changed "—" to ":", "sit synchronously in the path" to "sync on the hot path."
- `reserve-commit` glossary link: pointed to /protocol/how-reserve-commit-works-in-cycles instead of /glossary#reservation (reserve-commit is a lifecycle term, not the reservation entry).
Skipped, with reason:
- Body cross-link count (11) above 5-8 pillar target: three of the eleven are the trilogy references in a single closing sentence that names the sibling extension series (memory-writes, merge, computer-use). They are coherent as a triple, not redundant.
- 2026-05-20 publish date: intentional sequence after the trilogy (5/16, 5/18, 5/19, 5/20).
Codex verified upstream: ElevenLabs/Vapi/Retell AI pricing pages, OpenAI Realtime API event surface (function_call_arguments.delta / .done are the actual streaming events), and the cycles-docs main-branch internal targets. Sibling links to memory-writes, merge, and computer-use treated as just-merged via PR #648-#650.
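To make the event-name fix above concrete, here is a minimal sketch of how a relay might assemble one tool call from the `response.function_call_arguments.delta` / `.done` stream. The `FunctionCallAssembler` class is hypothetical, and the field names it reads (`call_id`, `delta`, `arguments`) reflect my reading of the Realtime server-event shapes; they should be checked against the current API reference before being relied on.

```python
import json


class FunctionCallAssembler:
    """Accumulate streamed function-call arguments, keyed by call_id.

    Hypothetical helper: events are plain dicts as decoded from the
    Realtime WebSocket; field names are assumptions to verify upstream.
    """

    def __init__(self):
        self._buffers = {}  # call_id -> list of argument fragments

    def handle(self, event):
        """Return (call_id, parsed_args) once a call completes, else None."""
        etype = event.get("type")
        if etype == "response.function_call_arguments.delta":
            self._buffers.setdefault(event["call_id"], []).append(event["delta"])
            return None
        if etype == "response.function_call_arguments.done":
            streamed = "".join(self._buffers.pop(event["call_id"], []))
            # Prefer the final arguments string if the event carries one,
            # otherwise fall back to the concatenated deltas.
            raw = event.get("arguments") or streamed
            return event["call_id"], json.loads(raw)
        return None  # unrelated events are ignored


assembler = FunctionCallAssembler()
events = [
    {"type": "response.function_call_arguments.delta", "call_id": "c1", "delta": '{"city": "Par'},
    {"type": "response.function_call_arguments.delta", "call_id": "c1", "delta": 'is"}'},
    {"type": "response.function_call_arguments.done", "call_id": "c1", "arguments": '{"city": "Paris"}'},
]
results = [r for r in map(assembler.handle, events) if r is not None]
print(results)
```

Only the completed call is surfaced, so a reserve step could run once per call at the `.done` event and not on every delta.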
This was referenced May 15, 2026
Moved from 2026-05-19 to 2026-05-30 to match a weekly publishing cadence for the action-authority extension arc. Sequence now: memory 5/16, merge 5/23, computer-use 5/30.
Summary
New pillar post extending action authority to the click/type surface of screen-control agents — completing a sibling trilogy with the just-shipped memory-writes (PR #648) and merge-button (PR #649) posts.
Author: Albert Mavashev
Date: 2026-05-19
Word count: ~3,450 body
Reviews
Codex verified upstream via GitHub/web connector:
Three factual errors caught in cycle 1 and fixed before codex saw the post:
Per-dimension scores
Overall: 9.4 / 10
Test plan
Dependencies and order
This post links to /blog/agent-memory-writes-are-actions-too (PR #648) and /blog/when-coding-agents-press-merge (PR #649). Merge order matters: PR #648 → PR #649 → this PR, so cross-links resolve on main as the trilogy lands.
Tags
Two new pillar tags introduced for the new surface: `computer-use` and `browser-agents`. Other tags (`action-authority`, `action-control`, `agents`, `governance`, `runtime-authority`, `security`, `RISK_POINTS`) match corpus convention.