blog: What Four New Surfaces Taught Us by amavashev · Pull Request #652 · runcycles/cycles-docs

amavashev · 2026-05-15T20:14:34Z

Summary

A shorter (~2,000 word) reflective synthesis post tying together the four sibling extensions shipped this session: memory writes (PR #648), merge buttons (PR #649), computer-use clicks (PR #650), and voice frames (PR #651).

The thesis: the reserve-commit lifecycle that was first written for outbound tool calls absorbed all four surfaces without modification to the decision primitive. What changed across the surfaces was the binding — the feature vector, the blast radius shape, the timing of the decision, the audit cardinality — not the primitives.
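To make the primitive-versus-binding split concrete, here is a minimal sketch. It is not the Cycles API: `Binding`, `Ledger`, and the cost functions are hypothetical names invented for illustration. The `Ledger` stands in for the unchanged reserve-commit primitive, and each `Binding` stands in for what a surface supplies (its feature vector and a blast-radius estimate).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# All names here are hypothetical illustrations, not the Cycles API.


@dataclass(frozen=True)
class Binding:
    """What varies per surface: feature vector and cost estimate."""
    surface: str
    features: Callable[[dict], Tuple]
    cost: Callable[[dict], int]


class Ledger:
    """What stays fixed: reserve before acting, commit or release after."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.reserved: Dict[int, int] = {}
        self.audit: List[tuple] = []
        self._next_id = 0

    def reserve(self, binding: Binding, action: dict) -> Optional[int]:
        amount = binding.cost(action)
        held = sum(self.reserved.values())
        if amount > self.budget - held:
            # Deny: the action would exceed the uncommitted budget.
            self.audit.append((binding.surface, "deny", binding.features(action)))
            return None
        self._next_id += 1
        self.reserved[self._next_id] = amount
        self.audit.append((binding.surface, "reserve", binding.features(action)))
        return self._next_id

    def commit(self, reservation_id: int, actual: int) -> None:
        held = self.reserved.pop(reservation_id)
        self.budget -= min(actual, held)  # charge actual use, capped at the hold

    def release(self, reservation_id: int) -> None:
        self.reserved.pop(reservation_id)  # action abandoned, nothing charged


# Two surfaces, one ledger: only the bindings differ.
click = Binding(
    surface="click",
    features=lambda a: (a["target"], a["intent"], a["context"]),
    cost=lambda a: 5 if a["intent"] == "submit" else 1,
)
memory = Binding(
    surface="memory_write",
    features=lambda a: (a["key"], a["ttl_days"]),
    cost=lambda a: a["ttl_days"],  # assumed: longer-lived writes cost more
)

ledger = Ledger(budget=10)
rid = ledger.reserve(click, {"target": "#pay", "intent": "submit", "context": "checkout"})
ledger.commit(rid, actual=5)
denied = ledger.reserve(memory, {"key": "user_pref", "ttl_days": 30})
```

In this sketch, adding a surface means writing a new `Binding`; `Ledger` is left alone, which is the shape of the claim the post makes.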

What each surface added:

| Surface | Dimension added |
| --- | --- |
| Memory writes | Temporal blast radius |
| Merge buttons | Trust elevation (promotion vs creation) |
| Computer-use clicks | (target, intent, context) feature vector |
| Voice frames | Latency constraint |

The post also includes a five-row comparison table across all four surfaces (feature vector / blast radius / gate timing / audit cardinality), a predictions section (multi-agent voice-to-voice, embodied agents, infrastructure provisioning), and a "What Did Not Generalize" section calling out that provider-side fixes and harness integration work remain surface-specific.

Format is essayistic rather than tabular — a step back, not another surface extension.

Author: Albert Mavashev
Date: 2026-05-21
Word count: ~2,000 body

Reviews

Internal cycles 1–3 (scorecard 9.4/10)
Glossary auto-linker applied 6 contextual links
Codex external review: round 1 REVISE-MINOR (6 findings, 5 applied / 1 pushed back), round 2 REVISE-MINOR (2 residual findings, 2 applied), round 3 SHIP

Codex verified all four surface contribution claims match their respective sibling posts (memory's temporal dimension, merge's trust elevation, clicks' target/intent/context, voice's latency constraint). The synthesis is faithful to the source posts.

Notable changes through review

Cycle 1 trimmed:

Meta-framing intro sentence cut
Self-congratulatory closing ("Four for four is a good track record. The framework keeps earning its surface area") rewritten as a structural claim
"The voice case is the interesting one" → "Voice is the load-bearing case"
Filler in closing third compressed
One H2 renamed from "The Lifecycle Is the Stable Layer" to "Reserve-Commit Is the Stable Layer" for keyword carry

Codex rounds caught and corrected:

"No modifications to the lifecycle" overstated — voice fast-path uses predictive reservation, not per-action reserve-commit. Added dedicated "Voice is the partial exception" paragraph. Reframed to "lifecycle preserved at the decision boundaries."
Click table row "One target, configurable" → "Single DOM target; severity depends on target + context"
Voice table row "1 per call + brackets" → "1 per call (with periodic re-checks)" — brackets are cadence, not guaranteed audit emissions
"Nothing analogous shows up for send_email or deploy" hedged — deploys can have analogous approval-loop failures; merge is where the framework first foregrounded distinct-approver caps
Predictions section softened across the board: "will absorb" → "the likely shape," "Rollback windows do not exist" → "are much narrower if they exist at all." Embodied agents bullet now flags physical irreversibility as "the strongest test the framework has not yet faced."
L67 "it did not change" softened to acknowledge the binding/cadence varies even when the decision primitive doesn't
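The voice exception described above (one decision per call, a fast path that skips per-frame reserve-commit, and periodic re-checks as cadence) might look roughly like the sketch below. `CallReservation`, `recheck_every`, and the `recheck` callback are hypothetical names; the actual mechanism in the voice post may differ.

```python
from typing import Callable


class CallReservation:
    """Hypothetical predictive reservation for one voice call.

    One decision is made at call start (the reserved frame budget).
    Frames then draw from that budget without a per-frame decision,
    and a re-check runs every `recheck_every` frames.
    """

    def __init__(
        self,
        reserved_frames: int,
        recheck_every: int,
        recheck: Callable[[int], bool],
    ) -> None:
        self.remaining = reserved_frames
        self.recheck_every = recheck_every
        self.recheck = recheck  # returns True if the call may continue
        self.sent = 0
        self.rechecks = 0

    def send_frame(self) -> bool:
        if self.remaining <= 0:
            return False  # reservation exhausted or revoked
        if self.sent and self.sent % self.recheck_every == 0:
            self.rechecks += 1
            if not self.recheck(self.sent):
                self.remaining = 0  # revoke the rest of the reservation
                return False
        self.remaining -= 1
        self.sent += 1
        return True


# Reserve 10 frames, re-check every 4; the re-check fails once 8 frames are sent.
res = CallReservation(reserved_frames=10, recheck_every=4, recheck=lambda sent: sent < 8)
results = [res.send_frame() for _ in range(12)]
```

Here the re-checks are a cadence inside the call rather than separate audit entries, which matches the "1 per call (with periodic re-checks)" wording in the table fix.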

Per-dimension scores

| Dimension | Score |
| --- | --- |
| Factual accuracy | 9.5 |
| Credibility | 9 |
| Cross-links | 9.5 |
| SEO (title 32/51, desc 153/160) | 9 |
| Code accuracy | 10 |
| Structure & flow | 9.5 |
| Terminology | 9.5 |
| Tone & style | 9.5 |

Overall: 9.4 / 10

Test plan

`npm run dev` and verify post renders at `/blog/what-four-new-surfaces-taught-us`
Verify post appears on `/blog/` index sorted to top (date 2026-05-21)
Click through all internal links. The four sibling links depend on these PRs being merged first: #648 (blog: Agent Memory Writes Are Actions, Too), #649 (blog: When Coding Agents Press Merge), #650 (blog: Computer-Use Agents Have No Tool Boundary), and #651 (blog: Reserving Authority When You Can't Pause)
Confirm date/author/tags/reading-time header renders above body
Confirm Prev/Next post navigation works
`npm run build` succeeds with no broken-link warnings

Dependencies

This post depends on all four sibling PRs being merged first. Required merge order: #648 → #649 → #650 → #651 → this PR. The opener and the table both reference the four siblings as a foundational claim; partial merges would leave broken cross-links and a half-formed thesis.

Shorter (~2,000 word) reflective synthesis post tying together the four sibling extensions shipped this session: memory writes, merge buttons, computer-use clicks, voice frames.

The thesis: the reserve-commit lifecycle that was first written for outbound tool calls absorbed all four surfaces without modification. What changed across the surfaces was the binding (the feature vector, the blast radius shape, the timing of the decision, the audit cardinality), not the primitives.

Six sections:

1. The Primitive That Held: what didn't change
2. What Differed Between Surfaces: feature vector, blast radius shape, timing of the decision, audit cardinality (with a five-row comparison table)
3. What Each Surface Added: temporal dimension (memory), trust elevation (merge), (target, intent, context) (clicks), latency constraint (voice)
4. What This Predicts for the Next Surface: multi-agent voice-to-voice, embodied agents, infrastructure provisioning
5. What Did Not Generalize: provider-side fixes and harness-specific integration work
6. Reserve-Commit Is the Stable Layer: the structural takeaway

Format is less tabular, more essayistic than the four siblings. One first-person "I" earns its place in the reflective intro.

Reviews: internal cycles 1-3 (scorecard 9.4/10), glossary linker added 6 contextual links.

Cycle 1 reviews:

- Synthesis-vs-siblings consistency check confirmed all four surface claims match their respective sibling posts verbatim or close to it.
- Style review caught and fixed: filler at line 30 (meta-framing) and line 81 (redundancy), self-congratulatory closing ("Four for four is a good track record. The framework keeps earning its surface area") rewritten as a structural claim, "the voice case is the interesting one" → "Voice is the load-bearing case," "100 ms ceiling on individual frames" → corrected framing.
- One H2 renamed from "The Lifecycle Is the Stable Layer" to "Reserve-Commit Is the Stable Layer" for keyword carry.

Cross-links to all four sibling PRs (#648-651), what-is-runtime-authority anchor, runtime-authority-vs-guardrails comparison, and the parent action-control post. This post depends on the four siblings being merged first.

Apply/skip tally: 5 applied, 1 pushed back.

Applied:

- "No modifications to the lifecycle" overstated: voice is a partial exception (the fast audio path uses predictive reservation / floor authority instead of per-action reserve-commit). Added an explicit "Voice is the partial exception" paragraph to the closing section that names the cadence-shift honestly. Reframed "Four surfaces, no modifications" to "Four surfaces, with the lifecycle preserved at the decision boundaries."
- Click row in the table: "One target, configurable" was vague. Tightened to "Single DOM target; severity depends on target + context" to match the sibling's framing.
- Voice row in the table: "1 per call + brackets" implied bracket checks were always part of audit cardinality. Reworded to "1 per call (with periodic re-checks)"; brackets are cadence, not guaranteed audit emissions.
- "Nothing analogous shows up for send_email or deploy" overstated. Reworded to acknowledge that deploys and other promotion gates can have analogous approval-loop failures, while crediting merge as where the corpus first foregrounded distinct-approver caps.
- Predictions overconfidence: "will absorb," "do not need to change," "assume it does" softened. Added an explicit "the hypothesis the four-surface evidence supports is..." framing and noted that "each new surface remains a real test of that hypothesis, not a foregone conclusion." Closing section rewrites "assume it does" as "the lifecycle is the most likely starting point" with explicit "though new surfaces should be expected to stretch the binding the way voice did" caveat.

Skipped, with reason:

- Publication timing question: 5/21 is intentional after the 5/16-5/20 sequence of memory/merge/click/voice posts. "Last week" is faithful to that sequence.
Codex verified the synthesis-vs-sibling claims still hold after these softenings; the four-surface "what each added" assignments (temporal, trust elevation, target/intent/context, latency constraint) all match the actual sibling posts.

Apply/skip tally: 2 applied, 0 pushed back.

Applied:

- L67 "it did not change" absolute: replaced with "The lifecycle itself does not appear in this table: the table tracks what varies (binding and cadence), not the decision primitive..." Aligns with the voice caveat added in round 1.
- Prediction bullets (L87-91) hard future language: softened
  - "will absorb it the same way" → "the likely shape" / "plausibly applies"
  - "Rollback windows do not exist" → "are much narrower if they exist at all"
  - "The framework absorbs it" → "Probably absorbs cleanly"
  - "dominated by Tier 4 events" → "likely dominated by Tier 4 events"

  Embodied agents bullet now explicitly flags that "physical irreversibility is the strongest test the framework has not yet faced," which concedes the open question.

…ew-surfaces

Apply/skip tally: 8 applied, 0 pushed back.

Applied:

- L36 synthesis quote: replaced "the lifecycle is the stable layer" (which is not the exact synthesis H2 wording) with prose paraphrase that aligns with the actual H2 "Reserve-Commit Is the Stable Layer."
- L45 / L140 / L225 "risk order" / "lowest-risk" framing aligned with L142 clarification: now "false-positive-cost order" / "lowest-false-positive-cost" throughout, matching how the cutover order is actually ranked.
- L103 absolute "the quota is wrong / not constraining anything" softened to "Substantially higher rates suggest...; substantially lower rates suggest...". Calibration target labeled as starting heuristic.
- L125 "Most shadow weeks produce a clean bimodal distribution" hedged: "When the shadow data produces a clearly bimodal distribution, the cap belongs in the gap; when it does not, the schedule needs more (target, intent) features."
- L138 generalized "reserve-to-commit ratio across all four surfaces" claim scoped: voice has a true reserve-to-commit ratio; the other three use cap-fire rate vs shadow baseline as the analogue.
- L152 ">85% intended denials" labeled as a minimum triage bar with explicit note that sensitive surfaces (merge, voice mid-conversation) target higher fractions.
- L187 "Reserve-to-actual ratio per surface" rewritten to "Voice reserve-to-commit ratio, trending; for the other three surfaces, cap-fire rates vs the shadow-mode baseline." Fixes both the terminology drift (capital-R variant the replace_all missed) and the cross-surface ratio generalization.

Codex verified all per-surface gate primitives match the sibling PRs #648-#652 and confirmed the SEO, code-accuracy, and tone dimensions clean.
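As a rough illustration of the "cap belongs in the gap" heuristic mentioned in this commit, the sketch below places a cap at the midpoint of the widest gap in sorted shadow-mode values and returns `None` when no gap dominates. The function name, the `min_gap_ratio` threshold, and the midpoint rule are all assumptions made for this example; the post does not specify how the gap is detected.

```python
from typing import Optional, Sequence


def cap_from_shadow(values: Sequence[float], min_gap_ratio: float = 0.5) -> Optional[float]:
    """Return a cap inside the widest gap, or None if the data is not clearly split.

    Assumed heuristic: the data counts as "clearly bimodal" when the widest
    gap between adjacent sorted values covers at least `min_gap_ratio` of
    the total spread.
    """
    xs = sorted(values)
    if len(xs) < 2:
        return None
    spread = xs[-1] - xs[0]
    if spread == 0:
        return None
    # Widest gap between adjacent observations, with its two endpoints.
    gap, lo, hi = max((b - a, a, b) for a, b in zip(xs, xs[1:]))
    if gap / spread < min_gap_ratio:
        return None  # no clear gap: the schedule likely needs more features
    return (lo + hi) / 2


bimodal_cap = cap_from_shadow([1, 2, 2, 3, 40, 42, 45])
flat_cap = cap_from_shadow([1, 2, 3, 4, 5, 6])
```

A `None` result corresponds to the second half of the hedged sentence: without a clear gap, the cap is not yet well defined.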

Moved from 2026-05-21 to 2026-06-13 to land one week after the voice post in the weekly publishing cadence. Also adjusted the opening time-framing language to match the new arc duration: "The last week of posts" → "The recent run of posts," and "A week later" → "A month on." With the four pillars spanning 5/16 through 6/06, the synthesis publishing on 6/13 sits roughly a month after the first pillar, not a week.

amavashev added 3 commits May 15, 2026 16:02

amavashev mentioned this pull request May 15, 2026

blog: Rolling Out Action Authority on New Surfaces #653



amavashev wants to merge 4 commits into main from blog/what-four-new-surfaces-taught-us
