blog: Rolling Out Action Authority on New Surfaces by amavashev · Pull Request #653 · runcycles/cycles-docs

amavashev · 2026-05-15T21:56:50Z

Summary

Operational counterweight to the five theory posts shipped this session. Extends the existing shadow-to-enforcement cutover decision tree per-surface for the action-authority extensions covered in PRs #648–#652.

Structure: four-week playbook + rollback tree

Week 1: Inventory — surface-by-surface template with current gate, blast radius tier, reversibility, frequency, existing audit
Week 2: Shadow-mode instrumentation — per-surface dry-run table (memory / merge / click / voice)
Week 3: Per-surface gate primitives + calibration — explicit caps for each surface, drawn verbatim from the sibling posts:
- Memory: per-tenant write quota, TTL on unverified facts, per-write provenance, scope isolation
- Merge: distinct-approver, requires_human_approval, per-session promotion budget, deploy-gate
- Click: requires_fresh_screenshot, cross-tenant nav deny, target-intent risk schedule, risk-denominated session budget
- Voice: predictive reservation, wall-clock cap, tier-aware gating, per-turn-boundary re-check
Week 4: Cutover, lowest-false-positive-cost first — memory → click → voice → merge
Rollback decision tree: denial-rate thresholds (>2× shadow rate for 1h flips to shadow; >5× for 15min kills the gate), plus per-tenant and per-agent rollback signals whose concentration thresholds are labeled as heuristics
Monitoring (first 72h / first month / ongoing)
Short runbook template — 7 entries per surface
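
For reviewers who want the rollback thresholds in executable form, here is a minimal sketch of the denial-rate branch of the tree. It is not code from the post. The names `GateMode`, `sustained_above`, and `next_gate_mode` are hypothetical, the per-minute sampling is an assumption, and "kills the gate" is read here as disabling the gate entirely.

```python
from enum import Enum
from typing import Sequence


class GateMode(Enum):
    ENFORCE = "enforce"
    SHADOW = "shadow"
    DISABLED = "disabled"  # assumption: "kills the gate" means turning it off


def sustained_above(rates: Sequence[float], threshold: float, minutes: int) -> bool:
    """True if each of the last `minutes` per-minute samples exceeds `threshold`."""
    if len(rates) < minutes:
        return False  # not enough history to call the breach sustained
    return all(rate > threshold for rate in rates[-minutes:])


def next_gate_mode(per_minute_denial_rates: Sequence[float], shadow_rate: float) -> GateMode:
    """Map recent denial rates to a gate mode using the thresholds in this PR."""
    # Check the more severe condition first: >5x the shadow rate for 15 minutes.
    if sustained_above(per_minute_denial_rates, 5 * shadow_rate, 15):
        return GateMode.DISABLED
    # Then the softer condition: >2x the shadow rate for a full hour.
    if sustained_above(per_minute_denial_rates, 2 * shadow_rate, 60):
        return GateMode.SHADOW
    return GateMode.ENFORCE
```

With a shadow-mode denial rate of 2%, an hour of 5% denials would flip the surface back to shadow, while fifteen minutes at 12% would disable the gate. The per-tenant and per-agent concentration signals are not modeled here.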

Author: Albert Mavashev
Date: 2026-05-22
Word count: ~2,850 body

Reviews

Internal cycles 1–3 (scorecard 9.4/10)
Glossary auto-linker applied 7 contextual links
Codex external review: round 1 REVISE-MINOR (8 findings, all applied / 0 pushed back), round 2 SHIP

Codex verified all per-surface gate primitives match the sibling PRs #648–#652 verbatim or near-verbatim (15 of 16 caps directly cited).

Notable changes through review

Cycle 1 caught and fixed:

Terminology drift: "reserve-to-actual ratio" → "reserve-to-commit ratio" (canonical corpus term)
Tag: rollout → adoption (matches cutover-decision-tree precedent)
Click "wall-clock cap" reframed as "session budget denominated in risk, not count" (matches the click sibling's actual framing)
Cutover-order metric clarified as cost-of-false-positive-denial, not action blast radius
R/A ratio row in rollback table scoped to voice only
20%/40%/30% concentration thresholds labeled as starting heuristics
Week 3 calibration window made explicit ("at least a week"; parallel-surface adoption requires extending Week 3 to 2–3 weeks)
Two rhetorical closers cut
"What Action Authority Adoption Is" section tightened from anaphoric essay to operational summary

Codex round 1 caught additional:

L36 synthesis quote "the lifecycle is the stable layer": the actual H2 in PR #652 (blog: What Four New Surfaces Taught Us) is "Reserve-Commit Is the Stable Layer." Replaced with prose paraphrase.
"Risk order" / "lowest-risk" framing throughout aligned with "false-positive-cost order" / "lowest-false-positive-cost" (the actual ranking metric).
L103 calibration band absolute "the quota is wrong / not constraining anything" softened to starting heuristic framing.
L125 "Most shadow weeks produce a clean bimodal distribution" hedged.
L138 cross-surface "reserve-to-commit ratio" claim scoped: voice has a true R/C ratio; other three surfaces use cap-fire rate vs shadow baseline as the analogue.
L152 ">85% intended denials" labeled as minimum triage bar; sensitive surfaces target higher fractions.
L187 "Reserve-to-actual ratio per surface" rewritten to scope voice R/C separately from the other three surfaces' cap-fire baselines.
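
To make the scoping in the L138 and L187 fixes concrete, the sketch below shows one way the two metrics could be computed. It is illustrative only: the function names are hypothetical, and the definitions (reserved amount divided by committed amount for voice; observed cap-fire rate divided by the shadow-mode cap-fire rate for memory, merge, and click) are assumptions rather than formulas quoted from the post.

```python
from typing import Optional


def reserve_to_commit_ratio(reserved: float, committed: float) -> Optional[float]:
    """Voice only: how much was reserved up front per unit actually committed.

    Assumed definition: reserved / committed. Returns None when nothing
    was committed, since the ratio is undefined in that case.
    """
    if committed <= 0:
        return None
    return reserved / committed


def cap_fire_multiple(cap_fires: int, actions: int, shadow_cap_fire_rate: float) -> Optional[float]:
    """Memory / merge / click analogue: cap-fire rate as a multiple of the shadow baseline.

    Assumed definition: (cap_fires / actions) / shadow_cap_fire_rate.
    Returns None when there is no traffic or no usable baseline.
    """
    if actions <= 0 or shadow_cap_fire_rate <= 0:
        return None
    return (cap_fires / actions) / shadow_cap_fire_rate
```

Keeping the two as separate functions mirrors the review finding: only voice has a true reserve-to-commit ratio, so the other three surfaces are compared against their own shadow-mode baseline instead.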

Per-dimension scores

Dimension	Score
Factual accuracy	9.5
Credibility	9
Cross-links	9.5
SEO (title 44/51, desc 155/160)	9.5
Code accuracy	10
Structure & flow	9.5
Terminology	9.5
Tone & style	9.5

Overall: 9.4 / 10

Test plan

`npm run dev` and verify post renders at `/blog/rolling-out-action-authority-on-new-surfaces`
Verify post appears on `/blog/` index sorted to top (date 2026-05-22)
Click through all internal links. Five sibling links depend on PRs #648 (blog: Agent Memory Writes Are Actions, Too) through #652 (blog: What Four New Surfaces Taught Us) being merged first
Confirm date/author/tags/reading-time header renders above body
Confirm Prev/Next post navigation works
`npm run build` succeeds with no broken-link warnings

Dependencies

Depends on PRs #648–#652 being merged first. This post is a practical playbook layered on top of the five-post arc; it references all five siblings extensively in the intro, the Week-3 calibration sections, and Next Steps. Required merge order: #648 → #649 → #650 → #651 → #652 → this PR.

Operational playbook post that extends the existing shadow-to-enforcement-cutover-decision-tree per-surface for the action-authority extensions. Practical counterweight to the five theory posts shipped this session.

Structure:
- Week 1: Surface inventory (template, sanity check)
- Week 2: Shadow-mode instrumentation per surface (memory / merge / click / voice instrumentation table)
- Week 3: Per-surface gate primitives + calibration (per-surface primitive tables matching the sibling posts' caps and patterns)
- Week 4: Cutover, lowest cost-of-false-positive-denial first (memory → click → voice → merge)
- Rollback decision tree (per-surface thresholds with explicit heuristic labeling)
- Monitoring (first 72h / first month / ongoing)
- Runbook template (7 entries per surface)
- What it is not / What it is

The per-surface gate primitives are 15-of-16 verbatim or near-verbatim matches with the sibling posts' actual caps (per consistency reviewer):
- Memory: per-tenant write quota, TTL on unverified facts, per-write provenance, scope isolation
- Merge: distinct-approver, requires_human_approval, per-session promotion budget, deploy-gate cap
- Click: requires_fresh_screenshot, cross-tenant navigation deny, target-intent risk schedule, session budget denominated in risk
- Voice: predictive reservation, wall-clock cap, tier-aware gating, per-turn-boundary re-check

Reviews: internal cycles 1-3 (scorecard 9.4/10), glossary linker applied 7 contextual links.

Cycle 1 caught and fixed:
- Terminology drift "reserve-to-actual ratio" -> "reserve-to-commit ratio" (canonical corpus term)
- Tag "rollout" -> "adoption" to match cutover-decision-tree precedent
- Click "wall-clock cap" reworded to match sibling's actual framing ("session budget denominated in risk, not count")
- Cutover-order metric clarified as "cost of false-positive denial," not action blast radius
- Rollback R/A row scoped to voice-specific
- 20%/40%/30% concentration thresholds labeled as starting heuristics
- Week 3 calibration window made explicit ("at least a week... calibrating four surfaces in parallel typically requires extending Week 3 to 2-3 weeks")
- Two rhetorical closers cut; "What Action Authority Adoption Is" section tightened from anaphoric essay to operational summary

This post depends on the five sibling PRs being merged first (#648-#652) so its many cross-links resolve.

…ew-surfaces

Apply/skip tally: 8 applied, 0 pushed back.

Applied:
- L36 synthesis quote: replaced "the lifecycle is the stable layer" (which is not the exact synthesis H2 wording) with prose paraphrase that aligns with the actual H2 "Reserve-Commit Is the Stable Layer."
- L45 / L140 / L225 "risk order" / "lowest-risk" framing aligned with L142 clarification: now "false-positive-cost order" / "lowest-false-positive-cost" throughout, matching how the cutover order is actually ranked.
- L103 absolute "the quota is wrong / not constraining anything" softened to "Substantially higher rates suggest...; substantially lower rates suggest...". Calibration target labeled as starting heuristic.
- L125 "Most shadow weeks produce a clean bimodal distribution" hedged: "When the shadow data produces a clearly bimodal distribution, the cap belongs in the gap; when it does not, the schedule needs more (target, intent) features."
- L138 generalized "reserve-to-commit ratio across all four surfaces" claim scoped: voice has a true reserve-to-commit ratio; the other three use cap-fire rate vs shadow baseline as the analogue.
- L152 ">85% intended denials" labeled as a minimum triage bar with explicit note that sensitive surfaces (merge, voice mid-conversation) target higher fractions.
- L187 "Reserve-to-actual ratio per surface" rewritten to "Voice reserve-to-commit ratio, trending; for the other three surfaces, cap-fire rates vs the shadow-mode baseline." Fixes both the terminology drift (capital-R variant the replace_all missed) and the cross-surface ratio generalization.

Codex verified all per-surface gate primitives match the sibling PRs #648-#652 and confirmed the SEO, code-accuracy, and tone dimensions clean.

…-06-20

Moved from 2026-05-22 to 2026-06-20 to land one week after the synthesis post in the weekly publishing cadence.

Final arc dates:
- memory 5/16
- merge 5/23
- computer-use 5/30
- voice 6/06
- synthesis 6/13
- rollout playbook 6/20

Also adjusted "The last five posts" to "The recent five-post arc" since the series no longer publishes daily.

amavashev added 3 commits May 15, 2026 17:48

amavashev wants to merge 3 commits into main from blog/rolling-out-action-authority-on-new-surfaces