blog: Rolling Out Action Authority on New Surfaces #653
Open
amavashev wants to merge 3 commits into
Operational playbook post that extends the existing shadow-to-enforcement cutover decision tree, per surface, to the action-authority extensions. It is a practical counterweight to the five theory posts shipped this session.
Structure:
- Week 1: Surface inventory (template, sanity check)
- Week 2: Shadow-mode instrumentation per surface (memory / merge / click / voice instrumentation table)
- Week 3: Per-surface gate primitives + calibration (per-surface primitive tables matching the sibling posts' caps and patterns)
- Week 4: Cutover, lowest cost-of-false-positive-denial first (memory → click → voice → merge)
- Rollback decision tree (per-surface thresholds with explicit heuristic labeling)
- Monitoring (first 72h / first month / ongoing)
- Runbook template (7 entries per surface)
- What it is not / What it is
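A minimal sketch of the Week 4 ordering rule, assuming each surface can be given a single relative cost-of-false-positive-denial score. The `Surface` type, the `cutover_order` helper, and the numeric scores are hypothetical; the scores are chosen only so the sort reproduces the memory → click → voice → merge order above and are not measurements from the post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    name: str
    # Hypothetical relative cost of wrongly denying a legitimate action on this
    # surface (arbitrary units). Illustrative only, not figures from the post.
    fp_denial_cost: float


def cutover_order(surfaces: list[Surface]) -> list[str]:
    """Return surface names in cutover order, lowest false-positive-denial cost first."""
    return [s.name for s in sorted(surfaces, key=lambda s: s.fp_denial_cost)]


# Illustrative scores, chosen so the sort reproduces the Week 4 order.
SURFACES = [
    Surface("merge", 8.0),
    Surface("voice", 5.0),
    Surface("memory", 1.0),
    Surface("click", 3.0),
]

print(cutover_order(SURFACES))  # ['memory', 'click', 'voice', 'merge']
```

Sorting on this metric rather than on action blast radius puts first the surfaces where a wrongly denied action is cheapest to absorb.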
15 of the 16 per-surface gate primitives are verbatim or near-verbatim matches for the caps in the sibling posts (per the consistency reviewer):
- Memory: per-tenant write quota, TTL on unverified facts, per-write provenance, scope isolation
- Merge: distinct-approver, requires_human_approval, per-session promotion budget, deploy-gate cap
- Click: requires_fresh_screenshot, cross-tenant navigation deny, target-intent risk schedule, session budget denominated in risk
- Voice: predictive reservation, wall-clock cap, tier-aware gating, per-turn-boundary re-check
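To make "session budget denominated in risk" concrete, here is a sketch of two of the click primitives working together: a target-intent risk schedule and a per-session risk budget. The schedule keys and weights, `DEFAULT_RISK`, and the `ClickSession` class are illustrative assumptions, not the caps defined in the computer-use sibling post; `requires_fresh_screenshot` and cross-tenant navigation deny are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical target-intent risk schedule: (target kind, intent) -> risk units.
# Keys and weights are illustrative assumptions, not the sibling post's values.
RISK_SCHEDULE: dict[tuple[str, str], float] = {
    ("read_only_page", "navigate"): 0.5,
    ("form", "fill"): 2.0,
    ("form", "submit"): 5.0,
    ("payment_button", "click"): 20.0,
}
# Unknown (target, intent) pairs are treated as high risk (an assumption).
DEFAULT_RISK = 10.0


@dataclass
class ClickSession:
    """Session budget denominated in risk units, not in action count."""

    budget: float
    spent: float = 0.0
    denials: list = field(default_factory=list)

    def authorize(self, target: str, intent: str) -> bool:
        risk = RISK_SCHEDULE.get((target, intent), DEFAULT_RISK)
        if self.spent + risk > self.budget:
            # Deny without spending; record the denial for later triage.
            self.denials.append((target, intent, risk))
            return False
        self.spent += risk
        return True
```

With a budget of 10 risk units, ten read-only navigations (0.5 each) plus one form submit (5.0) fit exactly, but a single payment click (20.0) is denied; a count-based budget would treat all of these actions alike.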
Reviews: internal cycles 1-3 (scorecard 9.4/10); the glossary linker applied 7 contextual links. Cycle 1 caught and fixed:
- Terminology drift "reserve-to-actual ratio" -> "reserve-to-commit ratio" (canonical corpus term)
- Tag "rollout" -> "adoption" to match the cutover-decision-tree precedent
- Click "wall-clock cap" reworded to match the sibling post's actual framing ("session budget denominated in risk, not count")
- Cutover-order metric clarified as "cost of false-positive denial," not action blast radius
- Rollback R/A row scoped to voice only
- 20%/40%/30% concentration thresholds labeled as starting heuristics
- Week 3 calibration window made explicit ("at least a week... calibrating four surfaces in parallel typically requires extending Week 3 to 2-3 weeks")
- Two rhetorical closers cut; "What Action Authority Adoption Is" section tightened from an anaphoric essay to an operational summary
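The canonical "reserve-to-commit ratio" can be read as total predicted (reserved) usage divided by total actual (committed) usage. A minimal sketch for the voice surface, assuming reservations and commits are both measured in seconds per turn; the function name, the `(reserved_s, committed_s)` pair shape, and the sample turns are hypothetical.

```python
from __future__ import annotations


def reserve_to_commit_ratio(turns: list[tuple[float, float]]) -> float:
    """Total reserved seconds divided by total committed seconds.

    Each turn is a hypothetical (reserved_s, committed_s) pair: the duration
    predicted and reserved before the turn, and the duration actually used.
    """
    reserved = sum(r for r, _ in turns)
    committed = sum(c for _, c in turns)
    if committed == 0:
        raise ValueError("no committed usage yet; ratio is undefined")
    return reserved / committed


# Illustrative turns: the predictor over-reserves on the first two.
turns = [(30.0, 24.0), (20.0, 16.0), (10.0, 10.0)]
print(round(reserve_to_commit_ratio(turns), 2))  # 1.2
```

A ratio persistently above 1 suggests the predictive reservation is over-reserving; below 1, turns are consuming more than was reserved. What counts as a healthy value is a calibration choice this sketch does not make.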
This post depends on the five sibling PRs (#648-#652) being merged first so that its many cross-links resolve.
…ew-surfaces Apply/skip tally: 8 applied, 0 pushed back. Applied:
- L36 synthesis quote: replaced "the lifecycle is the stable layer" (which is not the exact synthesis H2 wording) with a prose paraphrase that aligns with the actual H2 "Reserve-Commit Is the Stable Layer."
- L45 / L140 / L225 "risk order" / "lowest-risk" framing aligned with the L142 clarification: now "false-positive-cost order" / "lowest-false-positive-cost" throughout, matching how the cutover order is actually ranked.
- L103 absolute "the quota is wrong / not constraining anything" softened to "Substantially higher rates suggest...; substantially lower rates suggest...". Calibration target labeled as a starting heuristic.
- L125 "Most shadow weeks produce a clean bimodal distribution" hedged: "When the shadow data produces a clearly bimodal distribution, the cap belongs in the gap; when it does not, the schedule needs more (target, intent) features."
- L138 generalized "reserve-to-commit ratio across all four surfaces" claim scoped: voice has a true reserve-to-commit ratio; the other three use cap-fire rate vs shadow baseline as the analogue.
- L152 ">85% intended denials" labeled as a minimum triage bar, with an explicit note that sensitive surfaces (merge, voice mid-conversation) target higher fractions.
- L187 "Reserve-to-actual ratio per surface" rewritten to "Voice reserve-to-commit ratio, trending; for the other three surfaces, cap-fire rates vs the shadow-mode baseline." This fixes both the terminology drift (the capital-R variant the replace_all missed) and the cross-surface ratio generalization.

Codex verified that all per-surface gate primitives match the sibling PRs #648-#652 and confirmed that the SEO, code-accuracy, and tone dimensions are clean.
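Two of the hedged rules above can be sketched directly: placing a cap in the gap of a bimodal shadow-mode distribution, and the >85% intended-denial triage bar. This assumes shadow mode yields one numeric risk score per action and that denials are hand-labeled as intended or not. The widest-gap heuristic, the `min_gap_ratio` threshold, and the stricter per-surface bars are hypothetical choices; the post gives only the 85% minimum and says sensitive surfaces target higher.

```python
from __future__ import annotations

from typing import Optional

# The stated minimum triage bar: more than 85% of denials are intended.
MIN_INTENDED_DENIAL_FRACTION = 0.85
# Hypothetical stricter bars: sensitive surfaces (merge, voice mid-conversation)
# target higher fractions, but these specific numbers are assumptions.
SENSITIVE_BARS = {"merge": 0.95, "voice_mid_conversation": 0.95}


def cap_from_shadow_scores(scores: list[float], min_gap_ratio: float = 0.25) -> Optional[float]:
    """Place a cap at the midpoint of the widest gap in shadow-mode risk scores.

    Returns None when the widest gap is narrower than min_gap_ratio of the
    score range, i.e. the data is not clearly bimodal and the schedule likely
    needs more (target, intent) features instead of a cap.
    """
    xs = sorted(scores)
    if len(xs) < 2 or xs[-1] == xs[0]:
        return None
    span = xs[-1] - xs[0]
    # (gap width, index of the lower neighbour) for each adjacent pair.
    width, i = max((xs[j + 1] - xs[j], j) for j in range(len(xs) - 1))
    if width / span < min_gap_ratio:
        return None
    return (xs[i] + xs[i + 1]) / 2


def passes_triage(surface: str, intended: int, total: int) -> bool:
    """True if the share of hand-labeled intended denials clears the surface's bar."""
    if total <= 0:
        raise ValueError("no denials sampled; triage is undefined")
    bar = SENSITIVE_BARS.get(surface, MIN_INTENDED_DENIAL_FRACTION)
    return intended / total > bar


print(cap_from_shadow_scores([1.0, 1.2, 1.5, 2.0, 8.0, 8.5, 9.0]))  # 5.0
print(cap_from_shadow_scores([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # None
print(passes_triage("memory", 90, 100), passes_triage("merge", 90, 100))  # True False
```

For the first score set the widest gap sits between 2.0 and 8.0, so the cap lands at 5.0; evenly spaced scores have no dominant gap, and the function declines to place a cap.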
…-06-20 Moved from 2026-05-22 to 2026-06-20 to land one week after the synthesis post in the weekly publishing cadence. Final arc dates:
- memory 5/16
- merge 5/23
- computer-use 5/30
- voice 6/06
- synthesis 6/13
- rollout playbook 6/20

Also adjusted "The last five posts" to "The recent five-post arc" since the series no longer publishes daily.
Summary
Operational counterweight to the five theory posts shipped this session. Extends the existing shadow-to-enforcement cutover decision tree, per surface, to the action-authority extensions covered in PRs #648–#652.
Structure: four-week playbook + rollback tree
- Merge: … requires_human_approval, per-session promotion budget, deploy-gate
- Click: requires_fresh_screenshot, cross-tenant nav deny, target-intent risk schedule, risk-denominated session budget

Author: Albert Mavashev
Date: 2026-06-20 (originally scheduled for 2026-05-22)
Word count: ~2,850 body
Reviews
Codex verified all per-surface gate primitives match the sibling PRs #648–#652 verbatim or near-verbatim (15 of 16 caps directly cited).
Notable changes through review
Cycle 1 caught and fixed:
- rollout → adoption (matches cutover-decision-tree precedent)

Codex round 1 caught additional:
Per-dimension scores
Overall: 9.4 / 10
Test plan
Dependencies
Depends on PRs #648–#652 being merged first. This post is a practical playbook layered on top of the five-post arc; it references all five siblings extensively in the intro, the Week-3 calibration sections, and Next Steps. Required merge order: #648 → #649 → #650 → #651 → #652 → this PR.