feat(sdlc): auto-promote policy_decide on a clean shadow-week (kill the 3b cutover cliff) #3829
Phase 3b shipped the shadow PRODUCER (replay_decision_log) and the cutover EVALUATOR (evaluate_shadow_clean), but making policy_decide authoritative stayed a MANUAL cliff gated on 'shadow-week clean by 2026-06-07' — nothing acted on a YES. A clean predicate that never promotes itself is the same freeze-blocks-thaw bug one layer up.
Add the missing actuator: a reversible, version-stamped promotion ladder (shadow → canary → authoritative) advanced one rung per clean tick and rolled straight back to shadow on any not-clean verdict or any POLICY_DECIDE_FN_VERSION change (permanent-canary discipline). decide_promotion is a pure transition (time + verdict injected); run_promotion_cycle wires it to an on-disk posture + audit ledger. New policy-decide-promote.{service,timer} run it hourly via 'python -m shared.policy_decide promote --replay'. Advisory only: it advances a recorded posture, never the live gate verdict.
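A minimal sketch of the pure transition described above. It assumes a posture record that carries the current rung, the stamped POLICY_DECIDE_FN_VERSION, and the time the rung was entered. The names Posture and Decision, the string values of the rung constants, and the keyword-only signature are illustrative assumptions, not the module's actual API. Only the rung order, the rollback rules and the 24h dwell come from the PR text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rung names follow the PR text; the constant values are assumptions.
PROMOTION_SHADOW = "shadow"
PROMOTION_CANARY = "canary"
PROMOTION_AUTHORITATIVE = "authoritative"

CANARY_WINDOW_SECONDS = 24 * 3600  # 24h canary dwell before authoritative


@dataclass(frozen=True)
class Posture:
    """Hypothetical recorded posture: rung, stamped version, rung entry time."""

    state: str
    policy_version: str
    entered_at: float  # epoch seconds when the current rung was entered


@dataclass(frozen=True)
class Decision:
    to_state: str
    changed: bool  # True when the posture (rung or stamped version) must be rewritten
    reason: str


def decide_promotion(
    current: Posture,
    *,
    clean: bool,
    policy_version: str,
    now: float,
    canary_window_seconds: float = CANARY_WINDOW_SECONDS,
) -> Decision:
    """Pure transition: verdict and time are injected; nothing is read, written, or raised."""
    # Permanent-canary discipline: any version change re-enters shadow.
    if current.policy_version != policy_version:
        return Decision(PROMOTION_SHADOW, True, f"policy_version {current.policy_version}→{policy_version}")
    # Any not-clean verdict rolls straight back to shadow.
    if not clean:
        return Decision(PROMOTION_SHADOW, current.state != PROMOTION_SHADOW, "not clean: back to shadow")
    # Clean verdicts advance exactly one rung per tick.
    if current.state == PROMOTION_SHADOW:
        return Decision(PROMOTION_CANARY, True, "clean shadow week: enter canary")
    if current.state == PROMOTION_CANARY:
        if now - current.entered_at >= canary_window_seconds:
            return Decision(PROMOTION_AUTHORITATIVE, True, "canary dwell satisfied")
        return Decision(PROMOTION_CANARY, False, "canary dwell pending")
    if current.state == PROMOTION_AUTHORITATIVE:
        return Decision(PROMOTION_AUTHORITATIVE, False, "already authoritative")
    # Unknown or corrupt rung: fail safe to shadow instead of raising.
    return Decision(PROMOTION_SHADOW, True, f"unknown state {current.state!r}: reset to shadow")
```

Because time and verdict are plain arguments, every rung change can be unit-tested without a clock or a decision log.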
Tests cover the pure ladder (clean advances; divergent/version-change roll back; 24h dwell) and the on-disk driver (clean→canary persists+audits; divergent stays; dwell→authoritative across two ticks; short window never promotes). Producer liveness (AC#1) already green: policy-decide-shadow.jsonl accruing on the replay timer. ruff + pyright clean; 14 promotion + 198 sibling tests pass.
Task: reform-improve-shadow-autopromote-20260601
AuthorityCase: CASE-FORMAL-GOVERNANCE-001
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
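The on-disk driver can be sketched the same way. This sketch assumes a JSON posture file and a JSONL audit ledger. The evaluate step is represented by the injected clean flag. The transition function is passed in as a parameter, a hypothetical callback returning (to_state, changed, reason), so the sketch does not depend on any particular ladder implementation. The real run_promotion_cycle signature and on-disk layout may differ.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_posture(path: Path, policy_version: str) -> dict[str, Any]:
    """Read the recorded posture; a missing or corrupt file starts from shadow."""
    try:
        posture = json.loads(path.read_text())
        if isinstance(posture, dict) and "state" in posture:
            return posture
    except (OSError, ValueError):
        pass
    return {"state": "shadow", "policy_version": policy_version, "entered_at": 0.0}


def write_atomic(path: Path, text: str) -> None:
    """Write via a temp file in the same directory, then os.replace over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    os.replace(tmp, path)


def run_promotion_cycle(
    state_path: Path,
    ledger_path: Path,
    *,
    clean: bool,
    policy_version: str,
    now: float,
    decide: Callable[[dict[str, Any], bool, str, float], tuple[str, bool, str]],
) -> dict[str, Any] | None:
    """Load -> decide -> persist -> audit. Best-effort: returns None rather than raising."""
    try:
        current = load_posture(state_path, policy_version)
        to_state, changed, reason = decide(current, clean, policy_version, now)
        if not changed:
            return None  # no transition, so no posture rewrite and no audit row
        row = {
            "ts": now,
            "from_state": current["state"],
            "to_state": to_state,
            "policy_version": policy_version,
            "reason": reason,
        }
        write_atomic(
            state_path,
            json.dumps({"state": to_state, "policy_version": policy_version, "entered_at": now}),
        )
        ledger_path.parent.mkdir(parents=True, exist_ok=True)
        with ledger_path.open("a") as fh:
            fh.write(json.dumps(row) + "\n")
        return row
    except Exception:  # deliberately broad: the hourly timer job must never crash
        return None
```

Writing through a same-directory temp file plus os.replace means a crash mid-write leaves the previous posture intact. Appending an audit row only when changed is true keeps the ledger to one row per real transition.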
📝 Walkthrough
This PR introduces an auto-promotion state machine for the 3b cutover: a reversible posture ladder (shadow → canary → authoritative) driven by clean verdicts, canary dwell timing, divergence rollback, and version-change resets. The ladder persists to disk, is orchestrated by an hourly systemd timer, and is thoroughly tested with both pure transition and on-disk driver coverage.
Changes
Auto-promotion state machine for 3b cutover
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
🧹 Nitpick comments (1)
tests/test_policy_decide_promotion.py (1)
291-293: 💤 Low value
Optionally assert the second transition was audited.
This test verifies the canary→authoritative state but not that the transition wrote an audit ledger row. Since run_promotion_cycle appends to the ledger only on an actual transition, asserting that the ledger now has two rows (canary, then authoritative) would close the gap on the audit contract for this key promotion path.
♻️ Optional: assert audit rows after dwell promotion
     assert second["from_state"] == PROMOTION_CANARY
     assert second["to_state"] == PROMOTION_AUTHORITATIVE
     assert load_promotion_state(state).state == PROMOTION_AUTHORITATIVE
    +rows = [json.loads(line) for line in ledger.read_text().splitlines() if line]
    +assert [r["to_state"] for r in rows] == [PROMOTION_CANARY, PROMOTION_AUTHORITATIVE]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_policy_decide_promotion.py` around lines 291 - 293, Add an assertion that the promotion transition was audited by checking the ledger contains two rows for this key after run_promotion_cycle; specifically, after asserting second["from_state"] == PROMOTION_CANARY and second["to_state"] == PROMOTION_AUTHORITATIVE and load_promotion_state(state).state == PROMOTION_AUTHORITATIVE, fetch the ledger rows related to this promotion (the same ledger used by run_promotion_cycle) and assert its length/count equals 2 (or assert the second ledger entry matches the authoritative transition), so the test verifies both the state change and that an audit row was appended.
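One way to act on this nitpick, sketched under the assumption that each ledger row is a JSON object carrying from_state and to_state keys, as the returned transition record does. The helper name ledger_transitions, the demo function and the ledger file name are hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def ledger_transitions(ledger: Path) -> list[tuple[str, str]]:
    """Return (from_state, to_state) pairs in append order, ignoring blank lines."""
    rows = [json.loads(line) for line in ledger.read_text().splitlines() if line.strip()]
    return [(row["from_state"], row["to_state"]) for row in rows]


def demo() -> list[tuple[str, str]]:
    # Stand-in ledger with the two rows a shadow→canary→authoritative run should leave.
    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / "ledger.jsonl"  # hypothetical file name
        with ledger.open("a") as fh:
            fh.write(json.dumps({"from_state": "shadow", "to_state": "canary"}) + "\n")
            fh.write("\n")  # a stray blank line must not break parsing
            fh.write(json.dumps({"from_state": "canary", "to_state": "authoritative"}) + "\n")
        return ledger_transitions(ledger)
```

In the real test the check would read roughly `assert ledger_transitions(ledger) == [(PROMOTION_SHADOW, PROMOTION_CANARY), (PROMOTION_CANARY, PROMOTION_AUTHORITATIVE)]`, assuming the test starts from shadow. Comparing pairs also pins where each audited transition came from, which is slightly stronger than checking to_state alone.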
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b2bbcecd-4bbf-4ebe-8863-310c93fe7dab
📒 Files selected for processing (4)
- shared/policy_decide.py
- systemd/units/policy-decide-promote.service
- systemd/units/policy-decide-promote.timer
- tests/test_policy_decide_promotion.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d3151fbc3
    # Permanent-canary discipline: a new policy_decide version must re-prove from shadow.
    if current.policy_version != policy_version:
        return decision(
            PROMOTION_SHADOW,
            True,
            f"policy_version {current.policy_version}→{policy_version}: "
            "re-entering shadow (permanent-canary discipline)",
        )
Require fresh evidence after policy version reset
When POLICY_DECIDE_FN_VERSION changes, this branch only rewrites the posture to shadow with the new version; it does not invalidate or checkpoint the shadow evidence that produced the current clean verdict. If the decision log already contains an old clean 7-day span, the next hourly run_promotion_cycle will reuse that same historical verdict and promote shadow → canary without collecting a fresh shadow week for the new logic, so the advertised permanent-canary discipline is bypassed after any policy_decide change.
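A minimal sketch of the fix this comment asks for. It assumes each shadow decision-log row can be tagged with the POLICY_DECIDE_FN_VERSION that produced it, and that the posture records when the version reset happened (since). Both fields, ShadowRow, and fresh_shadow_clean are assumptions rather than existing code. The point is that only post-reset rows for the current version count toward the seven-day clean span.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass

SHADOW_WEEK_SECONDS = 7 * 24 * 3600


@dataclass(frozen=True)
class ShadowRow:
    ts: float  # epoch seconds when the shadow decision was logged
    policy_version: str  # hypothetical tag: version of policy_decide that produced it
    diverged: bool  # shadow verdict disagreed with the live gate


def fresh_shadow_clean(
    rows: Iterable[ShadowRow],
    *,
    policy_version: str,
    since: float,
    now: float,
    window_seconds: float = SHADOW_WEEK_SECONDS,
) -> bool:
    """Clean only on evidence produced by the current version after the reset checkpoint."""
    fresh = [r for r in rows if r.policy_version == policy_version and r.ts >= since]
    if not fresh or any(r.diverged for r in fresh):
        return False
    # The clean span is measured from the first fresh row, so pre-reset history never counts.
    return now - min(r.ts for r in fresh) >= window_seconds
```

If the version-change branch stamped since = now, the next shadow→canary step would have to wait for a full week of new-version evidence instead of reusing the old clean span.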
    if dwell >= canary_window_seconds:
        return decision(
            PROMOTION_AUTHORITATIVE,
            True,
            f"canary clean ≥{canary_window_seconds / 3600:.0f}h → authoritative-ready",
Require canary-window evidence before authoritative
After a clean shadow week enters canary, this branch promotes solely because wall-clock dwell elapsed while the latest verdict remains clean. Since that verdict is computed from the full historical decision log rather than decisions observed since entering canary, a quiet 24-hour period with no gated calls still advances to authoritative-ready, so the dual-decision canary can complete without actually proving any canary-window behavior.
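A hedged sketch of gating canary→authoritative on evidence collected inside the canary window, not on wall-clock dwell alone. CanaryObservation, canary_ready, and the min_decisions floor of 20 are hypothetical. The 24h default mirrors the PR's canary dwell.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryObservation:
    ts: float  # epoch seconds of a gated call observed during canary
    live_verdict: str  # verdict the live gate actually returned
    policy_verdict: str  # verdict policy_decide would have returned


def canary_ready(
    observations: Iterable[CanaryObservation],
    *,
    entered_at: float,
    now: float,
    window_seconds: float = 24 * 3600,
    min_decisions: int = 20,  # hypothetical floor; tune to real call volume
) -> tuple[bool, str]:
    """Require dwell AND enough matching dual decisions observed since canary entry."""
    if now - entered_at < window_seconds:
        return False, "dwell not elapsed"
    in_window = [o for o in observations if entered_at <= o.ts <= now]
    if len(in_window) < min_decisions:
        return False, f"only {len(in_window)} canary decisions (need {min_decisions})"
    mismatches = sum(1 for o in in_window if o.live_verdict != o.policy_verdict)
    if mismatches:
        return False, f"{mismatches} canary divergence(s)"
    return True, f"{len(in_window)} matching canary decisions"
```

With this gate, a quiet day with no gated calls holds at canary instead of advancing, which is the behaviour the comment asks for.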
What
Phase 3b shipped the shadow producer (replay_decision_log) and the cutover evaluator (evaluate_shadow_clean), but making policy_decide authoritative stayed a manual cliff gated on "shadow-week clean by 2026-06-07": nothing acted on a YES. A clean predicate that never promotes itself is the same freeze-blocks-thaw bug one layer up.
This adds the missing actuator, a reversible, version-stamped promotion ladder: shadow → canary → authoritative.
How
- decide_promotion is a pure transition (time + verdict injected). It advances one rung per clean tick, requires a 24h canary dwell before canary→authoritative, and rolls straight back to shadow on any not-clean verdict or any POLICY_DECIDE_FN_VERSION change (permanent-canary discipline: new logic re-proves from scratch). It never raises, never skips a rung, and never hard-cuts to authoritative.
- run_promotion_cycle is the on-disk driver: evaluate → decide → persist posture (~/.cache/hapax/policy-decide-promotion.json) → append audit ledger (.jsonl). Best-effort; never raises.
- policy-decide-promote.{service,timer} runs it hourly via python -m shared.policy_decide promote --replay. Advisory only: it advances a recorded posture, never the live gate verdict (master design §4.1: log both decisions before becoming the verdict).

Acceptance criteria
- policy-decide-shadow.jsonl on its timer: verified green at runtime (325 rows @ 08:41 CDT, replay timer active; the "empty ledger" noted in the task was a post-reboot transient that the timer had already resolved).
- policy_decide toward authoritative (canary/dual-log first) when the shadow-week predicate passes, with no manual 2026-06-07 cliff.

Test evidence
- ruff check + ruff format --check: clean
- pyright shared/policy_decide.py tests/test_policy_decide_promotion.py: 0 errors
- pytest tests/test_policy_decide_promotion.py: 14 passed (10 pure ladder + 4 on-disk driver)
- pytest test_policy_decide{,_shadow_producer,_shadow_scripts} test_policy_floor: 198 passed
- python -m shared.policy_decide promote → exit 0, posture shadow→canary persisted + audit row written

Task: reform-improve-shadow-autopromote-20260601 · AuthorityCase: CASE-FORMAL-GOVERNANCE-001 · parent spec: coordination-reform-master-design-2026-05-30 §4.1

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests