feat(sdlc): auto-promote policy_decide on a clean shadow-week (kill the 3b cutover cliff) by ryanklee · Pull Request #3829 · hapax-systems/hapax-council

ryanklee · 2026-06-01T14:04:04Z

What

Phase 3b shipped the shadow producer (replay_decision_log) and the cutover evaluator (evaluate_shadow_clean), but making policy_decide authoritative stayed a manual cliff gated on "shadow-week clean by 2026-06-07" — nothing acted on a YES. A clean predicate that never promotes itself is the same freeze-blocks-thaw bug one layer up.

This adds the missing actuator — a reversible, version-stamped promotion ladder:

shadow ──clean──▶ canary ──clean ≥24h──▶ authoritative   (──not-clean──▶ shadow)

How

- `decide_promotion`: a pure transition (time + verdict injected). It advances one rung per clean tick, requires a 24h canary dwell before canary→authoritative, and rolls straight back to shadow on any not-clean verdict or any `POLICY_DECIDE_FN_VERSION` change (permanent-canary discipline: new logic re-proves from scratch). Never raises, never skips a rung, never hard-cuts to authoritative.
- `run_promotion_cycle`: the on-disk driver. evaluate → decide → persist posture (`~/.cache/hapax/policy-decide-promotion.json`) → append audit ledger (`.jsonl`). Best-effort, never raises.
- `policy-decide-promote.{service,timer}`: run it hourly via `python -m shared.policy_decide promote --replay`. Advisory only: it advances a recorded posture, never the live gate verdict (master design §4.1: log both decisions before becoming the verdict).
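
To make the ladder rules above concrete, here is a minimal sketch of a pure transition of this shape. It is not the code in `shared/policy_decide.py`: the `Posture` dataclass, the `decide` function, and the field names are hypothetical, chosen only to illustrate "one rung per clean tick, 24h canary dwell, straight back to shadow on not-clean or version change".

```python
from dataclasses import dataclass

SHADOW, CANARY, AUTHORITATIVE = "shadow", "canary", "authoritative"
CANARY_WINDOW_SECONDS = 24 * 3600  # 24h dwell, as stated in the description


@dataclass(frozen=True)
class Posture:  # hypothetical shape, for illustration only
    state: str
    policy_version: str
    entered_at: float  # epoch seconds when the current rung was entered


def decide(current: Posture, clean: bool, policy_version: str, now: float) -> Posture:
    """Pure transition: no clock reads or I/O; time and verdict are injected."""
    # A version change or a not-clean verdict always lands in shadow.
    if current.policy_version != policy_version or not clean:
        if current.state == SHADOW and current.policy_version == policy_version:
            return current  # already in shadow for this version
        return Posture(SHADOW, policy_version, now)
    if current.state == SHADOW:
        return Posture(CANARY, policy_version, now)  # one rung per clean tick
    if current.state == CANARY and now - current.entered_at >= CANARY_WINDOW_SECONDS:
        return Posture(AUTHORITATIVE, policy_version, now)
    return current  # canary still dwelling, or already authoritative
```

Because time and verdict are arguments, each rung can be tested by calling `decide` with a chosen `now`, without patching a clock.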

Acceptance criteria

- Shadow producer accrues evidence into `policy-decide-shadow.jsonl` on its timer: verified green at runtime (325 rows @08:41 CDT, replay timer active; the "empty ledger" noted in the task was a post-reboot transient the timer already resolved).
- Auto-promotion path advances `policy_decide` toward authoritative (canary/dual-log first) when the shadow-week predicate passes; no manual 2026-06-07 cliff.
- Promotion is reversible + version-stamped; tests cover clean vs divergent.
- Ruff + tests pass.

Test evidence

- `ruff check` + `ruff format --check`: clean
- `pyright shared/policy_decide.py tests/test_policy_decide_promotion.py`: 0 errors
- `pytest tests/test_policy_decide_promotion.py`: 14 passed (10 pure ladder + 4 on-disk driver)
- regression `pytest test_policy_decide{,_shadow_producer,_shadow_scripts} test_policy_floor`: 198 passed
- CLI end-to-end: `python -m shared.policy_decide promote` → exit 0, posture shadow→canary persisted + audit row written

Task: reform-improve-shadow-autopromote-20260601 · AuthorityCase: CASE-FORMAL-GOVERNANCE-001 · parent spec: coordination-reform-master-design-2026-05-30 §4.1

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added automated policy promotion state machine with multi-stage advancement (shadow → canary → authoritative), anomaly-based rollback, and persistent audit trails
- Introduced hourly scheduled promotion cycles with configurable dwell windows before stage advancement
- New CLI command for on-demand promotion cycle execution
Tests
- Added comprehensive test coverage for promotion state transitions, rollback behavior, and cycle execution

…he 3b cutover cliff)

Phase 3b shipped the shadow PRODUCER (replay_decision_log) and the cutover EVALUATOR (evaluate_shadow_clean), but making policy_decide authoritative stayed a MANUAL cliff gated on 'shadow-week clean by 2026-06-07'; nothing acted on a YES. A clean predicate that never promotes itself is the same freeze-blocks-thaw bug one layer up.

Add the missing actuator: a reversible, version-stamped promotion ladder (shadow → canary → authoritative) advanced one rung per clean tick and rolled straight back to shadow on any not-clean verdict or any POLICY_DECIDE_FN_VERSION change (permanent-canary discipline). decide_promotion is a pure transition (time + verdict injected); run_promotion_cycle wires it to an on-disk posture + audit ledger. New policy-decide-promote.{service,timer} run it hourly via 'python -m shared.policy_decide promote --replay'. Advisory only: it advances a recorded posture, never the live gate verdict.

Tests cover the pure ladder (clean advances; divergent/version-change roll back; 24h dwell) and the on-disk driver (clean→canary persists+audits; divergent stays; dwell→authoritative across two ticks; short window never promotes).

Producer liveness (AC#1) already green: policy-decide-shadow.jsonl accruing on the replay timer. ruff + pyright clean; 14 promotion + 198 sibling tests pass.

Task: reform-improve-shadow-autopromote-20260601
AuthorityCase: CASE-FORMAL-GOVERNANCE-001
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-01T14:04:24Z

📝 Walkthrough


This PR introduces an auto-promotion state machine for the 3b cutover: a reversible posture ladder (shadow → canary → authoritative) driven by clean verdicts, canary dwell timing, divergence rollback, and version-change resets. The ladder persists to disk, is orchestrated by an hourly systemd timer, and is thoroughly tested with both pure transition and on-disk driver coverage.

Changes

Auto-promotion state machine for 3b cutover

| Layer / File(s) | Summary |
| --- | --- |
| Promotion state machine core `shared/policy_decide.py` | `decide_promotion` implements the reversible posture ladder: clean verdicts advance shadow→canary→authoritative, divergence rolls back to shadow, the canary dwell window gates canary→authoritative, and any policy version change forces a reset to shadow (permanent-canary discipline). |
| Persistence and orchestration `shared/policy_decide.py` | JSON state and JSONL ledger persistence; the `run_promotion_cycle` orchestrator optionally replays the decision log, evaluates `evaluate_shadow_clean`, computes the next promotion decision, persists the posture, and audits only actual transitions. |
| CLI entrypoint and dispatch `shared/policy_decide.py` | `sys` import and updated `__main__` routing: the `promote` subcommand invokes `promote_main`; all other paths call the existing advisory `main()`. |
| Systemd scheduling infrastructure `systemd/units/policy-decide-promote.service`, `systemd/units/policy-decide-promote.timer` | Service executes `python -m shared.policy_decide promote --replay`; timer schedules hourly after a boot delay with randomized jitter and persistent restart behavior. |
| Test suite for state machine and driver `tests/test_policy_decide_promotion.py` | Pure transition tests (`TestDecidePromotion`) validate all posture changes, rollbacks, dwell windows, and version resets; on-disk driver tests (`TestRunPromotionCycle`) assert state persistence, ledger auditing, dwell-window promotion, and insufficient-evidence prevention. |
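
The persistence contract summarized above (posture written every cycle, audit row appended only on an actual transition, never raising) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `run_promotion_cycle`: the function name `run_cycle`, its parameters, and the row fields other than `from_state` / `to_state` are hypothetical.

```python
import json
from pathlib import Path


def run_cycle(state_path: Path, ledger_path: Path, next_state: str, now: float) -> dict:
    """Persist the posture; append an audit row only when the state changes."""
    try:
        previous = json.loads(state_path.read_text())["state"]
    except (OSError, ValueError, KeyError, TypeError):
        previous = "shadow"  # missing or unreadable state falls back to the lowest rung
    row = {
        "from_state": previous,
        "to_state": next_state,
        "at": now,
        "changed": previous != next_state,
    }
    try:
        state_path.write_text(json.dumps({"state": next_state, "at": now}))
        if row["changed"]:
            with ledger_path.open("a") as fh:
                fh.write(json.dumps(row) + "\n")  # one JSON object per line
    except OSError:
        pass  # best-effort: a failed write should not crash the hourly timer
    return row
```

Appending to the ledger only on a change keeps the `.jsonl` file a list of transitions rather than a log of hourly ticks, which is what makes a "two rows after canary then authoritative" assertion meaningful.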

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 A ladder of states so clean and bright,
From shadow to canary, reaching new height,
With dwell windows timed and divergence in flight,
Promotion flows hourly, version by version done right! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.24%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description includes What, How, Acceptance criteria, and Test evidence sections with concrete details, but the required AuthorityCase template fields (Case/Slice) and CLAUDE.md hygiene checklist are missing or incomplete. | Ensure the AuthorityCase fields (Case/Slice) are properly filled and confirm all CLAUDE.md hygiene checklist items are addressed per the template requirements. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding automatic promotion for policy_decide to eliminate a manual cutover cliff. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

🧹 Nitpick comments (1)

tests/test_policy_decide_promotion.py (1)

291-293: 💤 Low value

Optionally assert the second transition was audited.

This test verifies the canary→authoritative state but not that the transition wrote an audit ledger row. Since run_promotion_cycle appends to the ledger only on an actual transition, asserting the ledger now has two rows (canary, then authoritative) would close the gap on the audit contract for this key promotion path.

♻️ Optional: assert audit rows after dwell promotion

             assert second["from_state"] == PROMOTION_CANARY
             assert second["to_state"] == PROMOTION_AUTHORITATIVE
             assert load_promotion_state(state).state == PROMOTION_AUTHORITATIVE
+            rows = [json.loads(line) for line in ledger.read_text().splitlines() if line]
+            assert [r["to_state"] for r in rows] == [PROMOTION_CANARY, PROMOTION_AUTHORITATIVE]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_policy_decide_promotion.py` around lines 291 - 293, Add an
assertion that the promotion transition was audited by checking the ledger
contains two rows for this key after run_promotion_cycle; specifically, after
asserting second["from_state"] == PROMOTION_CANARY and second["to_state"] ==
PROMOTION_AUTHORITATIVE and load_promotion_state(state).state ==
PROMOTION_AUTHORITATIVE, fetch the ledger rows related to this promotion (the
same ledger used by run_promotion_cycle) and assert its length/count equals 2
(or assert the second ledger entry matches the authoritative transition), so the
test verifies both the state change and that an audit row was appended.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b2bbcecd-4bbf-4ebe-8863-310c93fe7dab

📥 Commits

Reviewing files that changed from the base of the PR and between c1fd66b and 6d3151f.

📒 Files selected for processing (4)

shared/policy_decide.py
systemd/units/policy-decide-promote.service
systemd/units/policy-decide-promote.timer
tests/test_policy_decide_promotion.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d3151fbc3

chatgpt-codex-connector · 2026-06-01T14:11:48Z

+    # Permanent-canary discipline: a new policy_decide version must re-prove from shadow.
+    if current.policy_version != policy_version:
+        return decision(
+            PROMOTION_SHADOW,
+            True,
+            f"policy_version {current.policy_version}→{policy_version}: "
+            "re-entering shadow (permanent-canary discipline)",
+        )


Require fresh evidence after policy version reset

When POLICY_DECIDE_FN_VERSION changes, this branch only rewrites the posture to shadow with the new version; it does not invalidate or checkpoint the shadow evidence that produced the current clean verdict. If the decision log already contains an old clean 7-day span, the next hourly run_promotion_cycle will reuse that same historical verdict and promote shadow → canary without collecting a fresh shadow week for the new logic, so the advertised permanent-canary discipline is bypassed after any policy_decide change.
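
One way to address this concern, sketched under assumptions: scope the clean verdict to shadow rows produced by the current version, so a version reset cannot be satisfied by the previous version's history. The row fields `policy_version`, `ts`, and `agree` are hypothetical (the PR does not show the ledger schema), and `clean_for_version` is not a function in the PR.

```python
WEEK_SECONDS = 7 * 24 * 3600  # the 7-day shadow span mentioned in the review


def rows_for_version(rows: list[dict], policy_version: str) -> list[dict]:
    """Keep only shadow rows produced by the given policy version."""
    return [r for r in rows if r.get("policy_version") == policy_version]


def clean_for_version(rows: list[dict], policy_version: str,
                      min_span_seconds: int = WEEK_SECONDS) -> bool:
    """Clean only if this version's own rows all agree and span the full window."""
    own = rows_for_version(rows, policy_version)
    if not own:
        return False  # no evidence yet for the new logic
    span = max(r["ts"] for r in own) - min(r["ts"] for r in own)
    return span >= min_span_seconds and all(r["agree"] for r in own)
```

With this filter, a clean week recorded under the old version yields `False` for the new version until it accrues its own week of rows. Checkpointing the ledger offset at reset time would be an alternative with the same effect.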

chatgpt-codex-connector · 2026-06-01T14:11:48Z

+        if dwell >= canary_window_seconds:
+            return decision(
+                PROMOTION_AUTHORITATIVE,
+                True,
+                f"canary clean ≥{canary_window_seconds / 3600:.0f}h → authoritative-ready",


Require canary-window evidence before authoritative

After a clean shadow week enters canary, this branch promotes solely because wall-clock dwell elapsed while the latest verdict remains clean. Since that verdict is computed from the full historical decision log rather than decisions observed since entering canary, a quiet 24-hour period with no gated calls still advances to authoritative-ready, so the dual-decision canary can complete without actually proving any canary-window behavior.
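
A possible shape for the guard this comment asks for, as a sketch only: require both the elapsed dwell and a minimum number of agreeing decisions observed since canary entry. The row fields `ts` and `agree`, the `min_decisions` threshold, and the `canary_ready` function are all hypothetical and do not appear in the PR.

```python
def canary_ready(rows: list[dict], entered_at: float, now: float,
                 window_seconds: int = 24 * 3600, min_decisions: int = 1) -> bool:
    """Ready only if the dwell elapsed AND enough clean decisions landed in canary."""
    if now - entered_at < window_seconds:
        return False  # wall-clock dwell not yet satisfied
    # Only decisions recorded after entering canary count as canary evidence.
    in_window = [r for r in rows if r["ts"] >= entered_at]
    return len(in_window) >= min_decisions and all(r["agree"] for r in in_window)
```

Under this rule a quiet 24 hours with no gated calls leaves the posture in canary instead of advancing it to authoritative-ready.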

coderabbitai Bot reviewed Jun 1, 2026

ryanklee added this pull request to the merge queue Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026

Merged via the queue into main with commit 756ee35 Jun 1, 2026
35 of 36 checks passed

ryanklee deleted the delta/reform-improve-shadow-autopromote-20260601 branch June 1, 2026 14:19
