
feat(sdlc): auto-promote policy_decide on a clean shadow-week (kill the 3b cutover cliff)#3829

Merged
ryanklee merged 1 commit into main from delta/reform-improve-shadow-autopromote-20260601 on Jun 1, 2026


@ryanklee ryanklee commented Jun 1, 2026

What

Phase 3b shipped the shadow producer (replay_decision_log) and the cutover evaluator (evaluate_shadow_clean), but making policy_decide authoritative stayed a manual cliff gated on "shadow-week clean by 2026-06-07" — nothing acted on a YES. A clean predicate that never promotes itself is the same freeze-blocks-thaw bug one layer up.

This adds the missing actuator — a reversible, version-stamped promotion ladder:

shadow ──clean──▶ canary ──clean ≥24h──▶ authoritative   (──not-clean──▶ shadow)
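The ladder reads naturally as a pure transition function. Below is a minimal sketch, assuming hypothetical names (`Posture`, `next_posture`, the string state labels, `CANARY_DWELL`) rather than the real decide_promotion signature. Time and verdict are injected, so the 24h canary dwell can be tested without waiting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical names: they mirror the PR's vocabulary, not the actual
# shared/policy_decide.py API.
SHADOW, CANARY, AUTHORITATIVE = "shadow", "canary", "authoritative"
CANARY_DWELL = timedelta(hours=24)


@dataclass(frozen=True)
class Posture:
    state: str
    policy_version: str
    entered_at: datetime  # when the current rung was entered


def next_posture(current: Posture, clean: bool, policy_version: str, now: datetime) -> Posture:
    """Advance at most one rung per tick; any doubt sends the posture back to shadow."""
    # A not-clean verdict or a new policy version both restart from the bottom rung.
    if current.policy_version != policy_version or not clean:
        if current.state == SHADOW and current.policy_version == policy_version:
            return current  # already at the bottom with the same version: nothing changes
        return Posture(SHADOW, policy_version, now)
    if current.state == SHADOW:
        return Posture(CANARY, policy_version, now)  # never skip straight to authoritative
    if current.state == CANARY and now - current.entered_at >= CANARY_DWELL:
        return Posture(AUTHORITATIVE, policy_version, now)
    return current  # canary still dwelling, or already authoritative
```

Because the function never reads the clock or the filesystem, every rule in the diagram (one rung per clean tick, 24h dwell, rollback on divergence or version change) can be asserted directly.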

How

  • decide_promotion — a pure transition (time + verdict injected): advances one rung per clean tick, requires a 24h canary dwell before canary→authoritative, and rolls straight back to shadow on any not-clean verdict or any POLICY_DECIDE_FN_VERSION change (permanent-canary discipline: new logic re-proves from scratch). Never raises, never skips a rung, never hard-cuts to authoritative.
  • run_promotion_cycle — the on-disk driver: evaluate → decide → persist posture (~/.cache/hapax/policy-decide-promotion.json) → append audit ledger (.jsonl). Best-effort, never raises.
  • policy-decide-promote.{service,timer} — run it hourly via python -m shared.policy_decide promote --replay. Advisory only: it advances a recorded posture, never the live gate verdict (master design §4.1 — log both decisions before becoming the verdict).
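The driver's order of operations (evaluate → decide → persist posture → append audit row) can be sketched as follows. Everything here is illustrative: `run_cycle`, the injected `decide` callable, and the JSON field names are hypothetical, and the real run_promotion_cycle may be structured differently. Only the "audit on an actual transition" and "best-effort, never raises" behaviours come from this thread.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

# Hypothetical signature: decide(current_posture, clean, now) -> next_posture.
Decide = Callable[[dict, bool, float], dict]


def run_cycle(state_path: Path, ledger_path: Path,
              evaluate: Callable[[], bool], decide: Decide,
              now: Optional[float] = None) -> Optional[dict]:
    """One tick: evaluate -> decide -> persist posture -> audit (transitions only)."""
    now = time.time() if now is None else now
    try:
        try:
            current = json.loads(state_path.read_text())
        except FileNotFoundError:
            current = {"state": "shadow", "entered_at": now}  # first run starts at the bottom
        clean = evaluate()
        nxt = decide(current, clean, now)
        state_path.parent.mkdir(parents=True, exist_ok=True)
        ledger_path.parent.mkdir(parents=True, exist_ok=True)
        # Write-then-rename so a crash mid-write never leaves a truncated posture file.
        tmp = state_path.with_name(state_path.name + ".tmp")
        tmp.write_text(json.dumps(nxt))
        tmp.replace(state_path)
        if nxt["state"] != current["state"]:
            row = {"ts": now, "from_state": current["state"],
                   "to_state": nxt["state"], "clean": clean}
            with ledger_path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(row) + "\n")
        return nxt
    except Exception:
        # Best-effort: an advisory actuator should never crash its timer unit.
        return None
```

Injecting `evaluate` and `decide` keeps the on-disk driver thin, so the interesting logic stays in the pure transition and the driver tests only need to check persistence and auditing.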

Acceptance criteria

  • Shadow producer accrues evidence into policy-decide-shadow.jsonl on its timer — verified green at runtime (325 rows @08:41 CDT, replay timer active; the "empty ledger" noted in the task was a post-reboot transient the timer already resolved).
  • Auto-promotion path advances policy_decide toward authoritative (canary/dual-log first) when the shadow-week predicate passes — no manual 2026-06-07 cliff.
  • Promotion is reversible + version-stamped; tests cover clean vs divergent.
  • Ruff + tests pass.

Test evidence

  • ruff check + ruff format --check: clean
  • pyright shared/policy_decide.py tests/test_policy_decide_promotion.py: 0 errors
  • pytest tests/test_policy_decide_promotion.py: 14 passed (10 pure ladder + 4 on-disk driver)
  • regression pytest test_policy_decide{,_shadow_producer,_shadow_scripts} test_policy_floor: 198 passed
  • CLI end-to-end: python -m shared.policy_decide promote → exit 0, posture shadow→canary persisted + audit row written

Task: reform-improve-shadow-autopromote-20260601 · AuthorityCase: CASE-FORMAL-GOVERNANCE-001 · parent spec: coordination-reform-master-design-2026-05-30 §4.1

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an automated policy promotion state machine with multi-stage advancement (shadow → canary → authoritative), rollback to shadow on any not-clean verdict or policy version change, and persistent audit trails
    • Introduced hourly scheduled promotion cycles with configurable dwell windows before stage advancement
    • New CLI command for on-demand promotion cycle execution
  • Tests

    • Added comprehensive test coverage for promotion state transitions, rollback behavior, and cycle execution

…he 3b cutover cliff)

Phase 3b shipped the shadow PRODUCER (replay_decision_log) and the cutover EVALUATOR (evaluate_shadow_clean), but making policy_decide authoritative stayed a MANUAL cliff gated on 'shadow-week clean by 2026-06-07' — nothing acted on a YES. A clean predicate that never promotes itself is the same freeze-blocks-thaw bug one layer up.

Add the missing actuator: a reversible, version-stamped promotion ladder (shadow → canary → authoritative) advanced one rung per clean tick and rolled straight back to shadow on any not-clean verdict or any POLICY_DECIDE_FN_VERSION change (permanent-canary discipline). decide_promotion is a pure transition (time + verdict injected); run_promotion_cycle wires it to an on-disk posture + audit ledger. New policy-decide-promote.{service,timer} run it hourly via 'python -m shared.policy_decide promote --replay'. Advisory only: it advances a recorded posture, never the live gate verdict.

Tests cover the pure ladder (clean advances; divergent/version-change roll back; 24h dwell) and the on-disk driver (clean→canary persists+audits; divergent stays; dwell→authoritative across two ticks; short window never promotes). Producer liveness (AC#1) already green: policy-decide-shadow.jsonl accruing on the replay timer. ruff + pyright clean; 14 promotion + 198 sibling tests pass.

Task: reform-improve-shadow-autopromote-20260601

AuthorityCase: CASE-FORMAL-GOVERNANCE-001

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai Bot commented Jun 1, 2026


📝 Walkthrough


This PR introduces an auto-promotion state machine for the 3b cutover: a reversible posture ladder (shadow → canary → authoritative) driven by clean verdicts, canary dwell timing, divergence rollback, and version-change resets. The ladder persists to disk, is orchestrated by an hourly systemd timer, and is thoroughly tested with both pure transition and on-disk driver coverage.

Changes

Auto-promotion state machine for 3b cutover

  • Promotion state machine core (shared/policy_decide.py): decide_promotion implements the reversible posture ladder, in which clean verdicts advance shadow→canary→authoritative, divergence rolls back to shadow, a canary dwell window enforces timing, and policy version changes force a reset to shadow (permanent-canary discipline).
  • Persistence and orchestration (shared/policy_decide.py): JSON state and JSONL ledger persistence; the run_promotion_cycle orchestrator optionally replays the decision log, runs evaluate_shadow_clean, computes the next promotion decision, persists the posture, and appends an audit row only when the posture actually transitions.
  • CLI entrypoint and dispatch (shared/policy_decide.py): adds a sys import and updates the __main__ routing so that the promote subcommand invokes promote_main, while all other paths call the existing advisory main().
  • Systemd scheduling infrastructure (systemd/units/policy-decide-promote.service, systemd/units/policy-decide-promote.timer): the service executes python -m shared.policy_decide promote --replay; the timer runs it hourly after a boot delay, with randomized jitter and persistent scheduling across restarts.
  • Test suite for state machine and driver (tests/test_policy_decide_promotion.py): pure transition tests (TestDecidePromotion) validate all posture changes, rollbacks, dwell windows, and version resets; on-disk driver tests (TestRunPromotionCycle) assert state persistence, ledger auditing, dwell-window promotion, and that insufficient evidence never promotes.
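The __main__ routing described above amounts to a small dispatcher. This is a hedged sketch: `dispatch` is a hypothetical name, `main` and `promote_main` are print-only stand-ins for the real functions, and the meaning given to `--replay` is assumed rather than taken from the PR.

```python
import argparse
from typing import Sequence


def main() -> int:
    """Stand-in for the existing advisory entry point (hypothetical)."""
    print("advisory mode")
    return 0


def promote_main(argv: Sequence[str]) -> int:
    """Stand-in: parse the promote flags, then (in real code) run one promotion cycle."""
    parser = argparse.ArgumentParser(prog="python -m shared.policy_decide promote")
    parser.add_argument("--replay", action="store_true",
                        help="replay the decision log before evaluating (assumed semantics)")
    args = parser.parse_args(list(argv))
    print(f"promotion cycle (replay={args.replay})")
    return 0  # the cycle is best-effort, so the unit exits 0


def dispatch(argv: Sequence[str]) -> int:
    # Only an exact `promote` first argument is rerouted; every other invocation
    # keeps the pre-existing advisory behaviour.
    if argv and argv[0] == "promote":
        return promote_main(argv[1:])
    return main()

# In the module's __main__ guard this would be: sys.exit(dispatch(sys.argv[1:]))
```

Routing on the first positional argument, instead of converting the whole module to subparsers, leaves the existing advisory invocation untouched.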

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 A ladder of states so clean and bright,
From shadow to canary, reaching new height,
With dwell windows timed and divergence in flight,
Promotion flows hourly, version by version done right!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 38.24%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description includes What, How, Acceptance criteria, and Test evidence sections with concrete details, but the required AuthorityCase template fields (Case/Slice) and the CLAUDE.md hygiene checklist are missing or incomplete. Resolution: fill in the AuthorityCase fields (Case/Slice) and confirm all CLAUDE.md hygiene checklist items are addressed per the template requirements.

✅ Passed checks (3 passed)

  • Title check (✅ Passed): The title clearly and specifically summarizes the main change: adding automatic promotion for policy_decide to eliminate a manual cutover cliff.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/test_policy_decide_promotion.py (1)

291-293: 💤 Low value

Optionally assert the second transition was audited.

This test verifies the canary→authoritative state but not that the transition wrote an audit ledger row. Since run_promotion_cycle appends to the ledger only on an actual transition, asserting the ledger now has two rows (canary, then authoritative) would close the gap on the audit contract for this key promotion path.

♻️ Optional: assert audit rows after dwell promotion
             assert second["from_state"] == PROMOTION_CANARY
             assert second["to_state"] == PROMOTION_AUTHORITATIVE
             assert load_promotion_state(state).state == PROMOTION_AUTHORITATIVE
+            rows = [json.loads(line) for line in ledger.read_text().splitlines() if line]
+            assert [r["to_state"] for r in rows] == [PROMOTION_CANARY, PROMOTION_AUTHORITATIVE]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_policy_decide_promotion.py` around lines 291 - 293, Add an
assertion that the promotion transition was audited by checking the ledger
contains two rows for this key after run_promotion_cycle; specifically, after
asserting second["from_state"] == PROMOTION_CANARY and second["to_state"] ==
PROMOTION_AUTHORITATIVE and load_promotion_state(state).state ==
PROMOTION_AUTHORITATIVE, fetch the ledger rows related to this promotion (the
same ledger used by run_promotion_cycle) and assert its length/count equals 2
(or assert the second ledger entry matches the authoritative transition), so the
test verifies both the state change and that an audit row was appended.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b2bbcecd-4bbf-4ebe-8863-310c93fe7dab

📥 Commits

Reviewing files that changed from the base of the PR and between c1fd66b and 6d3151f.

📒 Files selected for processing (4)
  • shared/policy_decide.py
  • systemd/units/policy-decide-promote.service
  • systemd/units/policy-decide-promote.timer
  • tests/test_policy_decide_promotion.py

@ryanklee ryanklee added this pull request to the merge queue Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d3151fbc3


Comment thread shared/policy_decide.py
Comment on lines +956 to +963
# Permanent-canary discipline: a new policy_decide version must re-prove from shadow.
if current.policy_version != policy_version:
    return decision(
        PROMOTION_SHADOW,
        True,
        f"policy_version {current.policy_version}→{policy_version}: "
        "re-entering shadow (permanent-canary discipline)",
    )

P2: Require fresh evidence after policy version reset

When POLICY_DECIDE_FN_VERSION changes, this branch only rewrites the posture to shadow with the new version; it does not invalidate or checkpoint the shadow evidence that produced the current clean verdict. If the decision log already contains an old clean 7-day span, the next hourly run_promotion_cycle will reuse that same historical verdict and promote shadow → canary without collecting a fresh shadow week for the new logic, so the advertised permanent-canary discipline is bypassed after any policy_decide change.
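One way to address this finding, sketched under assumptions: record an evidence checkpoint when the version changes and have the clean predicate ignore anything older or produced by another version. `ShadowRow`, `shadow_week_clean`, `evidence_since`, and the row fields are hypothetical; the PR's evaluate_shadow_clean may take its input in a different shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

SHADOW_WEEK = timedelta(days=7)


@dataclass(frozen=True)
class ShadowRow:
    """Hypothetical shape of one shadow-ledger row (not the PR's real schema)."""
    ts: datetime
    policy_version: str
    divergent: bool


def shadow_week_clean(rows: Iterable[ShadowRow], policy_version: str,
                      evidence_since: datetime, now: datetime) -> bool:
    """Clean only on a full week of post-checkpoint evidence from this version."""
    # Drop rows recorded before the version reset or produced by an older version.
    fresh = [r for r in rows if r.policy_version == policy_version and r.ts >= evidence_since]
    if now - evidence_since < SHADOW_WEEK or not fresh:
        return False  # not enough fresh evidence yet: stay in shadow
    return not any(r.divergent for r in fresh)
```

Persisting `evidence_since` alongside the posture, and setting it to `now` in the version-change branch, is what would turn the reset into a real re-proof rather than a relabel.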


Comment thread shared/policy_decide.py
Comment on lines +979 to +983
if dwell >= canary_window_seconds:
    return decision(
        PROMOTION_AUTHORITATIVE,
        True,
        f"canary clean ≥{canary_window_seconds / 3600:.0f}h → authoritative-ready",

P2: Require canary-window evidence before authoritative

After a clean shadow week enters canary, this branch promotes solely because wall-clock dwell elapsed while the latest verdict remains clean. Since that verdict is computed from the full historical decision log rather than decisions observed since entering canary, a quiet 24-hour period with no gated calls still advances to authoritative-ready, so the dual-decision canary can complete without actually proving any canary-window behavior.
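A possible guard for this finding, sketched under assumptions: gate canary→authoritative on decisions actually observed inside the canary window, not on elapsed time alone. `canary_ready`, `MIN_CANARY_DECISIONS`, and the timestamp-list inputs are hypothetical; the threshold of 20 is a placeholder, not a value from the PR.

```python
from datetime import datetime, timedelta
from typing import Sequence

CANARY_WINDOW = timedelta(hours=24)
MIN_CANARY_DECISIONS = 20  # hypothetical placeholder; tune to real gated-call volume


def canary_ready(entered_canary: datetime, decision_times: Sequence[datetime],
                 divergent_times: Sequence[datetime], now: datetime) -> bool:
    """Promote only after the dwell AND enough in-window decisions, none divergent."""
    if now - entered_canary < CANARY_WINDOW:
        return False  # dwell not yet served
    in_window = [t for t in decision_times if entered_canary <= t <= now]
    if len(in_window) < MIN_CANARY_DECISIONS:
        return False  # a quiet day proves nothing: keep dwelling
    # Divergences from before canary entry were already judged by the shadow week.
    return not any(entered_canary <= t <= now for t in divergent_times)
```

With this shape, a quiet 24 hours leaves the posture in canary until real traffic has exercised the dual-decision path.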


Merged via the queue into main with commit 756ee35 Jun 1, 2026
35 of 36 checks passed
@ryanklee ryanklee deleted the delta/reform-improve-shadow-autopromote-20260601 branch June 1, 2026 14:19