feat(sdlc): mechanical reform-complete acceptance predicate + regression detector by ryanklee · Pull Request #3833 · hapax-systems/hapax-council

ryanklee · 2026-06-01T14:41:22Z

What

scripts/hapax-reform-complete — a mechanical "is the coordination reform REALIZED?" predicate over the live host. Mirrors policy-decide-shadow-eval: exit 0 = every load-bearing realization is live, non-zero + a JSON verdict with reasons[] otherwise.
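A minimal sketch of that exit-code contract follows. It assumes each check yields a small result record. `CheckResult` borrows the dataclass name the CodeRabbit walkthrough mentions, but its fields and the `aggregate_verdict` helper are hypothetical. The `complete`/`failed`/`reasons` keys follow the JSON contract the CLI tests describe.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    # Hypothetical shape: one verdict per check ID, with human-readable reasons.
    check_id: str
    ok: bool
    reasons: tuple[str, ...] = ()


def aggregate_verdict(results: list[CheckResult]) -> tuple[int, dict]:
    """Fold per-check results into (exit_code, JSON-serializable verdict)."""
    failed = [r.check_id for r in results if not r.ok]
    reasons = [f"{r.check_id}: {msg}" for r in results if not r.ok for msg in r.reasons]
    verdict = {"complete": not failed, "failed": failed, "reasons": reasons}
    # Exit 0 only when every load-bearing realization is live; any gap is non-zero.
    return (0 if not failed else 1), verdict


code, verdict = aggregate_verdict([
    CheckResult("lane-supervisor", True),
    CheckResult("coord-verbs", False, ("all coord verbs are dry-run stubs",)),
])
print(json.dumps(verdict), code)
```

Keeping aggregation separate from the checks means the exit code is derived from a single list of results. No individual probe can short-circuit the verdict.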

This is the meta-fix against a third merged-vs-realized cycle (CASE-SDLC-REFORM-001). "Merged" was conflated with "realized" at the PR layer once. The expansive audit then found the same pattern recurring at the filesystem/activation layer, with no single checkable "reform-is-DONE" predicate analogous to the per-unit policy-decide-shadow-eval. This PR converts that judgment into a predicate.

Eight checks (live host)

check	realization verified
`coord-ssot-ledger`	daemon-owned coord ledger provisioned+writable+sole; `coord-drift-check` green
`opus-route-authority`	signed opus route-authority receipt present+unexpired; upkeep timer live
`lane-supervisor`	`hapax-lane-supervisor.timer` enabled+active (FM-11)
`canonical-gate`	INV-5 (`is_cognition_path`) in live+repo gate; live gate == repo source
`escape-grant`	daemon-independent escape wired; grant mint→verify round-trip (INV-3/4)
`coord-verbs`	`coord.request.refine` present; ≥1 non-fallback coord verb
`shadow-cutover`	reform 3b shadow-cutover predicate reachable
`off-deprecation`	`HAPAX_*_OFF` deprecated + retro-grant backstop; no zombie launcher pidfiles
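To make one table row concrete, here is a hypothetical pure decider for `lane-supervisor`. It assumes the gatherer has already recorded what `systemctl is-enabled` and `systemctl is-active` printed for the timer. The observation keys and this `CheckResult` shape are illustrative, not the script's own.

```python
from dataclasses import dataclass

TIMER = "hapax-lane-supervisor.timer"


@dataclass(frozen=True)
class CheckResult:
    check_id: str
    ok: bool
    reasons: tuple[str, ...] = ()


def decide_lane_supervisor(obs: dict) -> CheckResult:
    """Pure decider: no I/O, only the pre-gathered observation dict."""
    reasons = []
    enabled = obs.get("is_enabled", "unknown")
    active = obs.get("is_active", "unknown")
    # FM-11: the timer must be both enabled (survives reboot) and active (running now).
    if enabled != "enabled":
        reasons.append(f"{TIMER} is-enabled={enabled}")
    if active != "active":
        reasons.append(f"{TIMER} is-active={active}")
    return CheckResult("lane-supervisor", not reasons, tuple(reasons))
```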

Live calibration: 7/8 checks pass. coord-verbs is honestly OPEN: all 49 coord verbs are dry-run stubs pending the daemon-owned ledger writer (manifest unit K), and the predicate flips green when that lands. That is the point. It refuses to call the reform "done" while a realization is missing. Each check was tuned against live data to avoid false positives. For example, advisory coord-drift-check drift and unwired per-worktree gate source copies are surfaced as detail, not failures.
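The "detail, not failures" calibration can be sketched as a decider that routes findings into two channels. Only blocking findings flip the verdict. The observation keys and the `advisory` flag on drift findings are assumptions about shape, not the script's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    check_id: str
    ok: bool
    reasons: tuple[str, ...] = ()
    detail: tuple[str, ...] = ()  # informational only; never flips ok


def decide_coord_ssot_ledger(obs: dict) -> CheckResult:
    reasons: list[str] = []
    if not obs.get("provisioned"):
        reasons.append("coord ledger not provisioned")
    if not obs.get("writable"):
        reasons.append("coord ledger not writable")
    if obs.get("tracked_legacy_ledgers", 0) > 0:
        reasons.append("legacy per-worktree ledger still tracked (not sole)")
    # Hypothetical drift-check shape: each finding is {"msg": str, "advisory": bool}.
    findings = obs.get("drift_findings", [])
    reasons += [f["msg"] for f in findings if not f.get("advisory")]
    detail = tuple(f["msg"] for f in findings if f.get("advisory"))
    return CheckResult("coord-ssot-ledger", not reasons, tuple(reasons), detail)
```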

Wiring

Terminal gate: reform-execution-manifest.yaml gains a completion_gate: block naming the predicate, mirroring 3b-cutover.shadow_eval. (Vault file; not in this diff.)
Regression detector: hapax-reform-complete.{service,timer} run --regression-only every 6h and notify on a silent revert (OnFailure=notify-failure@ plus the script's own send_notification). A high-water mark keeps the detector quiet for checks that have never passed. It fires only when a previously-passing realization reverts. Both units carry the `# Hapax-Auto-Enable: true` marker, so they go live on merge.
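The high-water-mark logic reduces to two set operations. This sketch stores the union of ever-passed and currently passing check IDs, which is one way to realize a high-water mark. The file layout, key names, and `regression_step` helper are hypothetical, and whether the script persists exactly this set is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_watermark(path: Path) -> set[str]:
    """Check IDs that have ever passed; empty if no watermark file exists yet."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return set()
    return set(data.get("passed_ever", []))


def regression_step(path: Path, passed_now: set[str], failed_now: set[str]) -> set[str]:
    """Return the regressed checks and persist the raised high-water mark."""
    passed_ever = load_watermark(path)
    # Only checks that passed at least once can regress; never-passed gaps stay quiet.
    regressed = passed_ever & failed_now
    # Union, so a later failure never erases the record of an earlier pass.
    new_mark = passed_ever | passed_now
    path.write_text(
        json.dumps({"passed_ever": sorted(new_mark),
                    "updated_at": datetime.now(timezone.utc).isoformat()}),
        encoding="utf-8",
    )
    return regressed
```

In `--regression-only` mode, the exit status would then be non-zero exactly when `regressed` is non-empty, which is what lets `OnFailure=` fire only on a silent revert.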

Tests

A gather/decide split keeps the decision logic pure. 42 tests cover every decider, verdict aggregation, the watermark, and the CLI exit-code contract via --observations (deterministic, no live host / systemd needed). Ruff clean.
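This sketch shows how `--observations` can bypass live gathering so the CLI is testable without a host. The four flags appear in the review walkthrough. The help texts, `gather_live`, and `load_observations` are hypothetical.

```python
import argparse
import json
from pathlib import Path


def gather_live() -> dict:
    """Stand-in for the live probes (systemctl, git, filesystem); host-specific."""
    raise NotImplementedError("live gathering is not part of this sketch")


def parse_args(argv: list[str]) -> argparse.Namespace:
    p = argparse.ArgumentParser(prog="hapax-reform-complete")
    p.add_argument("--observations", type=Path,
                   help="JSON file of pre-gathered observations; skips live probes")
    p.add_argument("--watermark", type=Path, help="high-water-mark file location")
    p.add_argument("--regression-only", action="store_true",
                   help="fail only when a previously-passing check now fails")
    p.add_argument("--quiet", action="store_true")
    return p.parse_args(argv)


def load_observations(args: argparse.Namespace) -> dict:
    # The seam that keeps tests deterministic: deciders never know whether
    # observations came from a fixture file or from the live host.
    if args.observations is not None:
        return json.loads(args.observations.read_text(encoding="utf-8"))
    return gather_live()
```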

Task: reform-improve-acceptance-predicate-20260601

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- New health-check predicate system for monitoring reform completion status
- Systemd timer scheduling automated validation checks every 6 hours
- Regression detection mode to identify when previously-passing checks revert
- JSON output for programmatic result consumption
Tests
- Added comprehensive test suite for the reform-complete validation system

…ression detector

Convert "is the coordination reform DONE?" from an eyeball judgment into a checkable predicate over the LIVE host: the meta-fix against a 3rd merged-vs-realized cycle (CASE-SDLC-REFORM-001). "Merged" was conflated with "realized" at the PR layer once. The expansive audit found the same pattern recur at the filesystem/activation layer with no single "reform-is-DONE" predicate analogous to the per-unit policy-decide-shadow-eval.

scripts/hapax-reform-complete mirrors policy-decide-shadow-eval: exit 0 = every load-bearing realization is live, non-zero + a JSON verdict with reasons[] otherwise. Eight checks:
- coord SSOT ledger (provisioned+writable+sole + coord-drift-check green)
- opus route-authority receipt (present+unexpired + upkeep timer)
- hapax-lane-supervisor.timer
- canonical gate (INV-5 in live+repo gate, live==repo source)
- escape grant (mint->verify round-trip, INV-3/4)
- coord.request.refine + >=1 non-fallback coord verb
- 3b shadow-cutover predicate reachable
- HAPAX_*_OFF deprecation + retro-grant backstop + no zombie launchers

Calibrated against the live host (7/8; coord-verbs OPEN: all coord verbs are dry-run stubs pending the daemon-owned ledger writer, so the predicate honestly reports the reform as not-yet-complete and flips green when that lands).

Wired as the manifest terminal gate (reform-execution-manifest.yaml completion_gate, mirroring 3b-cutover.shadow_eval) + a periodic regression detector: systemd/units/hapax-reform-complete.{service,timer} run --regression-only every 6h and ntfy on a SILENT REVERT (OnFailure + send_notification). A high-water mark keeps the detector quiet for not-yet-realized checks and fires only when a previously-passing realization reverts. Auto-Enable marker so it goes live on merge.

A gather/decide split keeps the decision logic pure; 42 tests cover every decider, verdict aggregation, the watermark, and the CLI exit-code contract via --observations (deterministic, no live host needed).

Task: reform-improve-acceptance-predicate-20260601
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-01T14:41:42Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 04026471-fd88-4bc5-8fc2-ef994fe45fd9

📥 Commits

Reviewing files that changed from the base of the PR and between 3f6bcbc and 3e01807.

📒 Files selected for processing (4)

scripts/hapax-reform-complete
systemd/units/hapax-reform-complete.service
systemd/units/hapax-reform-complete.timer
tests/test_reform_complete.py

📝 Walkthrough


The PR introduces a new coordination reform-complete predicate script that verifies deployment readiness by probing system components, converting observations into verdicts, and tracking regression via persistent watermarks. Includes systemd timer/service scheduling and comprehensive acceptance tests.

Changes

Reform-complete coordination predicate

Layer / File(s)	Summary
Script foundation and observation gathering `scripts/hapax-reform-complete` (setup & gatherers)	Script constants, check IDs, CheckResult dataclass, and low-level probe helpers; observation gatherers for ledger SSOT, opus route-authority, escape-grant wiring/roundtrip, coord verbs, shadow-cutover reachability, and off-deprecation zombie detection.
Verdict deciders and watermark persistence `scripts/hapax-reform-complete` (deciders & watermark)	Per-check deciders that translate observations into CheckResult verdicts with failure reasons; decide_all() aggregator; watermark load/save for tracking previously-passing checks and timestamp.
CLI parsing, main execution, and systemd integration `scripts/hapax-reform-complete` (main & main), `systemd/units/hapax-reform-complete.service`, `systemd/units/hapax-reform-complete.timer`	CLI argument parsing (--observations, --regression-only, --quiet); main() control flow for terminal mode (exit 0 iff all checks pass) and regression-only mode (exit nonzero iff previously-passing check regressed); systemd service runs script via `uv` with `--regression-only`; timer schedules execution every 6 hours after 15-minute boot delay with 300s jitter.
Test infrastructure and per-check unit tests `tests/test_reform_complete.py` (loader, helpers, unit tests)	Dynamic script module loader; _all_good() deterministic observations; per-check pytest test classes validating ledger drift, opus receipt/timer freshness, lane supervisor timer, gate INV-5 presence/hash, escape-grant roundtrip, coord refine verb, shadow-cutover reachability, off-deprecation zombie PIDfiles; TestAggregation and TestWatermark validate aggregation and persistence.
CLI integration tests and regression semantics `tests/test_reform_complete.py` (CLI tests)	Subprocess-based CLI acceptance tests via _run_cli() with --observations/--watermark; validate JSON output contract (complete, failed, reasons); exit codes (0 on all-pass, 1 on gap); --regression-only with empty watermark (never-passed quiet) vs pre-seeded watermark (revert triggers regression); watermark update on successful run.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Systemd as systemd timer
    participant Service as hapax-reform-complete.service
    participant Main as main()
    participant Gather as gather_all()
    participant Decide as decide_all()
    participant WM as Watermark I/O
    participant Output as JSON + exit code

    Systemd->>Service: Activate (6h interval)
    Service->>Main: ExecStart --regression-only
    Main->>Gather: Live observation gathering
    Gather-->>Main: observations dict
    Main->>Decide: Convert observations to verdicts
    Decide-->>Main: CheckResult list
    Main->>WM: load_watermark()
    WM-->>Main: previously-passed set
    Main->>Main: Compute regressed = passed_ever ∩ failed_now
    Main->>WM: save_watermark(passed_now, timestamp)
    Main->>Output: JSON verdict + exit 0/1
    Output-->>Service: Return code
    alt Regression detected
        Service->>Service: OnFailure → notify-failure@%n.service
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

hapax-systems/hapax-council#3820: The escape-grant observation and verification in hapax-reform-complete directly uses the SDLC invariant escape-grant plumbing (mint_escape_for_violation, decide_with_escape) introduced in that PR.
hapax-systems/hapax-council#3765: Both PRs validate Phase 3c SDLC INV-3/INV-4/INV-5 runtime semantics; the main PR's canonical-gate and escape-grant checks verify those invariants at deployment time.
hapax-systems/hapax-council#3817: The systemd timer/service are annotated with # Hapax-Auto-Enable: true, which that PR's post-merge deploy logic uses to auto-enable and verify marked units.

Poem

🐰 A rabbit hops through logs and checks,
Gathers observations, connects the flecks—
Is the reform complete, or has it slipped?
A watermark remembers what has tripped.
Every six hours the timer rings,
Testing the state of distributed things. 🔔

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description covers the what, why, implementation details, wiring, and tests, but is missing the required AuthorityCase and Test plan sections from the template.	Add the missing AuthorityCase section with Case/Slice identifiers (CASE-SDLC-REFORM-001 and SLICE-XXX), a detailed test plan, and CLAUDE.md hygiene checklist items.
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.22%, which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding a mechanical reform-complete acceptance predicate and regression detector for the coordination reform.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e01807019


chatgpt-codex-connector · 2026-06-01T14:47:17Z

```diff
+            data = json.loads(receipts[0].read_text(encoding="utf-8"))
+            issued = _coerce_utc(
+                datetime.fromisoformat(str(data["issued_at"]).replace("Z", "+00:00"))
+            )
+            window = _parse_duration_spec(str(data.get("stale_after", "24h")))
```


Validate the same opus receipt that dispatch will load

In a receipt directory with any historical opus_model_entitlement-*.json file, this probes only sorted(...)[0] and then trusts issued_at/stale_after without running the RouteAuthorityReceipt validation that the dispatch path uses (shared/dispatcher_policy.py:220-227, :825-837). That lets a malformed but fresh-looking receipt make the reform predicate pass even though opus routing will reject it, and conversely an old lexicographically-first timestamped receipt can make the regression detector alert while a newer fresh stable receipt is live. Please select/validate receipts through the dispatch loader or equivalent model validation/newest-fresh logic.
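One way to address this is sketched below without the real `RouteAuthorityReceipt` model, whose API is not shown here. The sketch considers every receipt, skips malformed or expired ones, and picks the newest by `issued_at` rather than the lexicographically first file. The glob pattern, the `issued_at`/`stale_after` fields, and the `"24h"` default come from the comment and diff. `parse_duration` and its unit set are assumptions. Routing the choice through the dispatch loader, as the review suggests, would be the stronger fix.

```python
from __future__ import annotations

import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # assumed duration suffixes


def parse_duration(spec: str) -> timedelta:
    """Parse '24h', '30m', etc. (hypothetical format)."""
    return timedelta(seconds=int(spec[:-1]) * _UNITS[spec[-1]])


def newest_fresh_receipt(receipt_dir: Path, now: datetime) -> dict | None:
    """Return the newest well-formed, unexpired receipt, or None."""
    best: tuple[datetime, dict] | None = None
    for path in receipt_dir.glob("opus_model_entitlement-*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            issued = datetime.fromisoformat(str(data["issued_at"]).replace("Z", "+00:00"))
            if issued.tzinfo is None:
                issued = issued.replace(tzinfo=timezone.utc)
            window = parse_duration(str(data.get("stale_after", "24h")))
        except (OSError, ValueError, KeyError, IndexError, TypeError):
            continue  # malformed receipts never count as live authority
        if now - issued > window:
            continue  # expired
        if best is None or issued > best[0]:
            best = (issued, data)
    return None if best is None else best[1]
```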


chatgpt-codex-connector · 2026-06-01T14:47:17Z

```diff
+                timeout=60,
+            )
+        try:
+            reachable = rc in (0, 1) and "clean" in json.loads(out)
```


Require the shadow-cutover predicate to pass

When the real shadow logs are absent or still short, policy-decide-shadow-eval intentionally exits 1 with clean: false (shared/policy_decide.py:784-827), but this check marks the reform realization as OK for any output that merely contains a clean key. In a live host where the evaluator exists but the shadow week has not actually passed, the overall hapax-reform-complete terminal gate can go green even though the 3b cutover gate is still red; run the evaluator against its default live ledgers and require clean/exit 0 unless cutover is already enforced.
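The stricter condition the comment asks for fits in a few lines: green only when the evaluator exits 0 and reports `clean: true`. `cutover_enforced` is a hypothetical flag standing in for however the script would learn that the 3b cutover is already enforced.

```python
import json


def shadow_cutover_passes(rc: int, out: str, cutover_enforced: bool = False) -> bool:
    """Green only when the evaluator itself is green (exit 0 and clean is True)."""
    if cutover_enforced:
        return True  # cutover already enforced; the shadow gate no longer applies
    if rc != 0:
        return False  # exit 1 with clean: false means the shadow week has not passed
    try:
        verdict = json.loads(out)
    except json.JSONDecodeError:
        return False
    # Presence of the key is not enough; its value must be exactly True.
    return isinstance(verdict, dict) and verdict.get("clean") is True
```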


chatgpt-codex-connector · 2026-06-01T14:47:17Z

```diff
+    tracked = _run(["git", "ls-files", "--", "*authority-case-ledger.jsonl"], cwd=REPO_ROOT)[1]
+    tracked_count = len([ln for ln in tracked.splitlines() if ln.strip()])
```


Inspect every worktree for tracked legacy ledgers

git ls-files only reports the cached paths for the current worktree/index, so in the multi-lane setup this can return zero even while hapax-council--*/evidence/authority-case-ledger.jsonl remains tracked on an unrebased worker worktree. That makes the coord-ssot-ledger realization pass while lanes can still commit/write the old per-worktree authority ledger, violating the sole-ledger invariant this check is supposed to enforce; iterate the sibling worktrees and run the same tracked-file probe in each one.
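The following is a sketch of the per-worktree probe. It assumes the lanes are linked worktrees of one repository rather than independent clones. It uses `git worktree list --porcelain` to enumerate every attached worktree, then runs the same `git ls-files` pathspec in each. The function names are hypothetical. Any non-empty result would mean the sole-ledger invariant is violated somewhere.

```python
import subprocess
from pathlib import Path


def parse_worktree_paths(porcelain: str) -> list[Path]:
    """Extract paths from `git worktree list --porcelain` output."""
    return [Path(line[len("worktree "):]) for line in porcelain.splitlines()
            if line.startswith("worktree ")]


def tracked_legacy_ledgers(repo_root: Path) -> dict[Path, list[str]]:
    """Map each worktree to the legacy ledger paths its index still tracks."""
    listing = subprocess.run(
        ["git", "-C", str(repo_root), "worktree", "list", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    found: dict[Path, list[str]] = {}
    for wt in parse_worktree_paths(listing):
        # check=False: a pruned or bare entry yields no output instead of raising.
        out = subprocess.run(
            ["git", "-C", str(wt), "ls-files", "--", "*authority-case-ledger.jsonl"],
            capture_output=True, text=True, check=False,
        ).stdout
        paths = [ln for ln in out.splitlines() if ln.strip()]
        if paths:
            found[wt] = paths
    return found
```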


ryanklee enabled auto-merge June 1, 2026 14:43

ryanklee added this pull request to the merge queue Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026


Merged via the queue into main with commit 9436745 Jun 1, 2026
35 of 37 checks passed

ryanklee deleted the theta/reform-improve-acceptance-predicate-20260601 branch June 1, 2026 14:54

ryanklee mentioned this pull request Jun 1, 2026

fix(sdlc): resolve reform-complete predicate through the gate shim + test gatherer boundary + repoint timer #3842

Merged



ryanklee merged 1 commit into main from theta/reform-improve-acceptance-predicate-20260601
