
feat(sdlc): mechanical reform-complete acceptance predicate + regression detector#3833

Merged
ryanklee merged 1 commit into main from theta/reform-improve-acceptance-predicate-20260601
Jun 1, 2026

ryanklee (Collaborator) commented Jun 1, 2026

What

scripts/hapax-reform-complete — a mechanical "is the coordination reform REALIZED?" predicate over the live host. Mirrors policy-decide-shadow-eval: exit 0 = every load-bearing realization is live, non-zero + a JSON verdict with reasons[] otherwise.

The meta-fix against a 3rd merged-vs-realized cycle (CASE-SDLC-REFORM-001): "merged" was conflated with "realized" at the PR layer once, then the expansive audit found the same pattern recur at the filesystem/activation layer — with no single checkable "reform-is-DONE" predicate analogous to the per-unit policy-decide-shadow-eval. This converts that judgment into a predicate.

Eight checks (live host)

| check | realization verified |
| --- | --- |
| coord-ssot-ledger | daemon-owned coord ledger provisioned+writable+sole; coord-drift-check green |
| opus-route-authority | signed opus route-authority receipt present+unexpired; upkeep timer live |
| lane-supervisor | hapax-lane-supervisor.timer enabled+active (FM-11) |
| canonical-gate | INV-5 (is_cognition_path) in live+repo gate; live gate == repo source |
| escape-grant | daemon-independent escape wired; grant mint→verify round-trip (INV-3/4) |
| coord-verbs | coord.request.refine present; ≥1 non-fallback coord verb |
| shadow-cutover | reform 3b shadow-cutover predicate reachable |
| off-deprecation | HAPAX_*_OFF deprecated + retro-grant backstop; no zombie launcher pidfiles |

Live calibration: 7/8. coord-verbs is honestly OPEN — all 49 coord verbs are dry-run stubs pending the daemon-owned ledger writer (manifest unit K). The predicate flips green when that lands. That is the point: it refuses to call the reform "done" while a realization is missing — and each check was tuned against live data to avoid false positives (e.g. advisory coord-drift-check drift and unwired per-worktree gate source copies are surfaced as detail, not failures).
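As a rough illustration of the exit-code contract described above, here is a minimal sketch of how per-check results could fold into a single verdict. `CheckResult` and the `complete`/`failed`/`reasons` keys are named elsewhere in this PR; the field names and the `aggregate` helper are assumptions for illustration, not the script's actual code.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    # Field names are assumptions; the real dataclass may differ.
    check_id: str
    ok: bool
    reasons: list = field(default_factory=list)
    detail: list = field(default_factory=list)  # advisory notes, never fail a check


def aggregate(results):
    """Fold per-check results into a JSON-ready verdict and an exit code."""
    failed = [r.check_id for r in results if not r.ok]
    verdict = {
        "complete": not failed,
        "failed": failed,
        "reasons": [f"{r.check_id}: {msg}" for r in results if not r.ok for msg in r.reasons],
    }
    return verdict, (0 if not failed else 1)


results = [
    CheckResult("lane-supervisor", ok=True),
    # Advisory drift is carried as detail, so the check still passes.
    CheckResult("coord-ssot-ledger", ok=True, detail=["advisory coord-drift-check drift"]),
    CheckResult("coord-verbs", ok=False, reasons=["no non-fallback coord verb"]),
]
verdict, exit_code = aggregate(results)
print(json.dumps(verdict, indent=2))
```

With one open check the sketch reports `complete: false` and a non-zero exit code, which matches the 7/8 calibration above in spirit.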

Wiring

  • Terminal gate: reform-execution-manifest.yaml gains a completion_gate: block naming the predicate, mirroring 3b-cutover.shadow_eval. (Vault file; not in this diff.)
  • Regression detector: hapax-reform-complete.{service,timer} run --regression-only every 6h and notify on a silent revert (OnFailure=notify-failure@ + the script's own send_notification). A high-water mark keeps the detector quiet for not-yet-realized checks and fires only when a previously-passing realization reverts. The # Hapax-Auto-Enable: true marker means it goes live on merge.
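The high-water-mark behaviour can be sketched as below. `load_watermark` and `save_watermark` are names from this PR, but their signatures, the JSON layout, and the `regression_run` helper are assumptions. In particular, this sketch assumes the mark only ever grows (union of everything that has passed), so a reverted check keeps alerting until it is fixed; the real script may persist something different.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_watermark(path):
    """Return the set of check IDs that have ever passed (empty if no file yet)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")).get("passed", []))


def save_watermark(path, passed):
    """Persist the passed set together with a UTC timestamp."""
    payload = {"passed": sorted(passed), "updated_at": datetime.now(timezone.utc).isoformat()}
    path.write_text(json.dumps(payload), encoding="utf-8")


def regression_run(path, passed_now, failed_now):
    """Return the checks that passed before but fail now, then ratchet the mark."""
    passed_ever = load_watermark(path)
    regressed = passed_ever & failed_now
    # Assumption: the watermark is a union, so it never shrinks.
    save_watermark(path, passed_ever | passed_now)
    return regressed


with tempfile.TemporaryDirectory() as tmp:
    wm = Path(tmp) / "watermark.json"
    # First run: coord-verbs has never passed, so its failure stays quiet.
    first = regression_run(wm, {"lane-supervisor"}, {"coord-verbs"})
    # Second run: lane-supervisor passed before and fails now, so it is a regression.
    second = regression_run(wm, set(), {"lane-supervisor", "coord-verbs"})
```

In regression-only mode the exit code would then be non-zero exactly when the returned set is non-empty.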

Tests

A gather/decide split keeps the decision logic pure. 42 tests cover every decider, verdict aggregation, the watermark, and the CLI exit-code contract via --observations (deterministic, no live host / systemd needed). Ruff clean.
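To make the gather/decide split concrete, here is a minimal sketch for one check. The function names and the observation keys are hypothetical, and the `--user` flag is an assumption about how the timer is installed. The point is only that the impure half talks to systemd while the pure half can be tested with a plain dict.

```python
import subprocess

UNIT = "hapax-lane-supervisor.timer"


def gather_lane_supervisor():
    """Impure half: ask systemd about the timer. Not exercised by the test below."""

    def probe(verb):
        # Assumption: the timer is a user unit; drop --user for a system unit.
        proc = subprocess.run(["systemctl", "--user", verb, UNIT], capture_output=True, text=True)
        return proc.returncode == 0

    return {"enabled": probe("is-enabled"), "active": probe("is-active")}


def decide_lane_supervisor(obs):
    """Pure half: the same observations always yield the same verdict."""
    reasons = []
    if not obs.get("enabled"):
        reasons.append(f"{UNIT} not enabled")
    if not obs.get("active"):
        reasons.append(f"{UNIT} not active")
    return (not reasons, reasons)


def test_timer_must_be_enabled_and_active():
    assert decide_lane_supervisor({"enabled": True, "active": True}) == (True, [])
    ok, reasons = decide_lane_supervisor({"enabled": True, "active": False})
    assert not ok
    assert reasons == ["hapax-lane-supervisor.timer not active"]


test_timer_must_be_enabled_and_active()
```

Feeding a recorded observations file through `--observations` would exercise the same pure path end to end without touching the host.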

Task: reform-improve-acceptance-predicate-20260601

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • New health-check predicate system for monitoring reform completion status
    • Systemd timer scheduling automated validation checks every 6 hours
    • Regression detection mode to identify when previously-passing checks revert
    • JSON output for programmatic result consumption
  • Tests

    • Added comprehensive test suite for the reform-complete validation system

…ression detector

Convert "is the coordination reform DONE?" from an eyeball judgment into a
checkable predicate over the LIVE host — the meta-fix against a 3rd
merged-vs-realized cycle (CASE-SDLC-REFORM-001). "Merged" was conflated with
"realized" at the PR layer once; the expansive audit found the same pattern
recur at the filesystem/activation layer with no single "reform-is-DONE"
predicate analogous to the per-unit policy-decide-shadow-eval.

scripts/hapax-reform-complete mirrors policy-decide-shadow-eval: exit 0 = every
load-bearing realization is live, non-zero + a JSON verdict with reasons[]
otherwise. Eight checks: coord SSOT ledger (provisioned+writable+sole +
coord-drift-check green), opus route-authority receipt (present+unexpired +
upkeep timer), hapax-lane-supervisor.timer, canonical gate (INV-5 in live+repo
gate, live==repo source), escape grant (mint->verify round-trip, INV-3/4),
coord.request.refine + >=1 non-fallback coord verb, 3b shadow-cutover predicate
reachable, HAPAX_*_OFF deprecation + retro-grant backstop + no zombie launchers.
Calibrated against the live host (7/8; coord-verbs OPEN — all coord verbs are
dry-run stubs pending the daemon-owned ledger writer, so the predicate honestly
reports the reform as not-yet-complete and flips green when that lands).

Wired as the manifest terminal gate (reform-execution-manifest.yaml
completion_gate, mirroring 3b-cutover.shadow_eval) + a periodic regression
detector: systemd/units/hapax-reform-complete.{service,timer} run
--regression-only every 6h and ntfy on a SILENT REVERT (OnFailure +
send_notification). A high-water mark keeps the detector quiet for
not-yet-realized checks and fires only when a previously-passing realization
reverts. Auto-Enable marker so it goes live on merge.

A gather/decide split keeps the decision logic pure; 42 tests cover every
decider, verdict aggregation, the watermark, and the CLI exit-code contract via
--observations (deterministic, no live host needed).

Task: reform-improve-acceptance-predicate-20260601

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai Bot commented Jun 1, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 04026471-fd88-4bc5-8fc2-ef994fe45fd9

📥 Commits

Reviewing files that changed from the base of the PR and between 3f6bcbc and 3e01807.

📒 Files selected for processing (4)
  • scripts/hapax-reform-complete
  • systemd/units/hapax-reform-complete.service
  • systemd/units/hapax-reform-complete.timer
  • tests/test_reform_complete.py

📝 Walkthrough


The PR introduces a new coordination reform-complete predicate script that verifies deployment readiness by probing system components, converting observations into verdicts, and tracking regression via persistent watermarks. Includes systemd timer/service scheduling and comprehensive acceptance tests.

Changes

Reform-complete coordination predicate

| Layer | File(s) | Summary |
| --- | --- | --- |
| Script foundation and observation gathering | scripts/hapax-reform-complete (setup & gatherers) | Script constants, check IDs, CheckResult dataclass, and low-level probe helpers; observation gatherers for ledger SSOT, opus route-authority, escape-grant wiring/roundtrip, coord verbs, shadow-cutover reachability, and off-deprecation zombie detection. |
| Verdict deciders and watermark persistence | scripts/hapax-reform-complete (deciders & watermark) | Per-check deciders that translate observations into CheckResult verdicts with failure reasons; decide_all() aggregator; watermark load/save for tracking previously-passing checks and timestamp. |
| CLI parsing, main execution, and systemd integration | scripts/hapax-reform-complete (main & main), systemd/units/hapax-reform-complete.service, systemd/units/hapax-reform-complete.timer | CLI argument parsing (--observations, --regression-only, --quiet); main() control flow for terminal mode (exit 0 iff all checks pass) and regression-only mode (exit nonzero iff previously-passing check regressed); systemd service runs script via uv with --regression-only; timer schedules execution every 6 hours after 15-minute boot delay with 300s jitter. |
| Test infrastructure and per-check unit tests | tests/test_reform_complete.py (loader, helpers, unit tests) | Dynamic script module loader; _all_good() deterministic observations; per-check pytest test classes validating ledger drift, opus receipt/timer freshness, lane supervisor timer, gate INV-5 presence/hash, escape-grant roundtrip, coord refine verb, shadow-cutover reachability, off-deprecation zombie PIDfiles; TestAggregation and TestWatermark validate aggregation and persistence. |
| CLI integration tests and regression semantics | tests/test_reform_complete.py (CLI tests) | Subprocess-based CLI acceptance tests via _run_cli() with --observations/--watermark; validate JSON output contract (complete, failed, reasons); exit codes (0 on all-pass, 1 on gap); --regression-only with empty watermark (never-passed quiet) vs pre-seeded watermark (revert triggers regression); watermark update on successful run. |

Sequence Diagram

sequenceDiagram
    participant Systemd as systemd timer
    participant Service as hapax-reform-complete.service
    participant Main as main()
    participant Gather as gather_all()
    participant Decide as decide_all()
    participant WM as Watermark I/O
    participant Output as JSON + exit code
    
    Systemd->>Service: Activate (6h interval)
    Service->>Main: ExecStart --regression-only
    Main->>Gather: Live observation gathering
    Gather-->>Main: observations dict
    Main->>Decide: Convert observations to verdicts
    Decide-->>Main: CheckResult list
    Main->>WM: load_watermark()
    WM-->>Main: previously-passed set
    Main->>Main: Compute regressed = passed_ever ∩ failed_now
    Main->>WM: save_watermark(passed_now, timestamp)
    Main->>Output: JSON verdict + exit 0/1
    Output-->>Service: Return code
    alt Regression detected
        Service->>Service: OnFailure → notify-failure@%n.service
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • hapax-systems/hapax-council#3820: The escape-grant observation and verification in hapax-reform-complete directly uses the SDLC invariant escape-grant plumbing (mint_escape_for_violation, decide_with_escape) introduced in that PR.
  • hapax-systems/hapax-council#3765: Both PRs validate Phase 3c SDLC INV-3/INV-4/INV-5 runtime semantics; the main PR's canonical-gate and escape-grant checks verify those invariants at deployment time.
  • hapax-systems/hapax-council#3817: The systemd timer/service are annotated with # Hapax-Auto-Enable: true, which that PR's post-merge deploy logic uses to auto-enable and verify marked units.

Poem

🐰 A rabbit hops through logs and checks,
Gathers observations, connects the flecks—
Is the reform complete, or has it slipped?
A watermark remembers what has tripped.
Every six hours the timer rings,
Testing the state of distributed things. 🔔

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description covers the what, why, implementation details, wiring, and tests, but is missing the required AuthorityCase and Test plan sections from the template. | Add the missing AuthorityCase section with Case/Slice identifiers (CASE-SDLC-REFORM-001 and SLICE-XXX), a detailed test plan, and CLAUDE.md hygiene checklist items. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.22% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a mechanical reform-complete acceptance predicate and regression detector for the coordination reform. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@ryanklee ryanklee enabled auto-merge June 1, 2026 14:43
@ryanklee ryanklee added this pull request to the merge queue Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e01807019


Comment on lines +221 to +225
data = json.loads(receipts[0].read_text(encoding="utf-8"))
issued = _coerce_utc(
    datetime.fromisoformat(str(data["issued_at"]).replace("Z", "+00:00"))
)
window = _parse_duration_spec(str(data.get("stale_after", "24h")))

[P2] Validate the same opus receipt dispatch will load

In a receipt directory with any historical opus_model_entitlement-*.json file, this probes only sorted(...)[0] and then trusts issued_at/stale_after without running the RouteAuthorityReceipt validation that the dispatch path uses (shared/dispatcher_policy.py:220-227, :825-837). That lets a malformed but fresh-looking receipt make the reform predicate pass even though opus routing will reject it, and conversely an old lexicographically-first timestamped receipt can make the regression detector alert while a newer fresh stable receipt is live. Please select/validate receipts through the dispatch loader or equivalent model validation/newest-fresh logic.
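One possible shape for the "newest-fresh" selection this comment asks for is sketched below. It works on already-parsed receipt dicts and deliberately does not reproduce the RouteAuthorityReceipt validation, which is not shown in this PR. `parse_hours` is a hypothetical stand-in for `_parse_duration_spec` that handles only specs such as `24h`; `issued_at` and `newest_fresh_receipt` are illustrative names as well.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_hours(spec):
    """Hypothetical stand-in for _parse_duration_spec; accepts only 'Nh'."""
    if not spec.endswith("h"):
        raise ValueError(f"unsupported duration spec: {spec!r}")
    return timedelta(hours=int(spec[:-1]))


def issued_at(receipt):
    """Parse the receipt timestamp, treating naive values as UTC."""
    ts = datetime.fromisoformat(str(receipt["issued_at"]).replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def newest_fresh_receipt(receipts, now) -> Optional[dict]:
    """Pick the most recently issued receipt that is still inside its window."""
    fresh = []
    for receipt in receipts:
        try:
            issued = issued_at(receipt)
            window = parse_hours(str(receipt.get("stale_after", "24h")))
        except (KeyError, ValueError):
            continue  # malformed receipts never count as evidence
        if issued <= now < issued + window:
            fresh.append((issued, receipt))
    return max(fresh, key=lambda pair: pair[0])[1] if fresh else None


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
receipts = [
    {"issued_at": "2026-05-01T00:00:00Z", "stale_after": "24h"},  # historical, sorts first
    {"issued_at": "2026-06-01T06:00:00Z", "stale_after": "24h"},  # live and fresh
    {"stale_after": "24h"},                                        # malformed: no issued_at
]
chosen = newest_fresh_receipt(receipts, now)
```

Here the stale historical receipt no longer decides the outcome, and the malformed one is skipped rather than trusted.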


    timeout=60,
)
try:
    reachable = rc in (0, 1) and "clean" in json.loads(out)

[P2] Require the shadow-cutover predicate to pass

When the real shadow logs are absent or still short, policy-decide-shadow-eval intentionally exits 1 with clean: false (shared/policy_decide.py:784-827), but this check marks the reform realization as OK for any output that merely contains a clean key. In a live host where the evaluator exists but the shadow week has not actually passed, the overall hapax-reform-complete terminal gate can go green even though the 3b cutover gate is still red; run the evaluator against its default live ledgers and require clean/exit 0 unless cutover is already enforced.
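The difference between "reachable" and "passed" can be sketched as two small predicates over the evaluator's exit code and stdout. Both function names are hypothetical, the first is only a reconstruction of the quoted line with an assumed `except` clause, and neither handles the "cutover is already enforced" exception mentioned above.

```python
import json


def shadow_cutover_reachable(rc, out):
    """Lenient form from the quoted line: any verdict with a 'clean' key counts."""
    try:
        return rc in (0, 1) and "clean" in json.loads(out)
    except (json.JSONDecodeError, TypeError):
        return False


def shadow_cutover_passed(rc, out):
    """Stricter form: the evaluator must exit 0 and report clean == true."""
    try:
        verdict = json.loads(out)
    except json.JSONDecodeError:
        return False
    return rc == 0 and isinstance(verdict, dict) and verdict.get("clean") is True


# A red evaluator run: exit 1 with clean=false.
red_rc, red_out = 1, '{"clean": false}'
lenient = shadow_cutover_reachable(red_rc, red_out)
strict = shadow_cutover_passed(red_rc, red_out)
```

For the red run the lenient predicate accepts while the strict one rejects, which is the gap the comment describes.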


Comment on lines +188 to +189
tracked = _run(["git", "ls-files", "--", "*authority-case-ledger.jsonl"], cwd=REPO_ROOT)[1]
tracked_count = len([ln for ln in tracked.splitlines() if ln.strip()])

[P2] Inspect every worktree for tracked legacy ledgers

git ls-files only reports the cached paths for the current worktree/index, so in the multi-lane setup this can return zero even while hapax-council--*/evidence/authority-case-ledger.jsonl remains tracked on an unrebased worker worktree. That makes the coord-ssot-ledger realization pass while lanes can still commit/write the old per-worktree authority ledger, violating the sole-ledger invariant this check is supposed to enforce; iterate the sibling worktrees and run the same tracked-file probe in each one.
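A minimal sketch of the per-worktree probe follows, assuming the sibling lanes are linked git worktrees that `git worktree list --porcelain` reports (if they are separate clones, the directory list would have to come from elsewhere). The helper names and the sample paths are hypothetical; only the two git commands are real.

```python
import subprocess
from pathlib import Path


def parse_worktree_paths(porcelain):
    """Extract worktree directories from `git worktree list --porcelain` output."""
    prefix = "worktree "
    return [ln[len(prefix):] for ln in porcelain.splitlines() if ln.startswith(prefix)]


def tracked_legacy_ledgers(repo_root):
    """Map each worktree to the legacy ledger paths its index still tracks."""
    listing = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    ).stdout
    found = {}
    for worktree in parse_worktree_paths(listing):
        if not Path(worktree).is_dir():
            continue  # stale entry whose directory is gone
        out = subprocess.run(
            ["git", "ls-files", "--", "*authority-case-ledger.jsonl"],
            cwd=worktree, capture_output=True, text=True,
        ).stdout
        tracked = [ln for ln in out.splitlines() if ln.strip()]
        if tracked:
            found[worktree] = tracked
    return found


# Hypothetical porcelain output with a main checkout and one lane worktree.
sample = (
    "worktree /srv/hapax-council\nHEAD 3e01807\nbranch refs/heads/main\n\n"
    "worktree /srv/hapax-council--theta\nHEAD 3f6bcbc\ndetached\n"
)
paths = parse_worktree_paths(sample)
```

The sole-ledger condition would then hold only when `tracked_legacy_ledgers(REPO_ROOT)` comes back empty.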


Merged via the queue into main with commit 9436745 Jun 1, 2026
35 of 37 checks passed
@ryanklee ryanklee deleted the theta/reform-improve-acceptance-predicate-20260601 branch June 1, 2026 14:54