feat(sdlc): mechanical reform-complete acceptance predicate + regression detector #3833
Convert "is the coordination reform DONE?" from an eyeball judgment into a
checkable predicate over the LIVE host: the meta-fix against a third
merged-vs-realized cycle (CASE-SDLC-REFORM-001). "Merged" was conflated with
"realized" at the PR layer once; the expansive audit then found the same pattern
recurring at the filesystem/activation layer, with no single "reform-is-DONE"
predicate analogous to the per-unit policy-decide-shadow-eval.
scripts/hapax-reform-complete mirrors policy-decide-shadow-eval: exit 0 = every
load-bearing realization is live; otherwise non-zero plus a JSON verdict with
reasons[]. Eight checks:
- coord SSOT ledger: provisioned + writable + sole, and coord-drift-check green
- opus route-authority receipt: present + unexpired, plus upkeep timer
- hapax-lane-supervisor.timer
- canonical gate: INV-5 in live + repo gate; live == repo source
- escape grant: mint->verify round-trip (INV-3/4)
- coord.request.refine + >=1 non-fallback coord verb
- 3b shadow-cutover predicate reachable
- HAPAX_*_OFF deprecation + retro-grant backstop + no zombie launchers
Calibrated against the live host (7/8; coord-verbs OPEN — all coord verbs are
dry-run stubs pending the daemon-owned ledger writer, so the predicate honestly
reports the reform as not-yet-complete and flips green when that lands).
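A minimal sketch of the exit-code contract described above, assuming each check reduces to a pass/fail flag plus reasons. CheckResult is named in the walkthrough's sequence diagram, but its fields here, and the aggregate() helper, are hypothetical; the real script's verdict schema may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CheckResult:
    """One realization check's outcome (field names are illustrative)."""
    name: str
    ok: bool
    reasons: list[str] = field(default_factory=list)


def aggregate(results: list[CheckResult]) -> tuple[int, dict]:
    """Fold per-check results into (exit_code, JSON-serializable verdict).

    Exit 0 only when every load-bearing check passes; otherwise exit 1 and
    collect every failing check's reasons into reasons[].
    """
    failing = [r for r in results if not r.ok]
    verdict = {
        "complete": not failing,
        "passed": sum(r.ok for r in results),
        "total": len(results),
        "reasons": [
            f"{r.name}: {why}" for r in failing for why in (r.reasons or ["failed"])
        ],
    }
    return (1 if failing else 0), verdict
```

A caller would print json.dumps(verdict) and hand the code to sys.exit(). For a 7/8 calibration like the one quoted, this sketch would report passed: 7, total: 8, with the coord-verbs reason in reasons[].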
Wired as the manifest terminal gate (reform-execution-manifest.yaml
completion_gate, mirroring 3b-cutover.shadow_eval) + a periodic regression
detector: systemd/units/hapax-reform-complete.{service,timer} run
--regression-only every 6h and ntfy on a SILENT REVERT (OnFailure +
send_notification). A high-water mark keeps the detector quiet for
not-yet-realized checks and fires only when a previously-passing realization
reverts. Auto-Enable marker so it goes live on merge.
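The high-water mark can be sketched as a persisted set of check names that have ever passed; a regression is then passed_ever ∩ failed_now, as in the walkthrough's sequence diagram. The file layout and these function signatures are assumptions. One design choice is worth flagging: this sketch stores the union of old and new passes, so a reverted check keeps alerting on every run until it is fixed, whereas saving only the current passes would make it alert once and then fall silent.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_watermark(path: Path) -> set[str]:
    """Return the set of check names that have ever passed (empty if no file yet)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return set()
    return set(data.get("passed_ever", []))


def save_watermark(path: Path, passed_ever: set[str]) -> None:
    """Persist the high-water mark with a timestamp."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "passed_ever": sorted(passed_ever),
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(path)  # atomic rename: a crash never leaves a half-written file


def regression_check(path: Path, results: dict[str, bool]) -> set[str]:
    """Return checks that passed on some earlier run but fail now.

    Checks that have never passed are ignored, so a not-yet-realized piece
    of the reform stays quiet instead of alerting every 6h.
    """
    passed_ever = load_watermark(path)
    passed_now = {name for name, ok in results.items() if ok}
    failed_now = {name for name, ok in results.items() if not ok}
    regressed = passed_ever & failed_now
    save_watermark(path, passed_ever | passed_now)  # the mark only grows
    return regressed
```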
A gather/decide split keeps the decision logic pure; 42 tests cover every
decider, verdict aggregation, the watermark, and the CLI exit-code contract via
--observations (deterministic, no live host needed).
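A sketch of the gather/decide split for a single check. The systemd scope (user units), the observation keys, and the function bodies are illustrative assumptions; gather_all and decide_all only mirror the roles named in the sequence diagram. The gatherer is the only code that touches systemd, while the decider is a pure function over a plain dict, which is what lets tests feed canned observations instead of probing a live host.

```python
import subprocess
from typing import Any, Callable

Observations = dict[str, Any]
Decision = tuple[bool, list[str]]

TIMER = "hapax-lane-supervisor.timer"


def gather_lane_supervisor() -> dict[str, bool]:
    """Impure: ask systemd about the timer (user scope is an assumption)."""
    def succeeds(verb: str) -> bool:
        proc = subprocess.run(
            ["systemctl", "--user", "--quiet", verb, TIMER],
            capture_output=True, text=True,
        )
        return proc.returncode == 0
    return {"enabled": succeeds("is-enabled"), "active": succeeds("is-active")}


def gather_all() -> Observations:
    return {"lane_supervisor": gather_lane_supervisor()}


def decide_lane_supervisor(obs: Observations) -> Decision:
    """Pure: reads only the gathered observation and performs no I/O."""
    unit = obs.get("lane_supervisor", {})
    reasons = []
    if not unit.get("enabled"):
        reasons.append(f"{TIMER} not enabled")
    if not unit.get("active"):
        reasons.append(f"{TIMER} not active")
    return (not reasons, reasons)


DECIDERS: dict[str, Callable[[Observations], Decision]] = {
    "lane-supervisor": decide_lane_supervisor,
}


def decide_all(obs: Observations) -> dict[str, Decision]:
    return {name: decide(obs) for name, decide in DECIDERS.items()}
```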
Task: reform-improve-acceptance-predicate-20260601
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Walkthrough
The PR introduces a new coordination reform-complete predicate script that verifies deployment readiness by probing system components, converting observations into verdicts, and tracking regressions via persistent watermarks. It includes systemd timer/service scheduling and comprehensive acceptance tests.
Changes: Reform-complete coordination predicate
Sequence Diagram
sequenceDiagram
participant Systemd as systemd timer
participant Service as hapax-reform-complete.service
participant Main as main()
participant Gather as gather_all()
participant Decide as decide_all()
participant WM as Watermark I/O
participant Output as JSON + exit code
Systemd->>Service: Activate (6h interval)
Service->>Main: ExecStart --regression-only
Main->>Gather: Live observation gathering
Gather-->>Main: observations dict
Main->>Decide: Convert observations to verdicts
Decide-->>Main: CheckResult list
Main->>WM: load_watermark()
WM-->>Main: previously-passed set
Main->>Main: Compute regressed = passed_ever ∩ failed_now
Main->>WM: save_watermark(passed_now, timestamp)
Main->>Output: JSON verdict + exit 0/1
Output-->>Service: Return code
alt Regression detected
Service->>Service: OnFailure → notify-failure@%n.service
end
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 3 passed, 2 failed (warnings)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e01807019
data = json.loads(receipts[0].read_text(encoding="utf-8"))
issued = _coerce_utc(
    datetime.fromisoformat(str(data["issued_at"]).replace("Z", "+00:00"))
)
window = _parse_duration_spec(str(data.get("stale_after", "24h")))
Validate the same opus receipt that dispatch will load
In a receipt directory with any historical opus_model_entitlement-*.json file, this probes only sorted(...)[0] and then trusts issued_at/stale_after without running the RouteAuthorityReceipt validation that the dispatch path uses (shared/dispatcher_policy.py:220-227, :825-837). That lets a malformed but fresh-looking receipt make the reform predicate pass even though opus routing will reject it, and conversely an old lexicographically-first timestamped receipt can make the regression detector alert while a newer fresh stable receipt is live. Please select/validate receipts through the dispatch loader or equivalent model validation/newest-fresh logic.
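The selection half of that suggestion can be sketched as follows: scan every receipt matching the glob named in the comment, skip any that fail to parse, and keep the newest one still inside its stale_after window, instead of trusting sorted(...)[0]. This is only a stand-in. The comment's actual recommendation is to reuse the dispatch path's RouteAuthorityReceipt validation, which this sketch does not reproduce, and the compact "24h"-style duration format accepted by parse_duration is an assumption about stale_after.

```python
from __future__ import annotations

import json
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path

_DURATION = re.compile(r"^(\d+)([smhd])$")
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(spec: str) -> timedelta:
    """Parse a compact duration such as '24h' (assumed format)."""
    m = _DURATION.match(spec.strip())
    if not m:
        raise ValueError(f"bad duration: {spec!r}")
    return timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})


def load_receipt(path: Path) -> tuple[datetime, timedelta] | None:
    """Return (issued_at, window), or None if the receipt is malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        issued = datetime.fromisoformat(str(data["issued_at"]).replace("Z", "+00:00"))
        if issued.tzinfo is None:
            issued = issued.replace(tzinfo=timezone.utc)  # treat naive as UTC
        return issued, parse_duration(str(data.get("stale_after", "24h")))
    except (OSError, ValueError, KeyError, TypeError):
        return None


def newest_fresh_receipt(directory: Path, now: datetime) -> Path | None:
    """Pick the most recently issued receipt that parses AND is unexpired."""
    best: tuple[datetime, Path] | None = None
    for path in directory.glob("opus_model_entitlement-*.json"):
        parsed = load_receipt(path)
        if parsed is None:
            continue  # malformed receipts can never satisfy the check
        issued, window = parsed
        if issued <= now < issued + window and (best is None or issued > best[0]):
            best = (issued, path)
    return best[1] if best else None
```

Note that an old receipt whose name sorts first no longer decides the outcome; only freshness and validity do.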
    timeout=60,
)
try:
    reachable = rc in (0, 1) and "clean" in json.loads(out)
Require the shadow-cutover predicate to pass
When the real shadow logs are absent or still short, policy-decide-shadow-eval intentionally exits 1 with clean: false (shared/policy_decide.py:784-827), but this check marks the reform realization as OK for any output that merely contains a clean key. In a live host where the evaluator exists but the shadow week has not actually passed, the overall hapax-reform-complete terminal gate can go green even though the 3b cutover gate is still red; run the evaluator against its default live ledgers and require clean/exit 0 unless cutover is already enforced.
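A stricter decider along the lines Codex suggests: only exit code 0 together with clean: true counts as a pass, unless cutover is already enforced. The PR does not show how the script would learn that cutover is enforced, so cutover_enforced is a hypothetical input here.

```python
import json


def shadow_cutover_ok(rc: int, out: str, cutover_enforced: bool) -> tuple[bool, str]:
    """Decide the shadow-cutover check from the evaluator's exit code and JSON stdout."""
    if cutover_enforced:
        return True, "cutover already enforced"
    try:
        verdict = json.loads(out)
    except json.JSONDecodeError:
        return False, "evaluator output is not JSON"
    if not isinstance(verdict, dict) or "clean" not in verdict:
        return False, "evaluator verdict lacks a 'clean' key"
    # Merely reaching the evaluator is not enough: it must report a clean week.
    if rc == 0 and verdict["clean"] is True:
        return True, "shadow week clean"
    return False, f"shadow evaluator not clean (rc={rc}, clean={verdict['clean']!r})"
```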
tracked = _run(["git", "ls-files", "--", "*authority-case-ledger.jsonl"], cwd=REPO_ROOT)[1]
tracked_count = len([ln for ln in tracked.splitlines() if ln.strip()])
Inspect every worktree for tracked legacy ledgers
git ls-files only reports the cached paths for the current worktree/index, so in the multi-lane setup this can return zero even while hapax-council--*/evidence/authority-case-ledger.jsonl remains tracked on an unrebased worker worktree. That makes the coord-ssot-ledger realization pass while lanes can still commit/write the old per-worktree authority ledger, violating the sole-ledger invariant this check is supposed to enforce; iterate the sibling worktrees and run the same tracked-file probe in each one.
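One way to follow that suggestion, sketched under the assumption that every lane is a registered worktree of one repository: enumerate them with git worktree list --porcelain and repeat the same git ls-files probe in each. Bare and prunable entries are skipped because they have no usable index or directory; the function names are illustrative.

```python
import subprocess
from pathlib import Path


def parse_worktrees(porcelain: str) -> list[Path]:
    """Extract inspectable worktree roots from `git worktree list --porcelain`."""
    roots = []
    for stanza in porcelain.strip().split("\n\n"):  # blank line separates entries
        lines = stanza.splitlines()
        if not lines or not lines[0].startswith("worktree "):
            continue
        if any(ln == "bare" or ln.startswith("prunable") for ln in lines[1:]):
            continue  # no index to inspect, or the directory is gone
        roots.append(Path(lines[0][len("worktree "):]))
    return roots


def tracked_legacy_ledgers(repo_root: Path) -> dict[Path, list[str]]:
    """Map each worktree to the legacy ledger paths still tracked in its index."""
    listing = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    ).stdout
    found: dict[Path, list[str]] = {}
    for wt in parse_worktrees(listing):
        out = subprocess.run(
            ["git", "ls-files", "--", "*authority-case-ledger.jsonl"],
            cwd=wt, capture_output=True, text=True, check=True,
        ).stdout
        paths = [ln for ln in out.splitlines() if ln.strip()]
        if paths:
            found[wt] = paths
    return found
```

The check would then fail whenever tracked_legacy_ledgers() returns a non-empty mapping, naming the offending worktrees in its reasons.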
What
scripts/hapax-reform-complete: a mechanical "is the coordination reform REALIZED?" predicate over the live host. It mirrors policy-decide-shadow-eval: exit 0 = every load-bearing realization is live; otherwise non-zero plus a JSON verdict with reasons[].

This is the meta-fix against a third merged-vs-realized cycle (CASE-SDLC-REFORM-001): "merged" was conflated with "realized" at the PR layer once, then the expansive audit found the same pattern recurring at the filesystem/activation layer, with no single checkable "reform-is-DONE" predicate analogous to the per-unit policy-decide-shadow-eval. This converts that judgment into a predicate.

Eight checks (live host)
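| Check | Verifies |
|---|---|
| coord-ssot-ledger | SSOT ledger provisioned + writable + sole; coord-drift-check green |
| opus-route-authority | route-authority receipt present + unexpired; upkeep timer |
| lane-supervisor | hapax-lane-supervisor.timer enabled+active (FM-11) |
| canonical-gate | INV-5 (…is_cognition_path) in live+repo gate; live gate == repo source |
| escape-grant | mint->verify round-trip (INV-3/4) |
| coord-verbs | coord.request.refine present; ≥1 non-fallback coord verb |
| shadow-cutover | 3b shadow-cutover predicate reachable |
| off-deprecation | HAPAX_*_OFF deprecated + retro-grant backstop; no zombie launcher pidfiles |

Live calibration: 7/8.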
coord-verbs is honestly OPEN: all 49 coord verbs are dry-run stubs pending the daemon-owned ledger writer (manifest unit K). The predicate flips green when that lands. That is the point: it refuses to call the reform "done" while a realization is missing. Each check was also tuned against live data to avoid false positives (e.g. advisory coord-drift-check drift and unwired per-worktree gate source copies are surfaced as detail, not failures).

Wiring
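The coord-verbs verdict can be expressed as a pure decider. The verb-registry shape (a mapping from verb name to metadata carrying a fallback flag) is hypothetical. Treating a verb with no flag as a stub is a deliberately conservative default, so an incomplete registry cannot turn the check green.

```python
Verbs = dict[str, dict]

REQUIRED_VERB = "coord.request.refine"


def decide_coord_verbs(verbs: Verbs) -> tuple[bool, list[str]]:
    """Pass only if the refine verb exists and at least one verb is non-fallback."""
    reasons = []
    if REQUIRED_VERB not in verbs:
        reasons.append(f"{REQUIRED_VERB} missing")
    # A verb counts as live only if it explicitly declares fallback=False.
    live = [name for name, meta in verbs.items() if meta.get("fallback", True) is False]
    if not live:
        reasons.append(f"all {len(verbs)} coord verbs are fallback/dry-run stubs")
    return (not reasons, reasons)
```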
- reform-execution-manifest.yaml gains a completion_gate: block naming the predicate, mirroring 3b-cutover.shadow_eval. (Vault file; not in this diff.)
- hapax-reform-complete.{service,timer} run --regression-only every 6h and notify on a silent revert (OnFailure=notify-failure@ plus the script's own send_notification). A high-water mark keeps the detector quiet for not-yet-realized checks and fires only when a previously-passing realization reverts.
- Auto-Enable marker (# Hapax-Auto-Enable: true) → goes live on merge.

Tests
A gather/decide split keeps the decision logic pure. 42 tests cover every decider, verdict aggregation, the watermark, and the CLI exit-code contract via --observations (deterministic; no live host or systemd needed). Ruff clean.

Task: reform-improve-acceptance-predicate-20260601
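The CLI exit-code contract under test can be sketched with argparse. --observations and --regression-only are the flags named in this PR; everything else (the main() signature, injecting decide and gather as callables, passing passed_ever in directly instead of reading the watermark file, the JSON keys) is an assumption made to keep the sketch self-contained.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Iterable


def main(
    argv: list[str],
    decide: Callable[[dict], dict[str, bool]],
    gather: Callable[[], dict],
    passed_ever: Iterable[str] = (),
) -> int:
    parser = argparse.ArgumentParser(prog="hapax-reform-complete")
    parser.add_argument(
        "--observations", type=Path,
        help="read observations from a JSON file instead of probing the live host",
    )
    parser.add_argument(
        "--regression-only", action="store_true",
        help="exit non-zero only for checks that passed before and fail now",
    )
    args = parser.parse_args(argv)

    if args.observations is not None:
        obs = json.loads(args.observations.read_text(encoding="utf-8"))
    else:
        obs = gather()  # live host probes

    results = decide(obs)
    failed = {name for name, ok in results.items() if not ok}
    # In regression-only mode, never-passed checks are reported but not blocking.
    blocking = failed & set(passed_ever) if args.regression_only else failed
    print(json.dumps({
        "complete": not failed,
        "failed": sorted(failed),
        "blocking": sorted(blocking),
    }))
    return 1 if blocking else 0
```

An entry point would call sys.exit(main(sys.argv[1:], decide, gather)) with deciders that return one bool per check; a test writes a JSON observations file and asserts only on the returned code, so neither systemd nor the live host is touched.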
🤖 Generated with Claude Code