feat(completeness): add formal completeness gate for teardown and merge#81
Open
tomharper wants to merge 6 commits into
Open
feat(completeness): add formal completeness gate for teardown and merge#81tomharper wants to merge 6 commits into
tomharper wants to merge 6 commits into
Conversation
Z3-backed neurosymbolic check (rules as data) that proves a task's done/teardown/merge claim consistent with prime directives kunchenguid#2 and kunchenguid#3. Hard rules gate; soft rules score. Wired into fm-teardown.sh and fm-merge-local.sh; fails open when the solver is absent so the existing bash checks remain the hard guarantee.
Replace the neurosymbolic_evaluator dependency with a small self-contained Z3 shim (axes -> EnumSort, hard rules -> Implies, per-rule SAT check for named violations, deterministic soft scoring). The only dependency is now z3-solver (public PyPI), so the gate is portable with no private coupling. Behavior and the JSON/CLI contract are unchanged.
Report COMPLETENESS_GATE: available when python3 can import z3, add a z3 install resolver (pip install z3-solver), and document it alongside TASKS_AXI. Mirrors the optional-capability pattern: silent and never blocking when absent, since the gate fails open.
…k and strict exit-3 enforcement
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What Changed
bin/fm-completeness-check.sh,bin/fm-completeness.py,bin/fm-completeness.rules.json) that proves each task's done/teardown/merge claim consistent with the directives, wired intofm-teardown.shandfm-merge-local.sh; it fails open when the solver is absent and is controlled byFM_COMPLETENESS_GATE(off-switch) andFM_COMPLETENESS_STRICT(enforce-when-broken).fm-bootstrap.shas an optional capability (COMPLETENESS_GATE: available), emitted only whenpython3can importz3.tests/fm-completeness.test.shand synced README/CONTRIBUTING/AGENTS docs.Risk Assessment
✅ Low: The change is well-bounded, both prior-round findings are correctly fixed, and the new gate faithfully mirrors the bash safety checks it guards (verified across local-only, non-local-only, absent-worktree, and squash-to-fork cases) while failing open when the solver tooling is absent.
Testing
Baseline test suite (tests/fm-completeness.test.sh) passes all 15 assertions with z3 4.15.2 present, exercising the solver tier rather than just the fail-open stubs. I then captured operator-visible CLI transcripts showing the gate's actual verdict lines for every hard invariant (SAT vs BLOCKED with the named violated directive rule), plus fail-open/strict/off-switch behavior and the two fixes in the latest commit (strict exit-3 enforcement and the absent-worktree non-false-block). Finally I drove the real fm-merge-local.sh against a sandbox local-only repo to prove the directive-#2 wiring changes actual git state only after captain approval is asserted. No UI surface is involved — this is a CLI/lifecycle gate, so CLI transcripts and persisted git state are the appropriate end-user evidence. Working tree left clean; all sandboxes removed.
Evidence: Completeness gate verdict transcript (SAT/BLOCKED across all invariants)
Evidence: Fail-open, strict enforcement, off-switch, and the two latest-commit fixes
Evidence: End-to-end: real fm-merge-local.sh refused without approval, fast-forwards main with approval
Evidence: Test suite result (15/15 pass, solver tier active)
Pipeline
Updates from git push no-mistakes
⏭️ **intent** - skipped
✅ No issues found.
🔧 **Rebase** - 1 issue found → auto-fixed ✅
AGENTS.md- merge conflict rebasing onto origin/main🔧 Fix applied.
✅ Re-checked - no issues remain.
bin/fm-completeness-check.sh:83- When the worktree directory is absent, git_unlanded_facts sets LANDED=none, which trips the SHIP_REQUIRES_LANDED hard rule (forbid landed=none) and blocks ship teardown (exit 2). This diverges from fm-teardown.sh, which guards its unlanded check withif [ -d "$WT" ]and skips it entirely when the worktree is gone, allowing teardown. On z3-installed setups this false-blocks legitimate cleanup of a task whose worktree was already removed/returned (e.g. an interrupted prior teardown). Resolve the absent-worktree case to a non-blocking state (treat 'nothing on disk to discard' as landed/clean) to match the bash semantics it mirrors.bin/fm-teardown.sh:428- The wired callers only treat gate exit code 2 as a refusal (if [ "$gate_rc" = 2 ]). Under FM_COMPLETENESS_STRICT=1, fail_open exits 3 to signal 'refuse on broken tooling', but both fm-teardown.sh:428 and fm-merge-local.sh:52 fall through on exit 3 and proceed. This makes strict mode silently ineffective at the exact call sites it is meant to harden, contradicting the documented enforce-when-broken guarantee. Halt on exit 3 as well (or treat any non-zero gate exit as a stop).🔧 Fix: fix completeness gate absent-worktree false-block and strict exit-3 enforcement
1 info still open:
bin/fm-completeness.py:119- prove_consistency() is defined and documented in the module docstring as the check that the whole hard rule set is satisfiable over free axes, but check()/main() never call it. As a result a self-contradictory pair of hard rules in fm-completeness.rules.json would go undetected — every concrete claim would then be reported UNSAT (blocked) with no signal that the rule data itself is broken. Either wire it into check() (e.g. surface a distinct 'rules inconsistent' error so the wrapper fails open) or drop the unused function and its docstring reference.✅ **Test** - passed
✅ No issues found.
bash tests/fm-completeness.test.sh— all 15 assertions pass including the solver-dependent tier (z3 4.15.2 importable)Operator-visible verdict transcript viabin/fm-completeness-check.shacross the full invariant matrix (ship landed/unlanded, scout with/without report, merge with/without approval, graded compliance score)Fail-open (FM_COMPLETENESS_RULES=/no/such.json-> exit 0), strict enforcement (FM_COMPLETENESS_STRICT=1-> exit 3), off-switch (FM_COMPLETENESS_GATE=0-> exit 0)Latest-commit absent-worktree fix:--gate teardown --idagainst synthetic meta whose worktree path does not exist -> SAT (no false block) for both no-mistakes and local-only modesEnd-to-end real-script wiring:bin/fm-merge-local.sh demo-1on a sandbox local-only git repo — refused (main unchanged) without approval, fast-forwarded main to the change withFM_CAPTAIN_APPROVED=granted✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.