ci(mutants): make mutation gate required with absolute --timeout 240 ceiling + false-green guards #567
Merged
…ATION-CI-TIMEOUT)

- .cargo/mutants.toml: add `minimum_test_timeout = 120` (absolute 120s per-mutant ceiling; 3× the ~40s bulk_deadline_propagation real-sleep test); reduce `timeout_multiplier` from 3.0 to 2.0 (operative ceiling is 120s on ~90s baseline). Both settings reference docs/specs/cargo-mutants-policy.md for rationale. --baseline=skip is NOT used (multiplier is silently ignored under that flag).
- .github/workflows/ci.yml: raise `mutants` job `timeout-minutes: 60 → 90`; add `mutants` to `ci-gate.needs` (DEC-096/097 required-check wiring pattern). Push-event safety confirmed: `mutants` emits `skipped` on push; ci-gate checks `failure`/`cancelled` only, so `skipped` passes through safely.
- tests/ci_gate_completeness.rs: extend existing ci-gate guard to add `mutants` to the exact-set in `test_ci_gate_needs_exactly_the_required_jobs`; rename `test_ci_gate_excludes_pr_only_jobs` → `test_ci_gate_excludes_advisory_and_secret_scan_jobs` (removes stale `mutants` exclusion assertion now that mutants is required); add new `test_mutants_is_in_ci_gate_needs` to pin the promotion; update M1 docstring to note `mutants` is intentionally omitted from the unconditional-run check.
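The push-event safety argument above can be sketched as a small verdict function. This is a minimal illustration, not the actual `ci-gate` step: `ci_gate_verdict` is a hypothetical helper, and it assumes only what the commit message states, namely that `failure` and `cancelled` block the gate while `skipped` passes through.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the real ci-gate step): decide the gate verdict
# from the results of the jobs listed in ci-gate.needs.
# Only `failure` and `cancelled` block; `success` and `skipped` pass.
ci_gate_verdict() {
  local result
  for result in "$@"; do
    case "$result" in
      failure|cancelled)
        echo "fail"
        return 0
        ;;
    esac
  done
  echo "pass"
}

# On a push event the mutants job reports `skipped`, so the gate still passes.
ci_gate_verdict success success skipped
# A cancelled mutants job (for example, one that hit timeout-minutes) blocks the gate.
ci_gate_verdict success cancelled
```

Treating `cancelled` as blocking is what turns a job that runs out of its `timeout-minutes` budget into a failed gate rather than a silent pass.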
…/floor, harden base-ref-drift false-green (MUTATION-CI-TIMEOUT)

F5 adversarial correction: the prior commit (257757a) used the wrong timeout knobs. minimum_test_timeout is a FLOOR not a ceiling; timeout_multiplier is dead once --timeout is set. Both are removed from .cargo/mutants.toml.

- .cargo/mutants.toml: remove minimum_test_timeout=120 (harmful floor) and timeout_multiplier=2.0 (dead once --timeout is on CLI). Add comment noting the absolute ceiling lives in the CLI invocation only.
- ci.yml: add --timeout 180 to cargo mutants invocation (the sole absolute per-mutant ceiling knob in cargo-mutants 27.x). Add SCOPED_DIFF_LINES pre-check (F-3): counts diff lines on examine_globs paths and exports to GITHUB_ENV. Harden zero-outcomes+success branch: if SCOPED_DIFF_LINES > 0, fail with "base-ref drift" error instead of passing silently.
- CLAUDE.md: add --timeout 180 to cargo mutants one-liner (doc-fallout convention).
- docs/specs/cargo-mutants-policy.md: updated by architect (F5 adversarial correction): corrected timeout mechanism documentation, 180s derivation, F-3 positive-coverage assertion spec.
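The floor-versus-ceiling distinction can be made concrete with a small model. This is a sketch of the semantics as this commit message describes them, not cargo-mutants source code: `effective_timeout` is a hypothetical function, and the integer multiplier is a simplification for shell arithmetic.

```shell
#!/usr/bin/env bash
# Model (an assumption, not cargo-mutants code) of how the per-mutant timeout
# is chosen. Arguments: baseline seconds, integer multiplier, floor seconds,
# and an optional explicit --timeout value.
effective_timeout() {
  local baseline=$1 multiplier=$2 floor=$3 cli_timeout=${4:-}
  # An explicit --timeout wins outright; multiplier and floor are dead config.
  if [ -n "$cli_timeout" ]; then
    echo "$cli_timeout"
    return 0
  fi
  # Otherwise the floor can only raise the scaled baseline, never cap it.
  local scaled=$(( baseline * multiplier ))
  if [ "$scaled" -gt "$floor" ]; then
    echo "$scaled"
  else
    echo "$floor"
  fi
}

effective_timeout 90 2 120       # scaled baseline exceeds the floor
effective_timeout 40 2 120       # the floor raises a short scaled baseline
effective_timeout 90 2 120 180   # explicit --timeout overrides both knobs
```

Under this model a `minimum_test_timeout` of 120 never shortens a slow mutant run, which is why the commit moves the ceiling to the CLI flag.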
…ground --timeout in measured baseline (F5 fix)

Fix 1 (HIGH false-RED): The F-3 base-ref drift guard was testing SCOPED_DIFF_LINES (per-file scoped line count), which caused a false-RED on comment-only, whitespace, or reformat edits to scoped files; those produce SCOPED_DIFF_LINES > 0 but legitimately 0 mutants and no outcomes.json. The corrected guard tests the OVERALL diff size: an empty DIFF_FILE signals genuine base-ref drift; any non-empty diff that yields 0 mutants is a legitimate docs-only, comment-only, or non-scoped-file PR and MUST PASS. Removes the now-unnecessary SCOPED_DIFF_LINES computation entirely.

Fix 2 (--timeout grounding): Measured real cargo test --all-features baseline on ubuntu-latest from 5 recent green develop runs (2026-06-28): 133-145s range. The previously assumed ~90s baseline was materially wrong. With worst-case runner variance (+20%), worst-case legitimate run ≈ 174s. --timeout 180 gave only 3-6% headroom. Bumped to --timeout 240 (38% headroom over worst-case). Updated CLAUDE.md, ci.yml, and cargo-mutants-policy.md.

Calibration note: this PR touches no examine_globs files so its own run hits the 0-mutant path; the first scoped-file PR is the real calibration.
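The headroom figures in Fix 2 follow from integer arithmetic on the measured numbers. The sketch below reproduces that calculation; the helper names `worst_case_seconds` and `headroom_pct` are hypothetical, and the inputs (145s slowest baseline, +20% variance) are taken from the commit message.

```shell
#!/usr/bin/env bash
# Worst-case runtime: slowest measured baseline plus runner variance (percent).
worst_case_seconds() {
  local slowest=$1 variance_pct=$2
  echo $(( slowest * (100 + variance_pct) / 100 ))
}

# Headroom of a ceiling over the worst case, as a rounded integer percent.
headroom_pct() {
  local ceiling=$1 worst=$2
  echo $(( ((ceiling - worst) * 100 + worst / 2) / worst ))
}

worst=$(worst_case_seconds 145 20)
echo "worst case: ${worst}s"
echo "--timeout 180 headroom: $(headroom_pct 180 "$worst")%"
echo "--timeout 240 headroom: $(headroom_pct 240 "$worst")%"
```

With these inputs the worst case is 174s, the 180s ceiling leaves about 3% headroom, and the 240s ceiling leaves about 38%.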
…to config comment + policy doc (F5 MEDIUM)

Two un-propagated doc/comment renames from prior F5 fix rounds:

1. .cargo/mutants.toml line 26: comment cited --timeout 180 (stale; actual invocation and policy doc both use 240 since the F5 pass-2 bump). Changed 180 → 240 so the config file is internally consistent.
2. docs/specs/cargo-mutants-policy.md CI Integration section (~line 458): still described the retracted SCOPED_DIFF_LINES mechanism as the base-ref drift guard. Replaced with OVERALL_DIFF_LINES and clarified the guard semantics (FAIL only on empty overall diff; non-empty diff with 0 mutants passes).
3. docs/specs/cargo-mutants-policy.md Path B guidance (~line 501): same stale SCOPED_DIFF_LINES reference in the aggregator-job requirement. Replaced with OVERALL_DIFF_LINES and added the pass-on-non-empty-diff note for Path B operators.

Historical/retraction mentions at lines 292–294 and 514 (changelog) are intentionally preserved; they explain the old approach that was replaced.

No logic change. Verified: cargo test --test ci_gate_completeness (7/7 green), ci.yml YAML valid, no stale 180/SCOPED_DIFF_LINES in active config sections.
…a reconciliation, base-ref drift diagnostic (MUTATION-CI-TIMEOUT)

Fix 1 (MEDIUM, F-1): Add CHANGELOG [Unreleased] Changed entry describing the cargo-mutants job promotion to hard-required gate, --timeout 240 ceiling, 90-min job budget, and 200+-mutant cancellation-equals-blocked signal. References docs/specs/cargo-mutants-policy.md per convention.

Fix 2 (MEDIUM, L-1): Pin cargo-mutants tool version to major v27 via `with: tool: cargo-mutants@27`. Without a version pin, taiki-e/install-action installs LATEST on each run; a silent major-version bump could change outcomes.json key names, exit-code mapping, or --timeout semantics on a now-required gate, producing a false-green or false-red without any code change in this repo.

Fix 3 (MEDIUM, M-2): Add total_mutants reconciliation warning in the jq branch of Check kill rate. If a future cargo-mutants version adds a new outcome category, those mutants would be silently excluded from the kill-rate denominator (false-green vector). Emits ::warning:: + echo when sum(caught+missed+timeout+unviable) != total_mutants. Does NOT hard-fail, which avoids coupling the required gate to an unverified future schema.

Fix 4 (LOW, F-3): Append `|| true` to the `git diff origin/...HEAD > DIFF_FILE` line. Without it, a base-ref resolution failure aborts under set -eo pipefail BEFORE OVERALL_DIFF_LINES is exported, routing to the "harness crash" message (mislabeled). With || true: DIFF_FILE is created empty by shell redirection, OVERALL_DIFF_LINES=0, cargo-mutants runs with empty diff (0 mutants, exits 0), Check kill rate routes to the precise "base-ref drift / empty diff" FAIL. No false-GREEN: empty diff still FAILs. Harness-crash branch preserved for true cargo-mutants failures.
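The Fix 4 behavior depends on the shell creating the redirect target before the command runs. The sketch below demonstrates that under `set -eo pipefail`. It is an illustration only: `count_diff_lines` is a hypothetical wrapper, and a deliberately failing command stands in for the `git diff` call whose base ref cannot be resolved.

```shell
#!/usr/bin/env bash
# Prints the number of lines captured from a diff command, even when that
# command fails. "$@" is the command to run; the body runs in a subshell so
# `set -eo pipefail` does not leak into the caller.
count_diff_lines() (
  set -eo pipefail
  diff_file=$(mktemp)
  # The redirection creates the file before the command starts, so a failure
  # leaves it empty; `|| true` keeps `set -e` from aborting at this line.
  "$@" > "$diff_file" 2>/dev/null || true
  lines=$(wc -l < "$diff_file")
  rm -f "$diff_file"
  echo $(( lines ))
)

# Stand-in for a base-ref resolution failure: the command exits non-zero.
count_diff_lines false
# Stand-in for a healthy two-line diff.
count_diff_lines printf 'a\nb\n'
```

Because the failing case still yields a count of 0, a later step can report the empty diff as base-ref drift instead of the step dying early with an unrelated message.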
…dation, doc fixes

Five targeted fixes from the F5 adversarial gate:

- H-1 (HIGH): Add runtime schema-drift guard in the jq branch of the "Check kill rate" step. When outcomes.json is valid JSON with a non-empty .outcomes array but all top-level summary keys (caught/missed/timeout/unviable/total_mutants) are 0, the guard exits 1 with a clear "schema drift detected" message. This converts the prior silent false-green (all-zeros → total_outcomes==0 → PASS) into a hard failure. Legitimate paths are untouched: genuine 0-mutant runs produce no outcomes.json (never reaches jq branch); all-unviable runs have non-zero unviable (guard condition false); empty .outcomes arrays flag _outcomes_len==0 (guard condition false).
- H-2 (HIGH): Add per-variable integer validation after jq extraction. Each of caught/missed/timeout/unviable/total_mutants is checked with `[[ "$x" =~ ^[0-9]+$ ]] || x=0` before any arithmetic. A future schema emitting a string/float/object would survive `// 0` then break `$(( ))` under set -e (false-RED). Normal integer case is byte-for-byte unchanged.
- Pass C (HIGH): Fix inverted `--timeout`/baseline claim in docs/specs/cargo-mutants-policy.md. The previous text said "the baseline run is required for --timeout to override the multiplier correctly", which is FALSE; --timeout applies unconditionally and supersedes the multiplier. Replaced with accurate description: baseline retained to prove the suite green before scoring mutants; --timeout 240 applies as per-mutant ceiling regardless of --baseline=skip.
- Pass C (MEDIUM): Fix 200-mutant budget row. Formula (mutants/4 × 140s) gives 7000s ≈ 117 min, not "~90 min+". Row now reads "~117 min; exceeds the 90-min ceiling; job cancelled → split the PR".
- Pass C (MEDIUM): Update stale .cargo/mutants.toml scope comment. The old comment described scope as "bulk + create modules"; examine_globs now also covers src/adf.rs, src/api/jira/issues.rs, src/cache.rs, and jsm modules (MAINT-MUTANTS-GLOBS-01). Updated to point to the policy doc as the canonical scope source.

Fix 1 also updates the @27 pin comment to cite both evidence sources: S-346 Pass 5 F1 empirical refutation (red-gate-log.md) and the timeout-keys research file, and to accurately describe the protected schema as "top-level summary keys".

ci_gate_completeness: 7/7 green. fmt: clean. clippy: clean. YAML: valid.
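The H-1 and H-2 guards can be sketched on already-extracted values. This is a simplified illustration that leaves out the jq extraction: `sanitize_int` and `schema_drift_check` are hypothetical names, and only the regex line is taken verbatim from the commit message.

```shell
#!/usr/bin/env bash
# H-2 style coercion: anything that is not a plain non-negative integer
# becomes 0, so later $(( )) arithmetic cannot crash on it.
sanitize_int() {
  local x=$1
  [[ "$x" =~ ^[0-9]+$ ]] || x=0
  echo "$x"
}

# H-1 style check: a non-empty outcomes array whose summary keys are all 0
# is treated as schema drift. Prints "drift" or "ok".
# Arguments are assumed to be sanitized integers.
schema_drift_check() {
  local outcomes_len=$1 caught=$2 missed=$3 timeout=$4 unviable=$5 total=$6
  if [ "$outcomes_len" -gt 0 ] &&
     [ $(( caught + missed + timeout + unviable + total )) -eq 0 ]; then
    echo "drift"
  else
    echo "ok"
  fi
}

sanitize_int 12               # unchanged integer
sanitize_int 1.5              # float coerced to 0
schema_drift_check 3 0 0 0 0 0   # outcomes present, summary all zero
schema_drift_check 3 0 0 0 3 3   # all-unviable run, guard stays quiet
```

The two guards work as a pair: coercion alone would turn an unexpected schema into all zeros, and the drift check then stops those zeros from reading as a pass.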
pin evidence basis (F5 F1)

Add "Schema-Drift and False-Green Guards" section to cargo-mutants-policy.md documenting five CI-implemented mechanisms that were missing from the governing policy doc:

1. cargo-mutants@27 version pin: rationale (protects load-bearing outcomes.json schema, exit-code, and --timeout assumptions), evidence basis (S-346 Pass 5 F1 empirical refutation + cargo-mutants-timeout-keys-verification-2026-06-28.md).
2. Malformed-JSON guard: `jq empty` parseability check → FAIL before any field extraction (prevents OOM-kill truncated file from yielding false-green via // 0).
3. Integer validation (H-2): regex guard on each jq-extracted field to coerce non-integers to 0 rather than crashing bash arithmetic under set -e (false-RED).
4. Runtime schema-drift guard (H-1): FAIL when outcomes array is non-empty but all summary keys are 0, the fingerprint of a schema migration that moved summary keys to a nested object. Documents why this cannot produce false-REDs on legitimate runs.
5. total_mutants reconciliation (M-2): warning-only (not hard-fail) design decision with explicit rationale: false-RED risk from new outcome categories, @27 pin as primary protection, H-1 as defense-in-depth for the worst false-green class.

Also fixes O1 (timeout_multiplier disambiguation: 3.0 in S-346 original; 2.0 in MUTATION-CI-TIMEOUT pass-1; removed in pass-2) and O3 (soften .factory/cicd-setup.md reference from "canonical" to "historical/pending refresh").

All claims cross-verified against ci.yml implementation. ci_gate_completeness: 7/7 pass (doc-only change, no CI YAML touched).
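The warning-only design of the M-2 reconciliation can be sketched as a function that always returns success. This is a minimal sketch, assuming the counts are already validated integers: `reconcile_total` and the message wording are hypothetical, while `::warning::` is the GitHub Actions workflow command the commit messages mention.

```shell
#!/usr/bin/env bash
# Warn, but never fail, when the known outcome categories do not add up to
# total_mutants (for example, because a newer tool added a category).
reconcile_total() {
  local caught=$1 missed=$2 timeout=$3 unviable=$4 total=$5
  local sum=$(( caught + missed + timeout + unviable ))
  if [ "$sum" -ne "$total" ]; then
    # Hypothetical message text; surfaced as an annotation in the Actions UI.
    echo "::warning::outcome categories sum to ${sum} but total_mutants is ${total}"
  fi
  return 0
}

reconcile_total 30 2 1 3 36   # categories reconcile: prints nothing
reconcile_total 30 2 1 3 40   # four mutants unaccounted for: prints a warning
```

Returning 0 in both cases mirrors the stated rationale: an unknown outcome category should be visible without turning the required gate red.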
Review Cycle 1 Triage
pr-reviewer verdict: APPROVE, with 0 blocking findings and 3 NITs (all non-blocking). All NIT findings deferred; no code changes required. Proceeding to CI gate check.
Zious11 added a commit that referenced this pull request on Jun 28, 2026
…122a8 (DEC-144)

## Summary

MUTATION-CI-TIMEOUT cycle CLOSED (2026-06-28). PR #567 `ci(mutants): make mutation gate required with absolute --timeout 240 ceiling + false-green guards` squash-merged → develop @ 3b122a8.

## Changes

STATE.md
- timestamp → 2026-06-28T12:00:00Z
- current_step → IDLE MUTATION-CI-TIMEOUT CLOSED
- Project Metadata: Stories **97**, Activation HEAD → develop @ 3b122a8
- Phase Progress: new MUTATION-CI-TIMEOUT COMPLETE row (keep last 5; E2E WIREMOCK row archived); 5-row limit maintained
- Current Phase Steps: MUTATION-CI-TIMEOUT CYCLE DELIVERED step added; oldest step (G-ADF-FOOTNOTE) archived
- Decisions Log: DEC-144 added (MUTATION-CI-TIMEOUT convergence; CRITICAL inverted-knob; HIGH false-RED; measured baseline 133–145s; policy-doc-only governance; 5 guards)
- Drift Items: MUTATION-CI-TIMEOUT → RESOLVED; 4 new items added (MUTANTS-ARBITER-OFFLINE-SELFTEST, MUTANTS-PARTIAL-SCHEMA-RESIDUAL, MUTANTS-SHARDING-PATH-B, MUTANTS-FIRST-SCOPED-PR-CALIBRATION)
- Session Resume Checkpoint: updated to reflect PR #567 @ 3b122a8; Stories 97; Open PRs: NONE
- Resume Plan: Stories 97; develop HEAD 3b122a8; RECENTLY CLOSED updated; OPEN BACKLOG updated (MUTATION-CI-TIMEOUT removed; MUTANTS-SHARDING-PATH-B added)
- Open Issues Tracker: #567 row added CLOSED

stories/STORY-INDEX.md
- total_stories: 96 → 97; version 1.4.48 → 1.4.49
- Wave-plan feature-followup count 61 → 62; description updated
- Final totals line: 97 stories, sum 7+8+7+10+3+62=97
- Story Manifest: S-MUTATION-CI-TIMEOUT-1 row added
- Total rows line: 96 → 97

cycles/cycle-001/lessons.md
- 4 new lessons appended: VERIFY-TOOL-CONFIG-SEMANTICS [codified], GROUND-CI-BUDGETS-IN-MEASURED-DATA [codified], FULL-VSDD-CI-CONFIG-CATCHES-CRITICAL [codified], ORCHESTRATOR-RELAYED-MERGE-AUTH [process-gap]

cicd-setup.md
- §1.1a Last updated line updated
- mutants job catalog row updated (HARD-REQUIRED, @27, --timeout 240, timeout-minutes 90, 5 guards)
- §1.1a full specification rewritten: gate status, @27 pin, --timeout 240, 5 false-green guards, measured baseline, stale keys removed (minimum_test_timeout / timeout_multiplier), gate promotion record
- Feature Summary mutants row updated
- advisory-only footnote corrected

stories/S-MUTATION-CI-TIMEOUT-1.md (ADDED)
- Retroactive F3 traceability for PR #567

Phase/research artifacts (ADDED):
- phase-f1-delta-analysis/MUTATION-CI-TIMEOUT-affected-files.txt
- phase-f1-delta-analysis/MUTATION-CI-TIMEOUT-delta-analysis.md
- research/cargo-mutants-timeout-keys-verification-2026-06-28.md
- research/mutation-ci-perf-2026-06-28.md

Input-hash sweep: updated 4 files (burst-log, session-checkpoints, blocking-issues-resolved, RESEARCH-INDEX). Post-update: TOTAL=13 MATCH=11 STALE=0.

Count-propagation sweep: check-spec-counts.sh exit 0; check-bc-cumulative-counts.sh exit 0 (BC 605 unchanged). Old story count (96) present only in historical transition records (95→96 rows); this is an intentional, immutable audit trail.
Summary
Make the `cargo-mutants` CI job a hard-required branch-protection check (via `ci-gate.needs`, per DEC-096/097). Fixes the DEC-132 problem where the mutation job failed only via a 1-hour wall-clock timeout and was previously non-required.

The absolute per-mutant ceiling is now `--timeout 240` on the CLI invocation, the only effective mechanism in cargo-mutants 27.x (no `.cargo/mutants.toml` key exists for this). The value is grounded in measured ubuntu-latest baselines (133–145s, 5 green develop runs) with ~38% headroom over the worst case (slowest baseline plus 20% runner variance, ≈174s).

Calibration caveat: This PR touches no `examine_globs` files, so its own `mutants` run hits the 0-mutant path and does NOT exercise the `--timeout 240` ceiling or the kill-rate arithmetic. The first PR that touches a scoped file (`src/adf.rs`, `src/api/jira/bulk.rs`, `src/cache.rs`, etc.) is the real calibration: watch for `timeout` outcomes on healthy mutants and bump `--timeout` if needed.

What Changed
Root Cause Fixed
The `mutants` CI job for PR #553 (ADF recursion guard) was cancelled at exactly 60 minutes after evaluating 36 mutants. Root cause: `src/api/jira/bulk.rs` is in `examine_globs` and `tests/bulk_deadline_propagation.rs` uses real wall-clock sleeps (~30–40s per run) because `tokio::time::pause` is incompatible with subprocess + wiremock. With no `--timeout` ceiling, the per-mutant cost was unbounded above.

The previous config had `minimum_test_timeout = 120` (a floor, not a ceiling; confirmed CRITICAL error) and `timeout_multiplier = 3.0` (dead config once `--timeout` is present). Both are removed. The only fix is `--timeout 240` on the CLI.

Files Changed
| File | Change |
| --- | --- |
| `.github/workflows/ci.yml` | Add `mutants` to `ci-gate.needs`; raise `timeout-minutes: 60→90`; add `--timeout 240` to invocation; add `OVERALL_DIFF_LINES` base-ref drift guard; add malformed-JSON guard; add integer-validation guard; add H-1 runtime schema-drift guard; add M-2 total_mutants reconciliation warning; pin `cargo-mutants@27` |
| `.cargo/mutants.toml` | Remove `minimum_test_timeout` and `timeout_multiplier` (dead/inverted config); add explanatory comment about `--timeout` being CLI-only |
| `tests/ci_gate_completeness.rs` | Add `test_mutants_is_in_ci_gate_needs`; update `test_ci_gate_needs_exactly_the_required_jobs` to pin exact 8-job set including `mutants`; update exclusion test comments |
| `docs/specs/cargo-mutants-policy.md` | Correct the timeout-mechanism documentation and the `--timeout 240` derivation; add the "Schema-Drift and False-Green Guards" section |
| `CLAUDE.md` | Add `--timeout 240` to Build & Test mutation command |
| `CHANGELOG.md` | Add [Unreleased] Changed entry for the gate promotion |

Architecture Changes

    graph TD
      A[PR opens] --> B[mutants job runs]
      B --> C{OVERALL_DIFF_LINES == 0?}
      C -- yes --> D[FAIL: base-ref drift]
      C -- no, 0 mutants --> E[PASS: legitimate zero-mutant PR]
      C -- no, N mutants --> F[Run cargo mutants --timeout 240]
      F --> G{outcomes.json parseable?}
      G -- no --> H[FAIL: harness crash/malformed JSON]
      G -- yes --> I{kill_rate >= 90%?}
      I -- yes --> J[PASS: gate passes]
      I -- no --> K[FAIL: missed/timeout mutants listed]
      F --> L[ci-gate aggregates result]
      L --> M[Branch protection checks ci-gate only]

Story Dependencies

    graph LR
      S346[S-346: Initial mutants gate advisory] --> MUTATION[MUTATION-CI-TIMEOUT: promote to required]
      MAINT01[MAINT-MUTANTS-GLOBS-01: scope expansion] --> MUTATION
      SEC001[SEC-001 / PR-553: triggered the timeout] --> MUTATION

No upstream PRs to wait on. No downstream PRs blocked.
Spec Traceability

    flowchart LR
      DEC132[DEC-132: mutation job failed by 1hr timeout] --> POLICY[cargo-mutants-policy.md §CI Gate]
      DEC096[DEC-096/097: ci-gate.needs pattern] --> CIGATE[ci-gate.needs includes mutants]
      POLICY --> CIYML[.github/workflows/ci.yml mutants job]
      CIGATE --> CIYML
      CIYML --> GUARD[tests/ci_gate_completeness.rs]
      GUARD --> PINNED[exact 8-job set pinned by test]

Governance: policy-doc-only (no BC authored; human resolution F1 §8, Q3). The governing artifact is `docs/specs/cargo-mutants-policy.md`.

Test Evidence
- `test_ci_gate_job_exists_with_correct_shell`
- `test_ci_gate_needs_exactly_the_required_jobs`
- `test_ci_gate_excludes_advisory_and_secret_scan_jobs`
- `test_mutants_is_in_ci_gate_needs`
- `test_ci_gate_fails_on_failed_or_cancelled_need`
- `test_ci_gate_needs_jobs_have_no_event_conditional_if`
- `test_ci_gate_pass_fail_semantics_are_structurally_placed`

All 7 guard tests in `tests/ci_gate_completeness.rs` pass. Clippy/fmt clean. YAML valid.

Coverage: CI-config-only changes. No Rust source logic changed; coverage metrics unchanged.

Mutation kill-rate on this PR: 0 mutants (no `examine_globs` files changed). This is the legitimate zero-mutant path; the base-ref drift guard passes because the overall diff is non-empty.
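The zero-mutant path and the 90% threshold can be sketched as one decision function. This is a simplified model, not the CI step: `mutants_gate` is a hypothetical name, the real job detects the zero-mutant case by the absence of outcomes.json rather than by counts, and the kill-rate formula (caught divided by caught + missed + timeout, with unviable mutants excluded) is an assumption that this page does not spell out.

```shell
#!/usr/bin/env bash
# Arguments: overall diff line count, then caught, missed, timeout counts.
# Prints a verdict and returns 1 on FAIL.
mutants_gate() {
  local overall_diff_lines=$1 caught=$2 missed=$3 timeout=$4
  if [ "$overall_diff_lines" -eq 0 ]; then
    echo "FAIL: base-ref drift (empty diff)"
    return 1
  fi
  # Assumed denominator: mutants that were actually scored.
  local scored=$(( caught + missed + timeout ))
  if [ "$scored" -eq 0 ]; then
    echo "PASS: zero-mutant path"
    return 0
  fi
  # Integer form of caught/scored >= 90/100, avoiding floating point.
  if [ $(( caught * 100 )) -ge $(( scored * 90 )) ]; then
    echo "PASS: kill rate >= 90%"
  else
    echo "FAIL: kill rate below 90%"
    return 1
  fi
}

mutants_gate 12 0 0 0            # non-empty diff, no mutants (this PR's case)
mutants_gate 0 0 0 0 || true     # empty diff is reported as base-ref drift
mutants_gate 40 27 2 1           # 27 of 30 scored mutants caught
mutants_gate 40 26 3 1 || true   # 26 of 30 falls below the threshold
```

Checking the diff size before the counts is what separates a docs-only PR from a run that saw nothing because the base ref was wrong.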
Demo Evidence
N/A: CI pipeline change. No user-facing CLI behavior to demonstrate. The observable outcome is the `mutants` job result in GitHub Actions CI on this PR itself.

Holdout Evaluation
N/A — evaluated at wave gate (CI-only governance change).
Adversarial Review
F5 fresh-context adversarial gate: 3 CLEAN-CONVERGED passes after catching and fixing:

- CRITICAL inverted timeout knob (`minimum_test_timeout` is a floor, not a ceiling)
- the `--timeout` value (180s → 240s; measured baseline showed 133–145s, not ~90s); governance-doc gaps

All findings resolved before PR creation.

Security Review
Security Review
No security-relevant changes. This PR modifies CI workflow YAML, the cargo-mutants config, the CI guard test, and documentation only. No Rust source logic changed. No new network endpoints, auth paths, or secrets introduced. The `cargo-mutants@27` pin is a supply-chain hardening measure (guards against silent schema/exit-code changes in the tool).
Risk Assessment
Blast radius: CI pipeline only. No change to the compiled binary, runtime behavior, or public CLI surface.

Performance impact: The `mutants` job already ran (advisory). Making it required adds zero additional CI time; only the merge-gate behavior changes. The `timeout-minutes` raise (60→90) increases the maximum wall time for large PRs touching scoped files, but the actual per-PR cost is bounded by the number of mutants the diff generates.
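How the mutant count translates into wall time can be estimated with the budget formula quoted in the commit history (mutants/4 × 140s). The sketch below applies it; `estimated_minutes` and `fits_budget` are hypothetical helpers, and what the divisor 4 represents is not stated on this page, so it is kept as an opaque constant.

```shell
#!/usr/bin/env bash
# Estimated job minutes for a given mutant count, rounded to the nearest minute.
# The divisor (4) and per-mutant cost (140s) come from the budget formula in
# the commit message; both are treated as fixed assumptions here.
estimated_minutes() {
  local mutants=$1 divisor=4 seconds_per_mutant=140
  local total_seconds=$(( mutants * seconds_per_mutant / divisor ))
  echo $(( (total_seconds + 30) / 60 ))
}

# Succeeds when the estimate fits inside the 90-minute job budget.
fits_budget() {
  [ "$(estimated_minutes "$1")" -le 90 ]
}

echo "36 mutants:  $(estimated_minutes 36) min"
echo "200 mutants: $(estimated_minutes 200) min"
if fits_budget 200; then echo "200 mutants fit"; else echo "200 mutants exceed the budget: split the PR"; fi
```

For 200 mutants the estimate is 117 minutes, which matches the corrected budget row and lands above the 90-minute `timeout-minutes` value.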
Rollback: Remove `mutants` from `ci-gate.needs` and revert `timeout-minutes` to 60.

AI Pipeline Metadata
Pre-Merge Checklist