Round-2 estate CI audit (2026-06-25): ~300 of 377 repos have red CI; ~207 have required-check deadlock (failing shared checks required for merge β the reason ~everything has needed admin-merge). Root-caused from live failed-run logs. No changes made yet (owner chose diagnose-first).
MASTER CAUSE
The 2026-06-23/24 standardization sweep that bumped all wrapper pins to @d135b05 introduced two startup-failure bugs in the shared reusables/wrappers:
- Bug A β
timeout-minutes: on a uses: reusable-caller job. GitHub forbids it: "when a reusable workflow is called with 'uses', 'timeout-minutes' is not available." β run fails validation, 0 jobs. In governance.yml, secret-scanner.yml, mirror.yml wrappers (generator-emitted).
- Bug B β reusable job requests permissions above the caller's cap. At
@d135b05 the scorecard + secret-scanner(gitleaks) reusables gained job-level permissions: requesting security-events: write / id-token: write / pull-requests: write; callers cap at workflow-level contents: read. GitHub: "permissions can only be maintained or reduced β not elevated." β unsatisfiable β startup_failure. Proven by the sharp transition (success at @3e4bd4c β startup_failure at @d135b05, wrapper otherwise identical).
Per-workflow root cause
| Workflow |
~count |
Type |
Root cause |
Fix |
| OSSF Scorecard |
90 |
startup_failure |
Bug B (security-events/id-token writes exceed cap) |
declare writes at reusable top-level permissions: (or grant at caller workflow-level); re-pin |
Secret Scanner (scan/*) |
87 |
startup_failure |
Bug B (gitleaks job perms) + Bug A (timeout on some wrappers) |
remove gitleaks job-perms block / hoist scopes; strip timeout-minutes; re-pin |
| Governance |
48 |
startup_failure |
Bug A |
delete timeout-minutes: from governance.yml uses: job + fix generator |
| Hypatia Scan |
47 |
step-failure |
hypatia-cli.sh scan . exits 1 on β₯medium findings as first line of a bash -e step β aborts before the warn-don't-fail logic (all critical=0) |
append ` |
| dogfood-gate (K9) |
52 |
step-failure |
validator drift: k9-validate-action@2d96f43c now requires literal K9! first line + signature_required; templates never migrated |
fix *.k9.ncl templates OR relax the validator to accept a leading SPDX comment before K9! |
| mirror.yml |
56 |
mixed |
Bug A (timeout) + stale pre-#416 reds + skipped-misread-as-failure |
strip timeout-minutes; the rest is expected-noise |
Remediation plan (highest leverage first)
- Strip
timeout-minutes: from every reusable-caller (uses:) job (Bug A) + fix the wrapper generator so it stops emitting it. Detector: actionlint. β clears Governance + much of Secret Scanner + Mirror.
- Fix Bug B: declare needed write scopes at the reusable top-level
permissions: in scorecard-reusable.yml + secret-scanner-reusable.yml (and/or caller workflow-level), then re-pin estate-wide. β clears Scorecard + clean-wrapper Secret Scanner.
|| true on the hypatia scan line. β clears ~47.
- Drop stale required contexts (e.g.
scan / trufflehog, renamed) from each ruleset that still lists them (owner-gated, per-repo) β clears the masked deadlock on the ~207.
- K9 template migration (owner-gated) β dogfood-gate.
Items 1β3 are single central edits (reusables + generator) that clear the large majority of the ~300 red repos; 4β5 are owner-gated/per-repo. After 1β3, re-run the audit and reassess the deadlock before touching rulesets.
Round-2 estate CI audit (2026-06-25): ~300 of 377 repos have red CI; ~207 have required-check deadlock (failing shared checks required for merge β the reason ~everything has needed admin-merge). Root-caused from live failed-run logs. No changes made yet (owner chose diagnose-first).
MASTER CAUSE
The 2026-06-23/24 standardization sweep that bumped all wrapper pins to
@d135b05introduced two startup-failure bugs in the shared reusables/wrappers:timeout-minutes:on auses:reusable-caller job. GitHub forbids it: "when a reusable workflow is called with 'uses', 'timeout-minutes' is not available." β run fails validation, 0 jobs. Ingovernance.yml,secret-scanner.yml,mirror.ymlwrappers (generator-emitted).@d135b05the scorecard + secret-scanner(gitleaks) reusables gained job-levelpermissions:requestingsecurity-events: write/id-token: write/pull-requests: write; callers cap at workflow-levelcontents: read. GitHub: "permissions can only be maintained or reduced β not elevated." β unsatisfiable β startup_failure. Proven by the sharp transition (success at@3e4bd4cβ startup_failure at@d135b05, wrapper otherwise identical).Per-workflow root cause
permissions:(or grant at caller workflow-level); re-pinscan/*)timeout-minutes; re-pintimeout-minutes:fromgovernance.ymluses:job + fix generatorhypatia-cli.sh scan .exits 1 on β₯medium findings as first line of abash -estep β aborts before the warn-don't-fail logic (all critical=0)k9-validate-action@2d96f43cnow requires literalK9!first line +signature_required; templates never migrated*.k9.ncltemplates OR relax the validator to accept a leading SPDX comment beforeK9!skipped-misread-as-failuretimeout-minutes; the rest is expected-noiseRemediation plan (highest leverage first)
timeout-minutes:from every reusable-caller (uses:) job (Bug A) + fix the wrapper generator so it stops emitting it. Detector:actionlint. β clears Governance + much of Secret Scanner + Mirror.permissions:inscorecard-reusable.yml+secret-scanner-reusable.yml(and/or caller workflow-level), then re-pin estate-wide. β clears Scorecard + clean-wrapper Secret Scanner.|| trueon the hypatia scan line. β clears ~47.scan / trufflehog, renamed) from each ruleset that still lists them (owner-gated, per-repo) β clears the masked deadlock on the ~207.Items 1β3 are single central edits (reusables + generator) that clear the large majority of the ~300 red repos; 4β5 are owner-gated/per-repo. After 1β3, re-run the audit and reassess the deadlock before touching rulesets.