[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries by jakemas · Pull Request #408 · awslabs/s2n-bignum

jakemas · 2026-05-22T18:59:07Z

Summary

Replace the single arch-level proofs job (one ~3h run for all 341 arm proofs and 347 x86 proofs combined) with a per-subdirectory matrix that groups proofs by their assembly subdirectory. Each matrix entry runs only `make subdir/*.correct` for its subdirectory.

Failures now surface per-subdirectory on the PR page, and the critical-path wall-clock drops to the longest single subdirectory (typically `generic/`) instead of being summed across all of them.

Towards #407.

Caveats / things to verify

This relies on the CodeBuild fleet being able to spin up multiple instances per workflow run. The current `runs-on` qualifier `-${{ github.run_id }}-${{ github.run_attempt }}` may serialize matrix entries onto a single ephemeral instance — if that's the case, we'll see worse wall-clock than the status quo (per-job HOL-Light setup multiplied by 23 entries) and need to revisit the runner topology.
Tutorial removed from this job (already covered by `s2n-bignum-tutorial` and `s2n-bignum-tutorial-macos-arm`).
`collect-times.sh` removed; can be re-added later as an aggregator step.

Test plan

Watch this PR's CI to see whether the 23 matrix entries run in parallel or serialize
Compare total wall-clock to the current 3h baseline
Confirm a deliberate proof failure surfaces in only the relevant matrix entry

Replace the single arch-level proofs job (one ~3h run for all 341 arm proofs and 347 x86 proofs combined) with a per-subdirectory matrix that groups proofs by their assembly subdirectory. Each matrix entry runs only `make subdir/*.correct` for its subdirectory. Failures now surface per-subdirectory on the PR page, and the critical-path wall-clock drops to the longest single subdirectory (typically generic/, ~1h) instead of summed. Closes awslabs#407 (per-proof CI granularity, runner-pool-permitting).

Build on the per-subdirectory matrix from the previous commit: 1. tools/compute-proof-deps.py walks each .S/.ml pair and follows the transitive `needs "..."` / `loadt "..."` graph to compute the set of files whose change can invalidate each proof. Conservative: when a referenced file doesn't exist in-tree (Library/, EC/), it is treated as an external pin and not added. 2. The proofs job now computes the dep map at run time, derives the set of changed files via `git diff base..head`, and only runs proofs whose dep set intersects the changed set. PRs that only touch tests/, docs, x86_att/, etc. now run zero proofs. Touching common/mlkem_mldsa.ml runs the ~36 proofs that consume it. 3. tools/run-proofs-with-checks.sh wraps each per-proof `make .correct` with a GitHub Check Run create/update sequence. Each proof posts an in_progress check, then completes with success/failure. The PR page gets one check entry per proof in the affected set, alongside the subdirectory-level matrix-job summary. Soundness: the dep parser conservatively over-reports (when in doubt, include). Anything imported via `needs` or `loadt` from inside a tracked .ml file is captured. base.ml's loadt of decode/instruction/arm.ml ensures any change to the instruction model invalidates every proof.

This commit deliberately touches arm/proofs/mldsa_pointwise.ml so the new dep-aware CI runs at least one proof end-to-end with the per-proof check-run flow. Will be reverted before merge.

The previous filter let every matrix entry pay the ~10min HOL Light build cost regardless of whether its subdir was affected. Move the filter decision before the HOL Light install + apt install, and gate the heavy steps with . Unaffected matrix entries now finish in <1min (checkout + jq/python install + dep map compute + diff + decide).

Replace the per-subdirectory matrix with a per-proof matrix so each proof appears as its own check on the PR page. Two stages: s2n-bignum-proofs-matrix (cheap, ubuntu-24.04): runs tools/list-proofs.py + tools/compute-proof-deps.py + the new tools/filter-affected.py to emit affected.json -- the JSON array of {arch, subdir, name} entries whose dep set intersects the PR diff. Empty changed-files (push to main) selects all 663 entries. s2n-bignum-proofs (codebuild fleet, fan-out): one matrix entry per affected proof, gated on outputs.proofs != '[]'. Each entry restores hol-light/ from actions/cache (key includes HOLLIGHT_COMMIT + runner os/arch), builds it only on cache miss, then runs the single subdir/name.correct target. The cache amortises the HOL Light install across the matrix so each per-proof entry's overhead is ~30s. The check-run posting wrapper (tools/run-proofs-with-checks.sh) and the subdir-affected predicate (tools/decide-affected.py) are no longer needed -- the matrix entries themselves are the per-proof checks, and filter-affected.py replaces the per-subdir gate. Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>

jakemas · 2026-05-22T20:38:34Z

Testing some stuff out. A algorithm split is very simple to do: https://github.com/awslabs/s2n-bignum/pull/408/checks?sha=b1b785b3cafb16e7e78a088f6ac3306889f5bbc5. But I like the individual break down of each proof. Problem is, ~600+ proofs here total, so it's a balance of not overwhelming github runners.

Current approach, matrix all 600+ proofs out, turn off all proofs not effectedm by changes. Instruction additions will be hefty, i suppose 300 CI jobs, so perhaps if the count gets above a particular theshold we default back to the non-granular versions that run today, and test everything in 3h

Validates that the per-proof matrix scales to mid-sized fan-out (one common-file change → 36 affected proofs across arm + x86) and that the HOL Light cache is shared across the matrix (this is the second push to the branch, so cache should hit on every entry). Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>

…roofs Two fixes for the per-proof matrix: 1. opam bakes absolute shebangs into the binaries it installs, e.g. _opam/bin/ocaml has `#!/codebuild/output/srcXXXXX/.../_opam/bin/ocamlrun`. The codebuild fleet allocates a fresh srcXXXXX per matrix entry, so restoring the cached `hol-light/` from the previous job's workspace leaves every binary pointing at a path that doesn't exist (`bad interpreter: No such file or directory`). Pin the cache path to `/root/hol-light` (stable across the fleet image) and bump the cache key to `-v2` to invalidate the broken entry. 2. GitHub Actions caps a job matrix at 256 entries per workflow run -- non-raisable, per the official actions-limits docs. When the affected proof set exceeds the threshold (default 200), stage 1 routes through a per-arch bulk job (`s2n-bignum-proofs-bulk`) instead of the per-proof fan-out. The bulk job does one HOL Light install per arch and runs all affected `.correct` targets with `make -j`, matching the old monolithic behaviour. Small PRs still get per-proof check visibility on the PR page; large blast-radius changes (e.g. arm/proofs/decode.ml -> 325 proofs) still complete instead of failing to schedule. Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>

…fallback decode.ml is loaded transitively by every ARM proof, so this no-op edit selects all 325 ARM proofs as affected. That blows past the 200-entry per-proof threshold and should route stage 2 through s2n-bignum-proofs-bulk with a single matrix entry (`arm affected proofs (bulk)`) doing one HOL Light install + `make -j` over all the .correct targets. Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>

jakemas changed the title ~~CI: split s2n-bignum-proofs into per-subdirectory matrix entries~~ [TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries May 22, 2026

jakemas added 4 commits May 22, 2026 19:13

CI smoke test: no-op comment edit to trigger mldsa_pointwise proof

b13d523

This commit deliberately touches arm/proofs/mldsa_pointwise.ml so the new dep-aware CI runs at least one proof end-to-end with the per-proof check-run flow. Will be reverted before merge.

jakemas added 3 commits May 22, 2026 20:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries#408

[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries#408
jakemas wants to merge 8 commits into
awslabs:mainfrom
jakemas:ci-per-subdir-matrix

jakemas commented May 22, 2026

Uh oh!

jakemas commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jakemas commented May 22, 2026

Summary

Caveats / things to verify

Test plan

Uh oh!

jakemas commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant