[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries#408
Draft
jakemas wants to merge 8 commits into
Draft
[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries#408jakemas wants to merge 8 commits into
jakemas wants to merge 8 commits into
Conversation
Replace the single arch-level proofs job (one ~3h run for all 341 arm proofs and 347 x86 proofs combined) with a per-subdirectory matrix that groups proofs by their assembly subdirectory. Each matrix entry runs only `make subdir/*.correct` for its subdirectory. Failures now surface per-subdirectory on the PR page, and the critical-path wall-clock drops to the longest single subdirectory (typically generic/, ~1h) instead of summed. Closes awslabs#407 (per-proof CI granularity, runner-pool-permitting).
Build on the per-subdirectory matrix from the previous commit: 1. tools/compute-proof-deps.py walks each .S/.ml pair and follows the transitive `needs "..."` / `loadt "..."` graph to compute the set of files whose change can invalidate each proof. Conservative: when a referenced file doesn't exist in-tree (Library/, EC/), it is treated as an external pin and not added. 2. The proofs job now computes the dep map at run time, derives the set of changed files via `git diff base..head`, and only runs proofs whose dep set intersects the changed set. PRs that only touch tests/, docs, x86_att/, etc. now run zero proofs. Touching common/mlkem_mldsa.ml runs the ~36 proofs that consume it. 3. tools/run-proofs-with-checks.sh wraps each per-proof `make .correct` with a GitHub Check Run create/update sequence. Each proof posts an in_progress check, then completes with success/failure. The PR page gets one check entry per proof in the affected set, alongside the subdirectory-level matrix-job summary. Soundness: the dep parser conservatively over-reports (when in doubt, include). Anything imported via `needs` or `loadt` from inside a tracked .ml file is captured. base.ml's loadt of decode/instruction/arm.ml ensures any change to the instruction model invalidates every proof.
This commit deliberately touches arm/proofs/mldsa_pointwise.ml so the new dep-aware CI runs at least one proof end-to-end with the per-proof check-run flow. Will be reverted before merge.
The previous filter let every matrix entry pay the ~10min HOL Light build cost regardless of whether its subdir was affected. Move the filter decision before the HOL Light install + apt install, and gate the heavy steps with . Unaffected matrix entries now finish in <1min (checkout + jq/python install + dep map compute + diff + decide).
Replace the per-subdirectory matrix with a per-proof matrix so each proof
appears as its own check on the PR page. Two stages:
s2n-bignum-proofs-matrix (cheap, ubuntu-24.04): runs
tools/list-proofs.py + tools/compute-proof-deps.py + the new
tools/filter-affected.py to emit affected.json -- the JSON array of
{arch, subdir, name} entries whose dep set intersects the PR diff.
Empty changed-files (push to main) selects all 663 entries.
s2n-bignum-proofs (codebuild fleet, fan-out): one matrix entry per
affected proof, gated on outputs.proofs != '[]'. Each entry restores
hol-light/ from actions/cache (key includes HOLLIGHT_COMMIT + runner
os/arch), builds it only on cache miss, then runs the single
subdir/name.correct target. The cache amortises the HOL Light install
across the matrix so each per-proof entry's overhead is ~30s.
The check-run posting wrapper (tools/run-proofs-with-checks.sh) and the
subdir-affected predicate (tools/decide-affected.py) are no longer
needed -- the matrix entries themselves are the per-proof checks, and
filter-affected.py replaces the per-subdir gate.
Written with the assistance of Claude Opus 4.7.
Signed-off-by: Jake Massimo <jakemas@amazon.com>
Contributor
Author
|
Testing some stuff out. A algorithm split is very simple to do: https://github.com/awslabs/s2n-bignum/pull/408/checks?sha=b1b785b3cafb16e7e78a088f6ac3306889f5bbc5. But I like the individual break down of each proof. Problem is, ~600+ proofs here total, so it's a balance of not overwhelming github runners. Current approach, matrix all 600+ proofs out, turn off all proofs not effectedm by changes. Instruction additions will be hefty, i suppose 300 CI jobs, so perhaps if the count gets above a particular theshold we default back to the non-granular versions that run today, and test everything in 3h |
Validates that the per-proof matrix scales to mid-sized fan-out (one common-file change → 36 affected proofs across arm + x86) and that the HOL Light cache is shared across the matrix (this is the second push to the branch, so cache should hit on every entry). Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>
…roofs Two fixes for the per-proof matrix: 1. opam bakes absolute shebangs into the binaries it installs, e.g. _opam/bin/ocaml has `#!/codebuild/output/srcXXXXX/.../_opam/bin/ocamlrun`. The codebuild fleet allocates a fresh srcXXXXX per matrix entry, so restoring the cached `hol-light/` from the previous job's workspace leaves every binary pointing at a path that doesn't exist (`bad interpreter: No such file or directory`). Pin the cache path to `/root/hol-light` (stable across the fleet image) and bump the cache key to `-v2` to invalidate the broken entry. 2. GitHub Actions caps a job matrix at 256 entries per workflow run -- non-raisable, per the official actions-limits docs. When the affected proof set exceeds the threshold (default 200), stage 1 routes through a per-arch bulk job (`s2n-bignum-proofs-bulk`) instead of the per-proof fan-out. The bulk job does one HOL Light install per arch and runs all affected `.correct` targets with `make -j`, matching the old monolithic behaviour. Small PRs still get per-proof check visibility on the PR page; large blast-radius changes (e.g. arm/proofs/decode.ml -> 325 proofs) still complete instead of failing to schedule. Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>
…fallback decode.ml is loaded transitively by every ARM proof, so this no-op edit selects all 325 ARM proofs as affected. That blows past the 200-entry per-proof threshold and should route stage 2 through s2n-bignum-proofs-bulk with a single matrix entry (`arm affected proofs (bulk)`) doing one HOL Light install + `make -j` over all the .correct targets. Written with the assistance of Claude Opus 4.7. Signed-off-by: Jake Massimo <jakemas@amazon.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the single arch-level proofs job (one ~3h run for all 341 arm proofs and 347 x86 proofs combined) with a per-subdirectory matrix that groups proofs by their assembly subdirectory. Each matrix entry runs only `make subdir/*.correct` for its subdirectory.
Failures now surface per-subdirectory on the PR page, and the critical-path wall-clock drops to the longest single subdirectory (typically `generic/`) instead of being summed across all of them.
Towards #407.
Caveats / things to verify
Test plan