Skip to content

[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries#408

Draft
jakemas wants to merge 8 commits into
awslabs:mainfrom
jakemas:ci-per-subdir-matrix
Draft

[TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries#408
jakemas wants to merge 8 commits into
awslabs:mainfrom
jakemas:ci-per-subdir-matrix

Conversation

@jakemas
Copy link
Copy Markdown
Contributor

@jakemas jakemas commented May 22, 2026

Summary

Replace the single arch-level proofs job (one ~3h run for all 341 arm proofs and 347 x86 proofs combined) with a per-subdirectory matrix that groups proofs by their assembly subdirectory. Each matrix entry runs only `make subdir/*.correct` for its subdirectory.

Failures now surface per-subdirectory on the PR page, and the critical-path wall-clock drops to the longest single subdirectory (typically `generic/`) instead of being summed across all of them.

Towards #407.

Caveats / things to verify

  • This relies on the CodeBuild fleet being able to spin up multiple instances per workflow run. The current `runs-on` qualifier `-${{ github.run_id }}-${{ github.run_attempt }}` may serialize matrix entries onto a single ephemeral instance — if that's the case, we'll see worse wall-clock than the status quo (per-job HOL-Light setup multiplied by 23 entries) and need to revisit the runner topology.
  • Tutorial removed from this job (already covered by `s2n-bignum-tutorial` and `s2n-bignum-tutorial-macos-arm`).
  • `collect-times.sh` removed; can be re-added later as an aggregator step.

Test plan

  • Watch this PR's CI to see whether the 23 matrix entries run in parallel or serialize
  • Compare total wall-clock to the current 3h baseline
  • Confirm a deliberate proof failure surfaces in only the relevant matrix entry

Replace the single arch-level proofs job (one ~3h run for all 341 arm
proofs and 347 x86 proofs combined) with a per-subdirectory matrix
that groups proofs by their assembly subdirectory. Each matrix entry
runs only `make subdir/*.correct` for its subdirectory.

Failures now surface per-subdirectory on the PR page, and the
critical-path wall-clock drops to the longest single subdirectory
(typically generic/, ~1h) instead of summed.

Closes awslabs#407 (per-proof CI granularity, runner-pool-permitting).
@jakemas jakemas changed the title CI: split s2n-bignum-proofs into per-subdirectory matrix entries [TEST] CI: split s2n-bignum-proofs into per-subdirectory matrix entries May 22, 2026
jakemas added 4 commits May 22, 2026 19:13
Build on the per-subdirectory matrix from the previous commit:

1. tools/compute-proof-deps.py walks each .S/.ml pair and follows the
   transitive `needs "..."` / `loadt "..."` graph to compute the set of
   files whose change can invalidate each proof. Conservative: when a
   referenced file doesn't exist in-tree (Library/, EC/), it is treated
   as an external pin and not added.

2. The proofs job now computes the dep map at run time, derives the set
   of changed files via `git diff base..head`, and only runs proofs
   whose dep set intersects the changed set. PRs that only touch
   tests/, docs, x86_att/, etc. now run zero proofs. Touching
   common/mlkem_mldsa.ml runs the ~36 proofs that consume it.

3. tools/run-proofs-with-checks.sh wraps each per-proof `make .correct`
   with a GitHub Check Run create/update sequence. Each proof posts an
   in_progress check, then completes with success/failure. The PR page
   gets one check entry per proof in the affected set, alongside the
   subdirectory-level matrix-job summary.

Soundness: the dep parser conservatively over-reports (when in doubt,
include). Anything imported via `needs` or `loadt` from inside a tracked
.ml file is captured. base.ml's loadt of decode/instruction/arm.ml
ensures any change to the instruction model invalidates every proof.
This commit deliberately touches arm/proofs/mldsa_pointwise.ml so the
new dep-aware CI runs at least one proof end-to-end with the per-proof
check-run flow. Will be reverted before merge.
The previous filter let every matrix entry pay the ~10min HOL Light
build cost regardless of whether its subdir was affected. Move the
filter decision before the HOL Light install + apt install, and gate
the heavy steps with .

Unaffected matrix entries now finish in <1min (checkout + jq/python
install + dep map compute + diff + decide).
Replace the per-subdirectory matrix with a per-proof matrix so each proof
appears as its own check on the PR page. Two stages:

  s2n-bignum-proofs-matrix (cheap, ubuntu-24.04): runs
  tools/list-proofs.py + tools/compute-proof-deps.py + the new
  tools/filter-affected.py to emit affected.json -- the JSON array of
  {arch, subdir, name} entries whose dep set intersects the PR diff.
  Empty changed-files (push to main) selects all 663 entries.

  s2n-bignum-proofs (codebuild fleet, fan-out): one matrix entry per
  affected proof, gated on outputs.proofs != '[]'. Each entry restores
  hol-light/ from actions/cache (key includes HOLLIGHT_COMMIT + runner
  os/arch), builds it only on cache miss, then runs the single
  subdir/name.correct target. The cache amortises the HOL Light install
  across the matrix so each per-proof entry's overhead is ~30s.

The check-run posting wrapper (tools/run-proofs-with-checks.sh) and the
subdir-affected predicate (tools/decide-affected.py) are no longer
needed -- the matrix entries themselves are the per-proof checks, and
filter-affected.py replaces the per-subdir gate.

Written with the assistance of Claude Opus 4.7.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
@jakemas
Copy link
Copy Markdown
Contributor Author

jakemas commented May 22, 2026

Testing some stuff out. A algorithm split is very simple to do: https://github.com/awslabs/s2n-bignum/pull/408/checks?sha=b1b785b3cafb16e7e78a088f6ac3306889f5bbc5. But I like the individual break down of each proof. Problem is, ~600+ proofs here total, so it's a balance of not overwhelming github runners.

Current approach, matrix all 600+ proofs out, turn off all proofs not effectedm by changes. Instruction additions will be hefty, i suppose 300 CI jobs, so perhaps if the count gets above a particular theshold we default back to the non-granular versions that run today, and test everything in 3h

jakemas added 3 commits May 22, 2026 20:56
Validates that the per-proof matrix scales to mid-sized fan-out (one
common-file change → 36 affected proofs across arm + x86) and that the
HOL Light cache is shared across the matrix (this is the second push to
the branch, so cache should hit on every entry).

Written with the assistance of Claude Opus 4.7.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
…roofs

Two fixes for the per-proof matrix:

1. opam bakes absolute shebangs into the binaries it installs, e.g.
   _opam/bin/ocaml has `#!/codebuild/output/srcXXXXX/.../_opam/bin/ocamlrun`.
   The codebuild fleet allocates a fresh srcXXXXX per matrix entry, so
   restoring the cached `hol-light/` from the previous job's workspace
   leaves every binary pointing at a path that doesn't exist (`bad
   interpreter: No such file or directory`). Pin the cache path to
   `/root/hol-light` (stable across the fleet image) and bump the cache
   key to `-v2` to invalidate the broken entry.

2. GitHub Actions caps a job matrix at 256 entries per workflow run --
   non-raisable, per the official actions-limits docs. When the affected
   proof set exceeds the threshold (default 200), stage 1 routes through
   a per-arch bulk job (`s2n-bignum-proofs-bulk`) instead of the per-proof
   fan-out. The bulk job does one HOL Light install per arch and runs all
   affected `.correct` targets with `make -j`, matching the old monolithic
   behaviour. Small PRs still get per-proof check visibility on the PR
   page; large blast-radius changes (e.g. arm/proofs/decode.ml -> 325
   proofs) still complete instead of failing to schedule.

Written with the assistance of Claude Opus 4.7.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
…fallback

decode.ml is loaded transitively by every ARM proof, so this no-op edit
selects all 325 ARM proofs as affected. That blows past the 200-entry
per-proof threshold and should route stage 2 through s2n-bignum-proofs-bulk
with a single matrix entry (`arm affected proofs (bulk)`) doing one HOL
Light install + `make -j` over all the .correct targets.

Written with the assistance of Claude Opus 4.7.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant