ML-DSA Aarch64 HOL-Light proof poly_use_hint_32#372
Open
jakemas wants to merge 16 commits into
Open
Conversation
65aba57 to
abffca5
Compare
a5c79f4 to
354c915
Compare
ec26efe to
46a0f3a
Compare
…ctness
Add HOL-Light proofs for the ML-DSA poly_use_hint_32 AArch64 assembly
implementation, covering parameter sets 65/87 (GAMMA2=(Q-1)/32).
Includes:
- Functional correctness proof: each output coefficient equals the
FIPS 204 UseHint specification (mldsa_use_hint_32_spec), not just
output bounds
- Barrett reduction equivalence proof (assembly vs C decomposition)
- Constant-time and memory safety proofs
- Shared ML-DSA infrastructure lemmas in common/mlkem_mldsa.ml
- CBMC-style test harness in tests/test.c
The functional correctness proof establishes that the SIMD assembly
computes exactly the same result as the reference UseHint algorithm
for all valid inputs (a < Q, h in {0,1}).
Signed-off-by: Jake Massimo <jakemas@amazon.com>
word_2smulh is defined in arm/proofs/instruction.ml which is not loaded by x86 proofs. Moving the lemmas inline into the proof file avoids breaking x86 proofs that load the common file.
Replace weak output-bounds postcondition with full functional correctness: each output word equals word(mldsa_use_hint_32_spec (val(x i)) (val(y i))). Key proof techniques: - Bridge lemmas (REAL_INT_GT_BRIDGE) resolve both real_gt and int_gt from a single NUM fact, avoiding backtick type mismatch - Inner if elimination via COND_CASES_TAC + A0_UPPER_32 contradiction - INT_MUL_LZERO + INT_SUB_RZERO for wrap case simplification
Remove 7 unused helper lemmas (WORD_ISHR_ROUND_18, SPEC_NOWRAP_32, HINT_H1_POS, HINT_H1_NEG, WORD_OR_NEG1_32, WORD_NEG1_VAL_32, SPEC_BOUND) and move 4 shared lemmas (WORD_AND_ONES_32, WORD_MUL_1_32, REAL_INT_GT_BRIDGE, REAL_INT_GT_BRIDGE_POS) to common/mlkem_mldsa.ml so they can be shared with the poly_use_hint_88 proof. Signed-off-by: Jake Massimo <jakemas@amazon.com>
Replace the 428-line case-split Barrett proof with a general DIV_SANDWICH + BARRETT_INTERVAL lemma approach. The general lemma proves that if both formulas are sandwiched in [k*d, (k+1)*d) for a given interval, then both equal k. The proof cascades through the 17-entry interval table, discharging numeric side conditions with NUM_REDUCE_CONV. Signed-off-by: Jake Massimo <jakemas@amazon.com>
Rename MLDSA_USE_HINT_EXEC/CORRECT/SUBROUTINE_CORRECT/SUBROUTINE_SAFE to include _32 suffix, matching the _88 naming convention. Add mldsa_poly_use_hint_32 entry to subroutine_signatures.ml for the constant-time and memory safety proof. Signed-off-by: Jake Massimo <jakemas@amazon.com>
7120b24 to
3176b7e
Compare
Add FIPS 204 definitions (mldsa_cmod, mldsa_decompose_32, mldsa_use_hint_32) and equivalence proof connecting code-aligned spec to FIPS 204 Algorithm 40. Move shared arch-independent helpers (DIV_SANDWICH, INT_MOD_RESIDUE) to common/mlkem_mldsa.ml. ENSURES_STRENGTHEN_POST stays in the proof file as it references armstate. Signed-off-by: Jake Massimo <jakemas@amazon.com>
eda11cd to
e46bea5
Compare
The proof body ended mid-THENL, missing the tactic list that discharges the `x DIV d < k + 1` subgoal. This caused camlp5 to fail parsing the file with: Parse error: [and_let_binding] expected after [first_let_binding] since the next `let INT_MOD_RESIDUE = ...` was read as a continuation of the incomplete `let DIV_SANDWICH` expression. Restore the missing line. Signed-off-by: Jake Massimo <jakemas@amazon.com>
Apply the same refactor that was made on pq-code-package/mldsa-native#1037 to keep the two repos consistent: - Rename the code-aligned per-coefficient spec mldsa_use_hint_32_spec to mldsa_use_hint_32_code and keep it local to the proof file. FIPS 204 mldsa_use_hint_32 (Algorithm 40) is the shared definition in common/mlkem_mldsa.ml. - Rename the code-aligned CORRECT / CORRECT_BOUND / SUBROUTINE_CORRECT to *_CODE suffixes to mark them as internal stepping stones. - Make MLDSA_USE_HINT_32_SUBROUTINE_CORRECT the single public theorem, stated in terms of FIPS 204 mldsa_use_hint_32 with the < 16 output bound as a corollary. Derived from _SUBROUTINE_CORRECT_CODE via MLDSA_USE_HINT_32_EQUIV. - Move safety proof to the end of the file. mk_safety_spec still uses _SUBROUTINE_CORRECT_CODE so the hardcoded safety goal matches. - Lift shared FIPS 204 definitions (mldsa_cmod, mldsa_decompose_32, decompose_32_r{0,1}, mldsa_use_hint_32) and helpers (LOWER/UPPER _NONWRAP_R1, MLDSA_USE_HINT_32_UNFOLD) into common/mlkem_mldsa.ml. - Keep ENSURES_STRENGTHEN_POST local to the proof file (armstate-specific). Signed-off-by: Jake Massimo <jakemas@amazon.com>
jakemas
added a commit
to jakemas/s2n-bignum
that referenced
this pull request
May 1, 2026
Apply the same refactor as PR awslabs#372 did for _32, mirroring pq-code-package/mldsa-native#1037 to keep the two repos consistent: - Rename the code-aligned per-coefficient spec mldsa_use_hint_88_spec to mldsa_use_hint_88_code and keep it local to the proof file. FIPS 204 mldsa_use_hint_88 (Algorithm 40) is the shared definition in common/mlkem_mldsa.ml. - Rename the code-aligned CORRECT / CORRECT_BOUND / SUBROUTINE_CORRECT to *_CODE suffixes to mark them as internal stepping stones. - Make MLDSA_USE_HINT_88_SUBROUTINE_CORRECT the single public theorem, stated in terms of FIPS 204 mldsa_use_hint_88 with the < 44 output bound as a corollary. Derived from _SUBROUTINE_CORRECT_CODE via MLDSA_USE_HINT_88_EQUIV. - Move safety proof to the end of the file. mk_safety_spec still uses _SUBROUTINE_CORRECT_CODE so the hardcoded safety goal matches. - Lift shared FIPS 204 definitions (mldsa_cmod, mldsa_decompose_88, decompose_88_r{0,1}, mldsa_use_hint_88) and helpers (LOWER/UPPER _NONWRAP_R1_88, MLDSA_USE_HINT_88_UNFOLD) into common/mlkem_mldsa.ml. - Keep ENSURES_STRENGTHEN_POST local to the proof file (armstate-specific). Signed-off-by: Jake Massimo <jakemas@amazon.com>
Resolves conflicts by including both mldsa_poly_use_hint_32 from the PR branch and the three new mldsa_pointwise_acc functions (l4, l5, l7) from main in all affected files.
Contributor
|
I'm still thinking about the interesting aspects of this PR, but I noticed one oddity: |
Contributor
Author
Hanno suggested a re-write to make the format in line with other proof styles: pq-code-package/mldsa-native#1037 (comment) which I will do |
jakemas
added a commit
to jakemas/s2n-bignum
that referenced
this pull request
May 19, 2026
Using functionaltest(arm, ...) puts the test in the 'inapplicable' bucket on x86, which the summary equation successes + skipped == tested does not count, producing 'Testing all passed but is incomplete' and a non-zero exit. Switch to functionaltest(all, ...) so the function runs everywhere; the body is already gated with #ifdef __x86_64__ to return 0 without referencing the ARM-only symbol. Mirrors the pattern used for test_mldsa_poly_use_hint_32 in PR awslabs#372.
19cde16 to
3956e68
Compare
Contributor
Author
made some changes in 3956e68 |
The poly_use_hint_32 proof exposed a non-standard _CORRECT /
_CORRECT_CODE split that doesn't appear in any other proof in
s2n-bignum (or in mlkem-native / mldsa-native).
This consolidates the proof into a single FIPS 204-aligned _CORRECT
theorem and a single _SUBROUTINE_CORRECT, matching the convention used
by the rest of the codebase. Internal intermediates _CORRECT_CODE,
_CORRECT_BOUND_CODE, and _SUBROUTINE_CORRECT_CODE are removed.
The new _CORRECT places the input-bound antecedents
(val(x i) < 8380417 /\ val(y i) <= 1) inside the postcondition
(decompose-style), so the assembly's symbolic execution doesn't depend
on input ranges. This matches the shape used by poly_decompose_32 and
poly_decompose_88 in mldsa-native.
Mechanics:
- The FIPS 204 / code-aligned equivalence section
(MLDSA_USE_HINT_32_EQUIV) moves above the correctness proof.
- The new _CORRECT runs the 878-step ARM symbolic execution without
using input bounds, then DISCH_THEN(CONJUNCTS_THEN ASSUME_TAC)
picks up the bounds from the postcondition antecedent.
- Per-element FIPS-equality is closed via a derived EC_FINAL helper
after IMP_REWRITE_TAC[MLDSA_USE_HINT_32_EQUIV].
- The val < 16 corollary is closed by reducing val(word ..) to MOD
via VAL_WORD and discharging via the bound on
mldsa_use_hint_32_code.
- _SUBROUTINE_CORRECT derives from _CORRECT via
ARM_ADD_RETURN_NOSTACK_TAC.
Verified end-to-end: the new proof closes in 25:10 wall time, exit 0,
no axioms.
Signed-off-by: Jake Massimo <jakemas@amazon.com>
3956e68 to
05e44d7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
With help, and love, from Claude.
Add formal verification proof for
mldsa_poly_use_hint_32(AArch64 NEON), establishing full functional correctness: the assembly output equalsword(mldsa_use_hint_32_spec(val(a), val(h)))for each element, matching the FIPS 204 UseHint specification for GAMMA2 = (Q-1)/32 (ML-DSA security levels 3 and 5, l=5,7).The companion
_88proof (security level 2, l=4) is in PR #375.What the proof establishes
The proof verifies that hand-optimized NEON assembly — Barrett multiplication with magic constants, SIMD bit tricks, branchless conditional masking — produces exactly the same mathematical function as the FIPS 204 spec, for every valid input (8.4M x 2 input pairs).
Proof architecture
Barrett equivalence (
BARRETT_EQUIV): The assembly computesa1as((2 * a * 1074791425) / 2^32 + 131072) / 262144. The spec computes it as((a + 127) / 128 * 1025 + 2097152) / 4194304. These are completely different formulas. The proof establishes equivalence for alla < Qvia 17-case analysis (one per output value 0..16), usingDIV_MONOto sandwich both formulas to the same constant in each output interval.Element correctness (
ELEMENT_CORRECT): Provesval(asm_result) = spec(val(a), val(h))for a single element. Splits into wrap (a1=16) and nowrap (a1<=15) cases. Uses bridge lemmas for native-modereal_gt/int_gtcompatibility, andDIV_MONO-based upper bounds for the remainder.SIMD endgame (
MLDSA_USE_HINT_CORRECT): After 1006-step ARM symbolic execution, pushesword_subwordthrough SIMD operations to extract per-element 32-bit results, then matches each against the element correctness theorem.Shared helpers:
WORD_AND_ONES_32,WORD_MUL_1_32,REAL_INT_GT_BRIDGE,REAL_INT_GT_BRIDGE_POSmoved tocommon/mlkem_mldsa.mlfor reuse by both_32and_88proofs. Unused intermediate lemmas from development removed.Changes
arm/proofs/mldsa_poly_use_hint_32.ml— full proof with functional correctness postcondition, constant-time and memory safetyarm/mldsa/mldsa_poly_use_hint_32.S— assembly sourcecommon/mlkem_mldsa.ml— shared helper lemmasTesting
Proof verified end-to-end in HOL Light (both interpreted MCP and native compilation CI). Also verified in the mldsa-native fork which runs native-mode CI. The
_32proof takes approximately 16 minutes in CI.