ML-DSA: import and enable aarch64 assembly backend from mldsa-native#3219
ML-DSA: import and enable aarch64 assembly backend from mldsa-native#3219jakemas wants to merge 4 commits into
Conversation
Add the build infrastructure and importer script changes needed to enable the x86_64 native arithmetic backend from mldsa-native: - CMakeLists.txt: add ML-DSA x86_64 assembly sources to BCM build - mldsa_native_config.h: enable native backend with MLD_CONFIG_USE_NATIVE_BACKEND_ARITH - mldsa_native_backend.h: platform dispatcher for x86_64 - importer.sh: extend to import x86_64 backend, process assembly with s2n-bignum macros, strip C-intrinsic operations, and rename files with mldsa_ prefix to avoid basename collisions with ML-KEM
Clean output of running the importer script: GITHUB_SHA=b61e84f0c73d4ed612ffcaea4282a9d682de3f46 ./importer.sh --force This imports formally verified AVX2 assembly for: - NTT (forward and inverse) - NTT unpack (custom coefficient order) - Pointwise Montgomery multiplication - Polyvec pointwise accumulate for L=4/5/7
Add the build infrastructure and importer-script changes needed to
enable the AArch64 native arithmetic backend from mldsa-native:
- importer.sh: copy the AArch64 `native/aarch64/` tree; keep only the
assembly files that have completed HOL-Light functional correctness
proofs (poly_caddq, poly_chknorm, poly_decompose_{32,88},
poly_use_hint_{32,88}, pointwise_montgomery and
polyvecl_pointwise_acc_montgomery_l{4,5,7}). Exclude NTT/INTT,
rej_uniform* and polyz_unpack* on AArch64 because they are not yet
formally verified. Strip their declarations and inline wrappers from
meta.h / arith_native_aarch64.h. Refactor the x86_64 assembly
post-processing into a shared fixup_asm_backend() helper that also
handles the AArch64 header (_internal_s2n_bignum_arm.h) and the
MLD_ASM_FN_SIZE directive used on that side. Also exclude
poly_caddq_avx2.S on x86_64, which upstream recently converted from
a C intrinsic into pure assembly but without a HOL-Light proof.
- mldsa_native_backend.h: dispatch to aarch64/meta.h when
OPENSSL_AARCH64 is defined, falling through to x86_64 otherwise.
- CMakeLists.txt: glob mldsa/native/aarch64/src/*.S into BCM_ASM_SOURCES
for aarch64 Unix builds (mirrors the x86_64 block).
No generated sources change in this commit; running
`./importer.sh --force` against mldsa-native produces the ML-DSA tree
imported in the follow-up commit.
Clean output of running the updated importer script: GITHUB_SHA=45ba4b3e87aba0e6681f256a3e5f90e01b0e3af1 ./importer.sh --force This imports the formally verified AArch64 assembly for: - poly_caddq - poly_chknorm - poly_decompose (l=5,7 and l=4) - poly_use_hint (l=5,7 and l=4) - pointwise multiplication (Montgomery) - polyvecl_pointwise_acc (Montgomery) for L=4, 5, 7 Also refreshes the x86_64 backend from the same upstream revision. AArch64 NTT/INTT, rej_uniform* and polyz_unpack* are intentionally not imported because they do not yet have HOL-Light proofs; the C reference implementation is used on those paths.
bc1c592 to
33b45b8
Compare
| void mld_pack_sig_h_poly(uint8_t sig[MLDSA_CRYPTO_BYTES], const mld_poly *h, | ||
| unsigned int k, unsigned int n) | ||
| { | ||
| unsigned int j; |
There was a problem hiding this comment.
warning: variable 'j' is not initialized [cppcoreguidelines-init-variables]
| unsigned int j; | |
| unsigned int j = 0; |
| */ | ||
| mld_memset(sig, 0, MLDSA_POLYVECH_PACKEDBYTES); | ||
| * coming from each of the K polynomials in h. */ | ||
| uint8_t *sig_h = sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES; |
There was a problem hiding this comment.
warning: variable 'sig_h' is not initialized [cppcoreguidelines-init-variables]
| uint8_t *sig_h = sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES; | |
| uint8_t *sig_h = NULL = sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES; |
| { | ||
| unsigned int i, j; | ||
| unsigned int old_hint_count; | ||
| const uint8_t *packed_hints = |
There was a problem hiding this comment.
warning: variable 'packed_hints' is not initialized [cppcoreguidelines-init-variables]
| const uint8_t *packed_hints = | |
| const uint8_t *packed_hints = NULL = |
| unsigned int old_hint_count; | ||
| const uint8_t *packed_hints = | ||
| sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES; | ||
| const unsigned int old_hint_count = |
There was a problem hiding this comment.
warning: variable 'old_hint_count' is not initialized [cppcoreguidelines-init-variables]
| const unsigned int old_hint_count = | |
| const unsigned int old_hint_count = 0 = |
| sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES; | ||
| const unsigned int old_hint_count = | ||
| (i == 0) ? 0 : packed_hints[MLDSA_OMEGA + i - 1]; | ||
| const unsigned int new_hint_count = packed_hints[MLDSA_OMEGA + i]; |
There was a problem hiding this comment.
warning: variable 'new_hint_count' is not initialized [cppcoreguidelines-init-variables]
| const unsigned int new_hint_count = packed_hints[MLDSA_OMEGA + i]; | |
| const unsigned int new_hint_count = 0 = packed_hints[MLDSA_OMEGA + i]; |
| const unsigned int old_hint_count = | ||
| (i == 0) ? 0 : packed_hints[MLDSA_OMEGA + i - 1]; | ||
| const unsigned int new_hint_count = packed_hints[MLDSA_OMEGA + i]; | ||
| unsigned int j; |
There was a problem hiding this comment.
warning: variable 'j' is not initialized [cppcoreguidelines-init-variables]
| unsigned int j; | |
| unsigned int j = 0; |
| void mld_polyvec_matrix_expand_eager(mld_polymat_eager *mat, | ||
| const uint8_t rho[MLDSA_SEEDBYTES]) | ||
| { | ||
| unsigned int i, j; |
There was a problem hiding this comment.
warning: variable 'i' is not initialized [cppcoreguidelines-init-variables]
| unsigned int i, j; | |
| unsigned int i = 0, j; |
| void mld_polyvec_matrix_expand_eager(mld_polymat_eager *mat, | ||
| const uint8_t rho[MLDSA_SEEDBYTES]) | ||
| { | ||
| unsigned int i, j; |
There was a problem hiding this comment.
warning: variable 'j' is not initialized [cppcoreguidelines-init-variables]
| unsigned int i, j; | |
| unsigned int i, j = 0; |
| decreases(MLDSA_K * MLDSA_L - i) | ||
| ) | ||
| { | ||
| uint8_t x = (uint8_t)(i / MLDSA_L); |
There was a problem hiding this comment.
warning: variable 'x' is not initialized [cppcoreguidelines-init-variables]
| uint8_t x = (uint8_t)(i / MLDSA_L); | |
| uint8_t x = 0 = (uint8_t)(i / MLDSA_L); |
| ) | ||
| { | ||
| uint8_t x = (uint8_t)(i / MLDSA_L); | ||
| uint8_t y = (uint8_t)(i % MLDSA_L); |
There was a problem hiding this comment.
warning: variable 'y' is not initialized [cppcoreguidelines-init-variables]
| uint8_t y = (uint8_t)(i % MLDSA_L); | |
| uint8_t y = 0 = (uint8_t)(i % MLDSA_L); |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3219 +/- ##
==========================================
+ Coverage 78.06% 78.12% +0.06%
==========================================
Files 689 692 +3
Lines 122732 123031 +299
Branches 17083 17114 +31
==========================================
+ Hits 95816 96124 +308
+ Misses 26014 26003 -11
- Partials 902 904 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary
.S) files with completed HOL-Light functional correctness proofs are imported; NTT/INTT, rej_uniform* and polyz_unpack* are intentionally excluded because they do not (yet) have AArch64 proofs in upstream mldsa-native. The C reference implementation is used on those paths.Benchmark
Measured on AWS
r8g.4xlarge(Neoverse-V2),bssl speed -timeout 3, single run (ops/sec, higher is better):Speedups are smaller than the x86_64 numbers in #3195 because NTT/INTT — which dominate keygen on the C side — are not replaced on AArch64 (no proofs yet upstream). Signing/verify still see meaningful wins from accelerated pointwise multiplication, decompose, use_hint, caddq and chknorm.
Changes
mldsa_poly_caddq_asm.S,mldsa_poly_chknorm_asm.S,mldsa_poly_decompose_{32,88}_asm.S,mldsa_poly_use_hint_{32,88}_asm.S,mldsa_pointwise_montgomery.S,mldsa_polyvecl_pointwise_acc_montgomery_l{4,5,7}.S), plusmeta.handarith_native_aarch64.hheaders.mldsa_native_backend.h— dispatches to aarch64/meta.h onOPENSSL_AARCH64.CMakeLists.txt— adds AArch64 assembly sources to BCM build (mirrors the x86_64 block).importer.sh— extended to import the AArch64 backend, restricted to the HOL-Light-proved routines, and refactored the s2n-bignum macro fixups into a shared helper used by both backends. Also excludespoly_caddq_avx2.Son x86_64 which upstream recently switched from a C intrinsic to pure assembly but without a proof.Functions accelerated
All imported AArch64 functions have completed HOL-Light formal verification proofs:
mldsa_poly_caddq.mlmldsa_poly_chknorm.mlpoly_decompose_{32,88}_aarch64_asm.mlpoly_use_hint_{32,88}_aarch64_asm.mlmldsa_pointwise.mlmldsa_pointwise_acc_l{4,5,7}.mlSee the mldsa-native HOL Light README for the authoritative list.
Call-outs
OPENSSL_AARCH64; no runtime CPU feature check is needed (Neon is mandatory on AArch64).Testing
r8g.4xlarge(KAT, Wycheproof, expanded key validation, context-string round-trips).By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.