ML-DSA: import and enable aarch64 assembly backend from mldsa-native by jakemas · Pull Request #3219 · aws/aws-lc

jakemas · 2026-05-05T21:02:24Z

Summary

Imports the AArch64 native arithmetic backend from mldsa-native into ML-DSA, providing Neon-accelerated assembly for ten formally-verified polynomial operations.
Only pure assembly (.S) files with completed HOL-Light functional correctness proofs are imported; NTT/INTT, rej_uniform* and polyz_unpack* are intentionally excluded because they do not (yet) have AArch64 proofs in upstream mldsa-native. The C reference implementation is used on those paths.
Follows the same integration pattern as #3195 (ML-DSA: import and enable x86_64 assembly backend from mldsa-native) and ML-KEM's AArch64 backend, using s2n-bignum macros for symbol visibility.

Stacked on top of #3195. Also requires aws/aws-lc-rs#1113 so the aws-lc-rs CC builder can discover the new aarch64 .S files (mirrors #1110 for the x86_64 backend).

The first two commits here are from #3195 (x86_64 backend). The last two commits are the new work for AArch64 (and a refresh of the x86_64 import against the same upstream revision). Reviewers should focus on the two commits starting at db008b7.

Benchmark

Measured on AWS r8g.4xlarge (Neoverse-V2), bssl speed -timeout 3, single run (ops/sec, higher is better):

┌─────────────────────┬──────────┬──────────┬─────────────┐
│ Operation           │  Before  │  After   │ Improvement │
├─────────────────────┼──────────┼──────────┼─────────────┤
│ MLDSA44 keygen      │  28,309  │  29,184  │   +3.1%     │
│ MLDSA44 signing     │   6,863  │   7,855  │  +14.4%     │
│ MLDSA44 verify      │  27,752  │  30,601  │  +10.3%     │
│ MLDSA65 keygen      │  14,597  │  14,980  │   +2.6%     │
│ MLDSA65 signing     │   4,378  │   5,047  │  +15.3%     │
│ MLDSA65 verify      │  17,528  │  19,103  │   +9.0%     │
│ MLDSA87 keygen      │  10,939  │  11,280  │   +3.1%     │
│ MLDSA87 signing     │   3,631  │   4,134  │  +13.9%     │
│ MLDSA87 verify      │  11,102  │  12,021  │   +8.3%     │
└─────────────────────┴──────────┴──────────┴─────────────┘

Speedups are smaller than the x86_64 numbers in #3195 because NTT/INTT — which dominate keygen on the C side — are not replaced on AArch64 (no proofs yet upstream). Signing/verify still see meaningful wins from accelerated pointwise multiplication, decompose, use_hint, caddq and chknorm.
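The Improvement column follows directly from the Before and After throughput figures. A minimal sketch of that arithmetic in C is below; `improvement_pct` is a hypothetical helper name used only for illustration and is not part of aws-lc or `bssl speed`.

```c
#include <assert.h>

/* Relative throughput change in percent for ops/sec figures, where a
 * higher value is better. Hypothetical helper, for illustration only. */
double improvement_pct(double before_ops, double after_ops)
{
  /* A non-positive baseline would make the ratio meaningless. */
  assert(before_ops > 0.0);
  return 100.0 * (after_ops - before_ops) / before_ops;
}
```

For the MLDSA65 signing row, `improvement_pct(4378, 5047)` evaluates to roughly 15.28, which matches the +15.3% in the table after rounding. Since the table reports a single run, small differences in the last digit are within noise.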

Changes

New files: 10 AArch64 assembly files (mldsa_poly_caddq_asm.S, mldsa_poly_chknorm_asm.S, mldsa_poly_decompose_{32,88}_asm.S, mldsa_poly_use_hint_{32,88}_asm.S, mldsa_pointwise_montgomery.S, mldsa_polyvecl_pointwise_acc_montgomery_l{4,5,7}.S), plus meta.h and arith_native_aarch64.h headers.
Modified: mldsa_native_backend.h (dispatches to aarch64/meta.h on OPENSSL_AARCH64).
Modified: CMakeLists.txt (adds AArch64 assembly sources to the BCM build, mirroring the x86_64 block).
Modified: importer.sh (extended to import the AArch64 backend, restricted to the HOL-Light-proved routines; the s2n-bignum macro fixups are refactored into a shared helper used by both backends). It also excludes poly_caddq_avx2.S on x86_64, which upstream recently switched from a C intrinsic to pure assembly but without a proof.

Functions accelerated

All imported AArch64 functions have completed HOL-Light formal verification proofs:

poly_caddq — mldsa_poly_caddq.ml
poly_chknorm — mldsa_poly_chknorm.ml
poly_decompose (l=5,7 and l=4) — poly_decompose_{32,88}_aarch64_asm.ml
poly_use_hint (l=5,7 and l=4) — poly_use_hint_{32,88}_aarch64_asm.ml
Pointwise Montgomery multiplication — mldsa_pointwise.ml
Polyvec pointwise accumulate for L=4/5/7 — mldsa_pointwise_acc_l{4,5,7}.ml

See the mldsa-native HOL Light README for the authoritative list.
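For orientation, the behaviour expected of a routine such as poly_caddq can be sketched in portable C. This is only an illustration of the conditional-add-q step used in ML-DSA, assuming coefficients in the open interval (-q, q); it is not the mldsa-native source or the verified assembly, and every `sketch_`/`SKETCH_` name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_MLDSA_N 256     /* coefficients per polynomial (FIPS 204) */
#define SKETCH_MLDSA_Q 8380417 /* ML-DSA modulus q = 2^23 - 2^13 + 1 */

/* Add q to a coefficient if it is negative, without branching on its value.
 * Assumes -q < a < q, so the result lands in [0, q). */
int32_t sketch_caddq(int32_t a)
{
  /* mask is all-ones when the sign bit of a is set, and zero otherwise. */
  const int32_t mask = -(int32_t)((uint32_t)a >> 31);
  return a + (mask & SKETCH_MLDSA_Q);
}

/* Apply the conditional addition to every coefficient of a polynomial. */
void sketch_poly_caddq(int32_t coeffs[SKETCH_MLDSA_N])
{
  unsigned int i;
  for (i = 0; i < SKETCH_MLDSA_N; i++) {
    coeffs[i] = sketch_caddq(coeffs[i]);
  }
}
```

The branch-free mask keeps the per-coefficient work independent of the data, which is also the kind of loop that maps naturally onto Neon lanes. The authoritative statement of what the assembly computes is the corresponding HOL-Light specification upstream.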

Call-outs

AArch64 NTT/INTT are not replaced (no HOL-Light proofs upstream); these paths fall back to C.
AArch64 rej_uniform* and polyz_unpack* are not replaced (no proofs); these paths fall back to C.
Compile-time dispatch via OPENSSL_AARCH64; no runtime CPU feature check is needed (Neon is mandatory on AArch64).
Stacked on #3195 (ML-DSA: import and enable x86_64 assembly backend from mldsa-native); merge order: #3195 first, then this PR.
Applies a fresh import of the x86_64 backend at the same upstream SHA, so the final two commits include x86_64 refresh deltas in addition to the new AArch64 tree.
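The compile-time dispatch mentioned above can be pictured roughly as follows. This is a hedged sketch, not the contents of mldsa_native_backend.h: in aws-lc the OPENSSL_AARCH64 and OPENSSL_X86_64 macros come from the library's own target detection, so the compiler-macro mapping at the top is a stand-in that lets the sketch compile on its own, and `SKETCH_MLDSA_BACKEND` and `sketch_mldsa_backend` are hypothetical names. The explicit portable-C branch is likewise an assumption made for illustration.

```c
/* Stand-in for aws-lc's target detection so the sketch is self-contained. */
#if defined(__aarch64__) && !defined(OPENSSL_AARCH64)
#define OPENSSL_AARCH64
#elif defined(__x86_64__) && !defined(OPENSSL_X86_64)
#define OPENSSL_X86_64
#endif

/* Select a backend purely at compile time; no runtime CPU feature probe
 * appears on the AArch64 path because Neon is mandatory there. */
#if defined(OPENSSL_AARCH64)
#define SKETCH_MLDSA_BACKEND "aarch64" /* real header would pull in aarch64/meta.h */
#elif defined(OPENSSL_X86_64)
#define SKETCH_MLDSA_BACKEND "x86_64"  /* real header would pull in x86_64/meta.h */
#else
#define SKETCH_MLDSA_BACKEND "c"       /* assumed fallback: C reference code only */
#endif

/* Report which branch the preprocessor selected. */
const char *sketch_mldsa_backend(void)
{
  return SKETCH_MLDSA_BACKEND;
}
```

Because the choice is made by the preprocessor, an AArch64 build never compiles the other branches, and any operation without an imported assembly routine simply keeps its C definition.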

Testing

All 76 ML-DSA and PQDSA tests pass on r8g.4xlarge (KAT, Wycheproof, expanded key validation, context-string round-trips).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

Add the build infrastructure and importer script changes needed to enable the x86_64 native arithmetic backend from mldsa-native:
- CMakeLists.txt: add ML-DSA x86_64 assembly sources to BCM build
- mldsa_native_config.h: enable native backend with MLD_CONFIG_USE_NATIVE_BACKEND_ARITH
- mldsa_native_backend.h: platform dispatcher for x86_64
- importer.sh: extend to import x86_64 backend, process assembly with s2n-bignum macros, strip C-intrinsic operations, and rename files with mldsa_ prefix to avoid basename collisions with ML-KEM

Clean output of running the importer script:

    GITHUB_SHA=b61e84f0c73d4ed612ffcaea4282a9d682de3f46 ./importer.sh --force

This imports formally verified AVX2 assembly for:
- NTT (forward and inverse)
- NTT unpack (custom coefficient order)
- Pointwise Montgomery multiplication
- Polyvec pointwise accumulate for L=4/5/7

Add the build infrastructure and importer-script changes needed to enable the AArch64 native arithmetic backend from mldsa-native:
- importer.sh: copy the AArch64 `native/aarch64/` tree; keep only the assembly files that have completed HOL-Light functional correctness proofs (poly_caddq, poly_chknorm, poly_decompose_{32,88}, poly_use_hint_{32,88}, pointwise_montgomery and polyvecl_pointwise_acc_montgomery_l{4,5,7}). Exclude NTT/INTT, rej_uniform* and polyz_unpack* on AArch64 because they are not yet formally verified. Strip their declarations and inline wrappers from meta.h / arith_native_aarch64.h. Refactor the x86_64 assembly post-processing into a shared fixup_asm_backend() helper that also handles the AArch64 header (_internal_s2n_bignum_arm.h) and the MLD_ASM_FN_SIZE directive used on that side. Also exclude poly_caddq_avx2.S on x86_64, which upstream recently converted from a C intrinsic into pure assembly but without a HOL-Light proof.
- mldsa_native_backend.h: dispatch to aarch64/meta.h when OPENSSL_AARCH64 is defined, falling through to x86_64 otherwise.
- CMakeLists.txt: glob mldsa/native/aarch64/src/*.S into BCM_ASM_SOURCES for aarch64 Unix builds (mirrors the x86_64 block).

No generated sources change in this commit; running `./importer.sh --force` against mldsa-native produces the ML-DSA tree imported in the follow-up commit.

Clean output of running the updated importer script:

    GITHUB_SHA=45ba4b3e87aba0e6681f256a3e5f90e01b0e3af1 ./importer.sh --force

This imports the formally verified AArch64 assembly for:
- poly_caddq
- poly_chknorm
- poly_decompose (l=5,7 and l=4)
- poly_use_hint (l=5,7 and l=4)
- pointwise multiplication (Montgomery)
- polyvecl_pointwise_acc (Montgomery) for L=4, 5, 7

Also refreshes the x86_64 backend from the same upstream revision.

AArch64 NTT/INTT, rej_uniform* and polyz_unpack* are intentionally not imported because they do not yet have HOL-Light proofs; the C reference implementation is used on those paths.

github-actions

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 10 out of 17. Check the log or trigger a new build to see more.

github-actions · 2026-05-05T21:17:21Z

+void mld_pack_sig_h_poly(uint8_t sig[MLDSA_CRYPTO_BYTES], const mld_poly *h,
+                         unsigned int k, unsigned int n)
+{
+  unsigned int j;


warning: variable 'j' is not initialized [cppcoreguidelines-init-variables]

Suggested change

unsigned int j;

unsigned int j = 0;

github-actions · 2026-05-05T21:17:21Z

-   */
-  mld_memset(sig, 0, MLDSA_POLYVECH_PACKEDBYTES);
+   * coming from each of the K polynomials in h. */
+  uint8_t *sig_h = sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES;


warning: variable 'sig_h' is not initialized [cppcoreguidelines-init-variables]

Suggested change

uint8_t *sig_h = sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES;

uint8_t *sig_h = NULL = sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES;

github-actions · 2026-05-05T21:17:21Z

 {
-  unsigned int i, j;
-  unsigned int old_hint_count;
+  const uint8_t *packed_hints =


warning: variable 'packed_hints' is not initialized [cppcoreguidelines-init-variables]

Suggested change

const uint8_t *packed_hints =

const uint8_t *packed_hints = NULL =

github-actions · 2026-05-05T21:17:21Z

-  unsigned int old_hint_count;
+  const uint8_t *packed_hints =
+      sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES;
+  const unsigned int old_hint_count =


warning: variable 'old_hint_count' is not initialized [cppcoreguidelines-init-variables]

Suggested change

const unsigned int old_hint_count =

const unsigned int old_hint_count = 0 =

github-actions · 2026-05-05T21:17:22Z

+      sig + MLDSA_CTILDEBYTES + MLDSA_L * MLDSA_POLYZ_PACKEDBYTES;
+  const unsigned int old_hint_count =
+      (i == 0) ? 0 : packed_hints[MLDSA_OMEGA + i - 1];
+  const unsigned int new_hint_count = packed_hints[MLDSA_OMEGA + i];


warning: variable 'new_hint_count' is not initialized [cppcoreguidelines-init-variables]

Suggested change

const unsigned int new_hint_count = packed_hints[MLDSA_OMEGA + i];

const unsigned int new_hint_count = 0 = packed_hints[MLDSA_OMEGA + i];

github-actions · 2026-05-05T21:17:22Z

+  const unsigned int old_hint_count =
+      (i == 0) ? 0 : packed_hints[MLDSA_OMEGA + i - 1];
+  const unsigned int new_hint_count = packed_hints[MLDSA_OMEGA + i];
+  unsigned int j;


warning: variable 'j' is not initialized [cppcoreguidelines-init-variables]

Suggested change

unsigned int j;

unsigned int j = 0;

github-actions · 2026-05-05T21:17:22Z

+void mld_polyvec_matrix_expand_eager(mld_polymat_eager *mat,
+                                     const uint8_t rho[MLDSA_SEEDBYTES])
+{
+  unsigned int i, j;


warning: variable 'i' is not initialized [cppcoreguidelines-init-variables]

Suggested change

unsigned int i, j;

unsigned int i = 0, j;

github-actions · 2026-05-05T21:17:22Z

+void mld_polyvec_matrix_expand_eager(mld_polymat_eager *mat,
+                                     const uint8_t rho[MLDSA_SEEDBYTES])
+{
+  unsigned int i, j;


warning: variable 'j' is not initialized [cppcoreguidelines-init-variables]

Suggested change

unsigned int i, j;

unsigned int i, j = 0;

github-actions · 2026-05-05T21:17:22Z

+    decreases(MLDSA_K * MLDSA_L - i)
+  )
+  {
+    uint8_t x = (uint8_t)(i / MLDSA_L);


warning: variable 'x' is not initialized [cppcoreguidelines-init-variables]

Suggested change

uint8_t x = (uint8_t)(i / MLDSA_L);

uint8_t x = 0 = (uint8_t)(i / MLDSA_L);

github-actions · 2026-05-05T21:17:22Z

+  )
+  {
+    uint8_t x = (uint8_t)(i / MLDSA_L);
+    uint8_t y = (uint8_t)(i % MLDSA_L);


warning: variable 'y' is not initialized [cppcoreguidelines-init-variables]

Suggested change

uint8_t y = (uint8_t)(i % MLDSA_L);

uint8_t y = 0 = (uint8_t)(i % MLDSA_L);

codecov-commenter · 2026-05-05T21:53:27Z

Codecov Report

❌ Patch coverage is 92.94118% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.12%. Comparing base (c0fe8a9) to head (33b45b8).
⚠️ Report is 13 commits behind head on main.

Files with missing lines	Patch %	Lines
crypto/fipsmodule/ml_dsa/mldsa/sign.c	89.18%	12 Missing ⚠️
...rypto/fipsmodule/ml_dsa/mldsa/native/x86_64/meta.h	81.81%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3219      +/-   ##
==========================================
+ Coverage   78.06%   78.12%   +0.06%     
==========================================
  Files         689      692       +3     
  Lines      122732   123031     +299     
  Branches    17083    17114      +31     
==========================================
+ Hits        95816    96124     +308     
+ Misses      26014    26003      -11     
- Partials      902      904       +2


jakemas added 2 commits May 1, 2026 07:04

jakemas requested a review from a team as a code owner May 5, 2026 21:02

jakemas marked this pull request as draft May 5, 2026 21:03

jakemas had a problem deploying to auto-approve May 5, 2026 21:03 — with GitHub Actions Error

jakemas had a problem deploying to auto-approve May 5, 2026 21:03 — with GitHub Actions Failure

jakemas added 2 commits May 5, 2026 21:11

jakemas force-pushed the mldsa-native-aarch64-backend branch from bc1c592 to 33b45b8 Compare May 5, 2026 21:12

github-actions Bot reviewed May 5, 2026


jakemas temporarily deployed to auto-approve May 5, 2026 21:17 — with GitHub Actions Inactive

jakemas mentioned this pull request May 5, 2026

Add ML-DSA aarch64 native assembly to CC builder scripts aws/aws-lc-rs#1113 (Open)

jakemas wants to merge 4 commits into aws:main from jakemas:mldsa-native-aarch64-backend

2 participants
