ML-DSA: import and enable x86_64 assembly backend from mldsa-native by jakemas · Pull Request #3195 · aws/aws-lc

jakemas · 2026-04-27T21:43:51Z

Summary

Imports the x86_64 native arithmetic backend from mldsa-native into ML-DSA, providing AVX2-accelerated assembly for core polynomial operations
Only pure assembly (.S) files with completed HOL-Light formal verification proofs are imported; C intrinsics and unverified assembly are excluded
Follows the same integration pattern as ML-KEM's x86_64 backend (PR #2631, "ML-KEM: import and enable x86_64 backend from mlkem-native" / commit 3b1e95e), using s2n-bignum macros for symbol visibility

Benchmark

Measured on Intel Xeon Platinum 8175M @ 2.50GHz (EC2 c5.2xlarge), average of 4 runs (ops/sec, higher is better). "Before" is bssl speed -filter MLDSA built from main; "After" is the same with this PR applied:

┌──────────────────────┬──────────┬──────────┬─────────────┐
│ Operation            │  Before  │  After   │ Improvement │
├──────────────────────┼──────────┼──────────┼─────────────┤
│ MLDSA44 keygen       │  14,472  │  18,151  │   +25.4%    │
│ MLDSA44 signing      │   2,343  │   4,794  │  +104.6%    │
│ MLDSA44 verify       │  11,537  │  19,064  │   +65.2%    │
│ MLDSA65 keygen       │   6,607  │   8,599  │   +30.2%    │
│ MLDSA65 signing      │   1,489  │   3,265  │  +119.3%    │
│ MLDSA65 verify       │   8,302  │  10,894  │   +31.2%    │
│ MLDSA87 keygen       │   5,599  │   6,932  │   +23.8%    │
│ MLDSA87 signing      │   1,310  │   2,862  │  +118.5%    │
│ MLDSA87 verify       │   5,705  │   7,723  │   +35.4%    │
└──────────────────────┴──────────┴──────────┴─────────────┘
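The Improvement column reads as the relative change in throughput, (after - before) / before. A minimal sketch for recomputing it from the ops/sec figures follows; `improvement` is a hypothetical helper written for this illustration and is not part of `bssl speed` or this PR.

```bash
#!/usr/bin/env bash
# Relative throughput change in percent, rounded to one decimal place.
# improvement is a hypothetical helper, not part of bssl or the PR.
improvement() {
  local before=$1 after=$2
  awk -v b="$before" -v a="$after" \
    'BEGIN { printf "%+.1f%%\n", (a - b) / b * 100 }'
}

improvement 2343 4794   # MLDSA44 signing row, prints +104.6%
improvement 1310 2862   # MLDSA87 signing row, prints +118.5%
```

Because the table reports averages of 4 runs, recomputing from the rounded averages may differ from the listed percentage in the last digit.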

Changes

Commit 1 — Manual/preparatory changes:

CMakeLists.txt — adds ML-DSA x86_64 assembly sources to BCM build via file(GLOB ... CONFIGURE_DEPENDS "*.S")
mldsa_native_config.h — enables native backend with MLD_CONFIG_USE_NATIVE_BACKEND_ARITH
mldsa_native_backend.h — platform dispatcher (x86_64 + AVX2 only for now)
mldsa_x86_64_meta.h — hand-maintained backend header outside the imported mldsa/ tree, declaring only the assembly-backed operations we use
importer.sh — imports x86_64 backend and processes assembly with s2n-bignum macros

Commit 2 — Clean import output:

Solely the output of GITHUB_SHA=1b47ba602b3220fb06380840fd516dde4243122e ./importer.sh --force
Contains no manual changes — reproducible by running the above command

Functions accelerated

All imported functions have completed HOL-Light formal verification proofs:

NTT (forward) — mldsa_ntt.ml
INTT (inverse) — mldsa_intt.ml
NTT unpack (custom coefficient order) — mldsa_nttunpack.ml
Pointwise Montgomery multiplication — mldsa_pointwise.ml
Polyvec pointwise accumulate for L=4/5/7 — mldsa_pointwise_acc_l{4,5,7}.ml
Conditional add Q (caddq) — mldsa_caddq.ml

Importer simplifications (addressing review feedback)

The importer is now significantly simpler compared to the previous revision:

Removed MLD_INTERNAL_API sed — upstream now uses MLD_INTERNAL_DATA_DECLARATION/DEFINITION
Removed ELF section move — upstream already places it correctly
Removed file renaming — upstream now uses unique _avx2_asm.S names (no collision with mlkem)
Glob copy for all .S files — no need to enumerate each assembly file individually
Single sed pattern to strip C-intrinsic includes from BCM (keeps only consts.c)
Simple substitution for MLD_ASM_FN_SIZE → S2N_BN_SIZE_DIRECTIVE (upstream now defines it)
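As an illustration of the "single sed pattern" item, here is a minimal sketch of one way to drop every x86_64 backend `.c` include except `consts.c`. It assumes the include lines have the form `#include "native/x86_64/src/<name>.c"`, as in the review diff; the function name `strip_intrinsic_includes` and the two intrinsic file names in the demo are made up, and the pattern actually used in importer.sh may differ.

```bash
#!/usr/bin/env bash
# Delete every x86_64 backend .c include from a BCM-style file except consts.c.
# Reads the file named by $1 and writes the filtered text to stdout.
# (Hypothetical helper; importer.sh edits the file in place instead.)
strip_intrinsic_includes() {
  sed -e '/^#include "native\/x86_64\/src\/[^"]*\.c"/{' \
      -e '/\/consts\.c"/!d' \
      -e '}' "$1"
}

# Demo input; rej_uniform_avx2.c and poly_decompose_avx2.c are made-up names.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#include "sign.c"
#include "native/x86_64/src/consts.c"
#include "native/x86_64/src/rej_uniform_avx2.c"
#include "native/x86_64/src/poly_decompose_avx2.c"
EOF

stripped=$(strip_intrinsic_includes "$demo")
printf '%s\n' "$stripped"
rm -f "$demo"
```

The outer address selects only backend `.c` includes, and the inner negated match deletes those that are not `consts.c`, so unrelated includes such as `sign.c` pass through untouched.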

Call-outs

Supersedes #2986 ("Import ML-DSA x86_64 NTT/INTT assembly optimizations"), which only imported NTT/INTT; this PR imports all formally verified assembly-backed operations
C intrinsics (rej_uniform, decompose, use_hint, chknorm, polyz_unpack) are intentionally excluded
Runtime dispatch via CRYPTO_is_AVX2_capable() — falls back to C on non-AVX2 systems
AArch64 native backend is not included in this PR

Testing

All ML-DSA tests pass (KAT, Wycheproof, expanded key validation, PQDSA parameter tests)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

jakemas · 2026-04-27T22:05:00Z

Note on aws-lc-rs CI failure: The aws-lc-rs-linux CI failure is expected. The aws-lc-rs CC builder scripts (aws-lc-sys/scripts/cc_builder/linux_x86_64.sh, apple_x86_64.sh) currently only discover ML-KEM native assembly via find crypto/fipsmodule/ml_kem/mlkem/native/x86_64/src -name "*.S". A follow-up PR to aws-lc-rs will be needed to add the equivalent line for ML-DSA:

mapfile -O ${#SOURCE_FILES[@]} -t SOURCE_FILES < <(find crypto/fipsmodule/ml_dsa/mldsa/native/x86_64/src -name "*.S" -type f | sort -f)

This is the same pattern as when ML-KEM's x86_64 backend was first added.
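The `mapfile -O ${#SOURCE_FILES[@]}` form appends to an existing array rather than replacing it. A small sketch of that behaviour on throwaway files follows; the directory contents and the `existing.c` entry are made up for the demo, and bash 4 or later is assumed for `mapfile`.

```bash
#!/usr/bin/env bash
# Requires bash 4+ for mapfile.
SOURCE_FILES=("existing.c")   # stands in for files collected earlier

src=$(mktemp -d)
touch "$src/b.S" "$src/a.S" "$src/notes.txt"

# -O starts writing at the current length, so new entries are appended
# instead of overwriting the array; -t strips each line's trailing newline.
mapfile -O "${#SOURCE_FILES[@]}" -t SOURCE_FILES \
  < <(find "$src" -name "*.S" -type f | sort -f)

count=${#SOURCE_FILES[@]}
first=${SOURCE_FILES[0]}
last=$(basename "${SOURCE_FILES[2]}")
rm -rf "$src"
echo "$count entries, first is $first, last is $last"
```

Only the two `.S` files are picked up, sorted case-insensitively, and the pre-existing entry at index 0 is preserved.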

codecov-commenter · 2026-04-27T22:33:09Z

Codecov Report

❌ Patch coverage is 92.77978% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.16%. Comparing base (6f246af) to head (7931498).

Files with missing lines	Patch %	Lines
crypto/fipsmodule/ml_dsa/mldsa/sign.c	89.25%	13 Missing ⚠️
crypto/fipsmodule/ml_dsa/mldsa_x86_64_meta.h	81.57%	7 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3195   +/-   ##
=======================================
  Coverage   78.15%   78.16%           
=======================================
  Files         689      692    +3     
  Lines      123678   123737   +59     
  Branches    17192    17196    +4     
=======================================
+ Hits        96663    96717   +54     
- Misses      26097    26100    +3     
- Partials      918      920    +2


WillChilds-Klein

Is the second commit solely the result of running the import script? i.e. does it contain any manual changes?

jakemas · 2026-04-29T17:12:34Z

Good question. The previous structure had manual edits mixed into the import commit. I've restructured into two clean commits:

c085723e — Manual/preparatory changes: CMakeLists.txt, mldsa_native_config.h, mldsa_native_backend.h, and importer.sh (extended to import x86_64 backend, process assembly with s2n-bignum macros, strip C-intrinsic operations, and rename files with mldsa_ prefix).
3f03ce6f — Clean import output: Solely the result of running GITHUB_SHA=b61e84f0c73d4ed612ffcaea4282a9d682de3f46 ./importer.sh --force. Contains no manual changes — anyone can reproduce the same output by running the script.

The importer now handles stripping C-intrinsic content from meta.h and arith_native_x86_64.h, adding MLD_INTERNAL_API to consts.c/consts.h, removing AArch64 references from the BCM file, and appending S2N_BN_SIZE_DIRECTIVE to assembly files (upstream mldsa-native doesn't define MLD_ASM_FN_SIZE unlike mlkem-native's MLK_ASM_FN_SIZE).

hanno-becker · 2026-05-14T03:48:10Z

+if [[ "$(uname)" == "Darwin" ]]; then
+  SED_I=(-i "")
+else
+  SED_I=(-i)
+fi
+


Unnecessary code movement

Fixed — moved SED_I back to its original position (after the unifdef/BCM copy).
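For context on why `SED_I` exists: BSD sed on macOS expects a separate (possibly empty) backup-suffix argument after `-i`, while GNU sed does not. A minimal sketch of how the array is then used at a call site, run here on a throwaway file; `ntt_avx2` is a made-up function name used only for the demo.

```bash
#!/usr/bin/env bash
# BSD sed (macOS) needs a separate, possibly empty, suffix argument after -i;
# GNU sed does not. Holding the flags in an array keeps call sites portable.
if [[ "$(uname)" == "Darwin" ]]; then
  SED_I=(-i "")
else
  SED_I=(-i)
fi

# Demo on a throwaway file: rewrite one macro name in place.
f=$(mktemp)
echo 'MLD_ASM_FN_SIZE(ntt_avx2)' > "$f"
sed "${SED_I[@]}" 's/MLD_ASM_FN_SIZE/S2N_BN_SIZE_DIRECTIVE/' "$f"
result=$(cat "$f")
rm -f "$f"
echo "$result"
```

Expanding the array with `"${SED_I[@]}"` preserves the empty string as its own argument on macOS, which a plain string variable would lose.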

Add CMake support to compile mldsa-native x86_64 assembly files, a custom mldsa_x86_64_meta.h declaring only the assembly-backed native operations (NTT, INTT, nttunpack, pointwise, polyvecl_pointwise_acc), and the importer script to pull them from upstream.

Output of: GITHUB_SHA=1b47ba602b3220fb06380840fd516dde4243122e ./importer.sh --force
No manual changes; reproducible by running the above command.

jakemas · 2026-05-15T21:09:36Z

@hanno-becker anything outstanding here?

hanno-becker · 2026-05-21T03:47:57Z

+mkdir -p $SRC/native/x86_64/src
+# Backend API and specification assumed by mldsa-native frontend
+cp $TMP/mldsa/src/native/api.h $SRC/native
+# Backend header -- unused C-intrinsic declarations are harmless and left intact


I've seen old compilers (specifically GCC 4.8) complain about unresolved external function references even in static inline functions that were never used. I believe this could still happen here if we don't remove the inline definitions for the intrinsics backend. But your CI does not seem to flag it, so up to you if this risk is tolerable.

hanno-becker · 2026-05-21T06:16:44Z

+echo "Fixup x86_64 assembly backend to use s2n-bignum macros"
+for file in $SRC/native/x86_64/src/*.S; do
+  echo "Processing $file"
+  tmp_file=$(mktemp)
+
+  backend_define="MLD_ARITH_BACKEND_X86_64_DEFAULT"
+
+  # Flatten multiline preprocessor directives, then process with unifdef
+  sed -e ':a' -e 'N' -e '$!ba' -e 's/\\\n/ /g' "$file" | \
+    unifdef -D$backend_define -UMLD_CONFIG_MULTILEVEL_NO_SHARED -DMLD_CONFIG_MULTILEVEL_WITH_SHARED > "$tmp_file"
+  mv "$tmp_file" "$file"
+
+  # Replace common.h include and assembly macros
+  s2n_header="_internal_s2n_bignum_x86_att.h"
+  sed "${SED_I[@]}" "s/#include \"\.\.\/\.\.\/\.\.\/common\.h\"/#include \"$s2n_header\"/" "$file"
+
+  func_name=$(grep -o '\.global MLD_ASM_NAMESPACE(\([^)]*\))' "$file" | sed 's/\.global MLD_ASM_NAMESPACE(\([^)]*\))/\1/')
+  if [ -n "$func_name" ]; then
+    sed "${SED_I[@]}" "s/\.global MLD_ASM_NAMESPACE($func_name)/        S2N_BN_SYM_VISIBILITY_DIRECTIVE(mldsa_$func_name)\n        S2N_BN_SYM_PRIVACY_DIRECTIVE(mldsa_$func_name)/" "$file"
+    sed "${SED_I[@]}" "s/MLD_ASM_FN_SYMBOL($func_name)/S2N_BN_SYMBOL(mldsa_$func_name):/" "$file"
+    sed "${SED_I[@]}" "s/MLD_ASM_FN_SIZE($func_name)/S2N_BN_SIZE_DIRECTIVE(mldsa_$func_name)/" "$file"
+  fi
+done


We should avoid this complexity if we can. I'll do some experiments on the x-native side to see if we can get rid of it.

It's more difficult than I thought. Let's stick with this for now.
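The first sed in the loop above is the least obvious step: it reads the whole file into the pattern space and replaces each backslash-newline pair with a space, so multi-line preprocessor directives become single lines before `unifdef` runs. A small sketch of that idiom in isolation; `flatten_continuations` is a hypothetical wrapper name and the input is invented.

```bash
#!/usr/bin/env bash
# Join backslash-continued lines into one line.
#   :a / N / $!ba  accumulate the whole input in the pattern space
#   s/\\\n/ /g     replace each backslash+newline with a single space
flatten_continuations() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\\\n/ /g'
}

# Demo input: one directive split over two lines, then an ordinary line.
flat=$(flatten_continuations <<'EOF'
#if defined(A) && \
    defined(B)
ret
EOF
)
printf '%s\n' "$flat"
```

The directive ends up on one line (with the original indentation collapsed into a run of spaces) and the unrelated `ret` line is left as it was.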

hanno-becker · 2026-05-21T14:30:51Z

+# Only consts.c (shared with the assembly backend) needs to be compiled.
+echo "Strip C-intrinsic includes from mldsa_native_bcm.c"
+BCM=$SRC/mldsa_native_bcm.c
+sed "${SED_I[@]}" '/^#include "native\/x86_64\/src\/poly_caddq_avx2\.c"/d'      "$BCM"


This file does not exist anymore upstream I think.

Correct, it's now poly_caddq_avx2_asm.S (assembly with a verified proof). Removed this sed line and imported the assembly file directly in 3f72ba0.

hanno-becker

I have reviewed the importer and it looks mostly good, except for one sed applying to a file that doesn't exist.

I'm not sure though if this is the right time to import: We're excluding a single *.S file from the backend (caddq) for which we don't yet have the proof upstream, and we're importing a non-tagged commit.

I'd prefer for the PR to wait until caddq is proved upstream as well, and we have a tagged release.

jakemas · 2026-05-21T16:38:43Z

> I have reviewed the importer and it looks mostly good, except for one sed applying to a file that doesn't exist.
>
> I'm not sure though if this is the right time to import: We're excluding a single *.S file from the backend (caddq) for which we don't yet have the proof upstream, and we're importing a non-tagged commit.
>
> I'd prefer for the PR to wait until caddq is proved upstream as well, and we have a tagged release.

I agree. Unfortunately, we are actively in the process of converting the AVX2 backend from C to .S, and another function is nearing completion in a PR now. I opened pq-code-package/mldsa-native#1068 to try to make the import logic even cleaner. If we can merge that upstream, tag a version of mldsa-native, and not merge any more AVX2 conversions while we import, then yes, I am good with waiting to include caddq.

Upstream mldsa-native now provides poly_caddq as verified assembly (poly_caddq_avx2_asm.S) rather than C intrinsics. Import it alongside the other proven assembly operations, add the MLD_USE_NATIVE_POLY_CADDQ declaration to our custom meta header, and drop the sed that stripped the old C-intrinsic include from the BCM.

Replace individual cp lines for each .S file with a single glob, and collapse the per-file sed deletions into one pattern that removes all x86_64 C-intrinsic .c includes except consts.c.

jakemas · 2026-05-23T02:22:43Z

+cp $TMP/mldsa/src/native/x86_64/src/arith_native_x86_64.h $SRC/native/x86_64/src
+cp $TMP/mldsa/src/native/x86_64/src/consts.h $SRC/native/x86_64/src
+cp $TMP/mldsa/src/native/x86_64/src/consts.c $SRC/native/x86_64/src
+# NOTE: all imported .S files must have verified proofs in s2n-bignum.


If new .S functions are converted upstream without a HOL-Light proof landing at the same time, this import script will pick them up. That could be awkward if we run an import before the proofs for new functions land. We could add a manual check, but it's clunky, and anything we add would be temporary: over time we will import all .S functions, since all of them will have proofs.

Upstream I'm writing the proof at the same time as the conversion, so this won't be an issue for upcoming uniform_rej for example.
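If a temporary guard were wanted, one possible shape for the "manual check" is an allowlist comparison. This is only a sketch under stated assumptions: `check_proven`, the `proven.txt` allowlist file and its one-basename-per-line format are all hypothetical, and only `poly_caddq_avx2_asm.S` is a file name taken from this thread.

```bash
#!/usr/bin/env bash
# check_proven DIR ALLOWLIST  (hypothetical helper, not part of importer.sh)
# Prints every *.S file in DIR whose basename is not listed in ALLOWLIST
# (one basename per line) and returns non-zero if any such file exists.
check_proven() {
  local dir=$1 allowlist=$2 status=0 f
  for f in "$dir"/*.S; do
    [ -e "$f" ] || continue   # glob matched nothing
    if ! grep -qxF "$(basename "$f")" "$allowlist"; then
      echo "no proof recorded for: $(basename "$f")"
      status=1
    fi
  done
  return "$status"
}

# Demo with throwaway files; new_unproven_avx2_asm.S is a made-up name.
work=$(mktemp -d)
touch "$work/poly_caddq_avx2_asm.S" "$work/new_unproven_avx2_asm.S"
printf '%s\n' poly_caddq_avx2_asm.S > "$work/proven.txt"

report=$(check_proven "$work" "$work/proven.txt") && rc=0 || rc=$?
echo "$report"
rm -rf "$work"
```

The downside noted in the comment still applies: the allowlist has to be maintained by hand until every upstream .S file has a proof, at which point the check can be deleted.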

jakemas requested a review from a team as a code owner April 27, 2026 21:43

jakemas force-pushed the mldsa-native-x86-backend branch from 78c4678 to fd05b29 Compare April 27, 2026 21:53

This was referenced Apr 28, 2026

Import ML-DSA x86_64 NTT/INTT assembly optimizations #2986

Closed

Add ML-DSA x86_64 native assembly to CC builder scripts aws/aws-lc-rs#1110

Merged

WillChilds-Klein reviewed Apr 28, 2026

geedo0 previously approved these changes Apr 29, 2026

jakemas dismissed geedo0’s stale review via a3215fa April 29, 2026 17:12

jakemas force-pushed the mldsa-native-x86-backend branch from a34e245 to a3215fa Compare April 29, 2026 17:12

jakemas force-pushed the mldsa-native-x86-backend branch from a3215fa to 8ab7c5b Compare April 29, 2026 17:13

jakemas force-pushed the mldsa-native-x86-backend branch 2 times, most recently from e496204 to a47dd87 Compare May 14, 2026 03:29

hanno-becker reviewed May 14, 2026

jakemas force-pushed the mldsa-native-x86-backend branch from a47dd87 to f8ef9a4 Compare May 14, 2026 03:50

jakemas added 2 commits May 14, 2026 03:52

ML-DSA: import x86_64 assembly backend from mldsa-native

1c4f4d2

jakemas added 2 commits May 18, 2026 15:02

Merge branch 'main' into mldsa-native-x86-backend

8ff3089

Merge branch 'main' into mldsa-native-x86-backend

f304e50

WillChilds-Klein previously approved these changes May 19, 2026

Comment thread crypto/fipsmodule/ml_dsa/mldsa/.clang-format

jakemas added 2 commits May 19, 2026 20:20

Merge branch 'main' into mldsa-native-x86-backend

4596c20

Merge branch 'main' into mldsa-native-x86-backend

0ed5f6e

hanno-becker reviewed May 21, 2026

Ubuntu added 2 commits May 22, 2026 17:39

ML-DSA: simplify importer x86_64 backend copy and BCM stripping

0de2fe7

jakemas commented May 23, 2026

Merge branch 'main' into mldsa-native-x86-backend

7931498
