
ML-DSA: import and enable x86_64 assembly backend from mldsa-native #3195

Open
jakemas wants to merge 9 commits into aws:main from jakemas:mldsa-native-x86-backend

Conversation

Contributor

@jakemas jakemas commented Apr 27, 2026

Summary

  • Imports the x86_64 native arithmetic backend from mldsa-native into ML-DSA, providing AVX2-accelerated assembly for core polynomial operations
  • Only pure assembly (.S) files with completed HOL-Light formal verification proofs are imported; C intrinsics and unverified assembly are excluded
  • Follows the same integration pattern as ML-KEM's x86_64 backend (PR #2631, "ML-KEM: import and enable x86_64 backend from mlkem-native" / commit 3b1e95e), using s2n-bignum macros for symbol visibility

Benchmark

Measured on Intel Xeon Platinum 8175M @ 2.50GHz (EC2 c5.2xlarge), average of 4 runs (ops/sec, higher is better). "Before" is bssl speed -filter MLDSA built from main; "After" is the same with this PR applied:

┌──────────────────────┬──────────┬──────────┬─────────────┐
│ Operation            │  Before  │  After   │ Improvement │
├──────────────────────┼──────────┼──────────┼─────────────┤
│ MLDSA44 keygen       │  14,472  │  18,151  │   +25.4%    │
│ MLDSA44 signing      │   2,343  │   4,794  │  +104.6%    │
│ MLDSA44 verify       │  11,537  │  19,064  │   +65.2%    │
│ MLDSA65 keygen       │   6,607  │   8,599  │   +30.2%    │
│ MLDSA65 signing      │   1,489  │   3,265  │  +119.3%    │
│ MLDSA65 verify       │   8,302  │  10,894  │   +31.2%    │
│ MLDSA87 keygen       │   5,599  │   6,932  │   +23.8%    │
│ MLDSA87 signing      │   1,310  │   2,862  │  +118.5%    │
│ MLDSA87 verify       │   5,705  │   7,723  │   +35.4%    │
└──────────────────────┴──────────┴──────────┴─────────────┘
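
For readers checking the Improvement column, each entry is the relative gain, (after - before) / before. A minimal sketch of that arithmetic follows; the helper name `improvement` is hypothetical and is not part of this PR or of bssl.

```shell
# Hypothetical helper (not part of the PR): relative throughput gain in percent.
# Arguments: "before" and "after" throughput, both in ops/sec, without thousands separators.
improvement() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%+.1f%%\n", (after - before) / before * 100 }'
}

improvement 14472 18151   # MLDSA44 keygen row
improvement 2343 4794     # MLDSA44 signing row
```

For the two rows shown, this prints +25.4% and +104.6%, matching the table.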

Changes

Commit 1 — Manual/preparatory changes:

  • CMakeLists.txt — adds ML-DSA x86_64 assembly sources to BCM build via file(GLOB ... CONFIGURE_DEPENDS "*.S")
  • mldsa_native_config.h — enables native backend with MLD_CONFIG_USE_NATIVE_BACKEND_ARITH
  • mldsa_native_backend.h — platform dispatcher (x86_64 + AVX2 only for now)
  • mldsa_x86_64_meta.h — hand-maintained backend header outside the imported mldsa/ tree, declaring only the assembly-backed operations we use
  • importer.sh — imports x86_64 backend and processes assembly with s2n-bignum macros

Commit 2 — Clean import output:

  • Solely the output of GITHUB_SHA=1b47ba602b3220fb06380840fd516dde4243122e ./importer.sh --force
  • Contains no manual changes — reproducible by running the above command

Functions accelerated

All imported functions have completed HOL-Light formal verification proofs:

  • NTT (forward) — mldsa_ntt.ml
  • INTT (inverse) — mldsa_intt.ml
  • NTT unpack (custom coefficient order) — mldsa_nttunpack.ml
  • Pointwise Montgomery multiplication — mldsa_pointwise.ml
  • Polyvec pointwise accumulate for L=4/5/7 — mldsa_pointwise_acc_l{4,5,7}.ml
  • Conditional add Q (caddq) — mldsa_caddq.ml

Importer simplifications (addressing review feedback)

The importer is now significantly simpler compared to the previous revision:

  • Removed MLD_INTERNAL_API sed — upstream now uses MLD_INTERNAL_DATA_DECLARATION/DEFINITION
  • Removed ELF section move — upstream already places it correctly
  • Removed file renaming — upstream now uses unique _avx2_asm.S names (no collision with mlkem)
  • Glob copy for all .S files — no need to enumerate each assembly file individually
  • Single sed pattern to strip C-intrinsic includes from BCM (keeps only consts.c)
  • Simple substitution of MLD_ASM_FN_SIZE with S2N_BN_SIZE_DIRECTIVE (upstream now defines MLD_ASM_FN_SIZE)
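
To illustrate the "single sed pattern" item above, here is a rough sketch of how one sed invocation could drop every x86_64 C-intrinsic include while keeping consts.c. This is not the exact expression used in importer.sh. The function name and the sample file name rej_uniform_avx2.c are assumptions made for illustration; only the include path prefix is taken from this PR.

```shell
# Sketch under assumptions: reads a C file on stdin, writes the filtered file to stdout.
# The first expression branches past the delete for consts.c; the second deletes
# every other '#include "native/x86_64/src/<name>.c"' line.
strip_intrinsic_includes() {
  sed -e '/^#include "native\/x86_64\/src\/consts\.c"/b' \
      -e '/^#include "native\/x86_64\/src\/[^"]*\.c"/d'
}

# Example input; rej_uniform_avx2.c is a made-up file name for illustration.
printf '%s\n' \
  '#include "native/x86_64/src/consts.c"' \
  '#include "native/x86_64/src/rej_uniform_avx2.c"' \
  '#include "src/sign.c"' | strip_intrinsic_includes
```

Filtering stdin to stdout keeps the sketch independent of the `sed -i` differences between GNU and BSD sed; the importer itself edits the BCM file in place.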

Call-outs

  • Supersedes #2986 ("Import ML-DSA x86_64 NTT/INTT assembly optimizations"), which only imported NTT/INTT; this PR imports all formally verified assembly-backed operations
  • C intrinsics (rej_uniform, decompose, use_hint, chknorm, polyz_unpack) are intentionally excluded
  • Runtime dispatch via CRYPTO_is_AVX2_capable() — falls back to C on non-AVX2 systems
  • AArch64 native backend is not included in this PR

Testing

  • All ML-DSA tests pass (KAT, Wycheproof, expanded key validation, PQDSA parameter tests)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

Contributor Author

jakemas commented Apr 27, 2026

Note on aws-lc-rs CI failure: The aws-lc-rs-linux CI failure is expected. The aws-lc-rs CC builder scripts (aws-lc-sys/scripts/cc_builder/linux_x86_64.sh, apple_x86_64.sh) currently only discover ML-KEM native assembly via find crypto/fipsmodule/ml_kem/mlkem/native/x86_64/src -name "*.S". A follow-up PR to aws-lc-rs will be needed to add the equivalent line for ML-DSA:

mapfile -O ${#SOURCE_FILES[@]} -t SOURCE_FILES < <(find crypto/fipsmodule/ml_dsa/mldsa/native/x86_64/src -name "*.S" -type f | sort -f)

This is the same pattern as when ML-KEM's x86_64 backend was first added.

codecov-commenter commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 92.77978% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.16%. Comparing base (6f246af) to head (7931498).

Files with missing lines                        Patch %   Lines
crypto/fipsmodule/ml_dsa/mldsa/sign.c           89.25%    13 Missing ⚠️
crypto/fipsmodule/ml_dsa/mldsa_x86_64_meta.h    81.57%    7 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3195   +/-   ##
=======================================
  Coverage   78.15%   78.16%           
=======================================
  Files         689      692    +3     
  Lines      123678   123737   +59     
  Branches    17192    17196    +4     
=======================================
+ Hits        96663    96717   +54     
- Misses      26097    26100    +3     
- Partials      918      920    +2     

Contributor

@WillChilds-Klein WillChilds-Klein left a comment

Is the second commit solely the result of running the import script? i.e. does it contain any manual changes?

geedo0 previously approved these changes Apr 29, 2026
Contributor Author

jakemas commented Apr 29, 2026

Good question. The previous structure had manual edits mixed into the import commit. I've restructured into two clean commits:

  1. c085723e — Manual/preparatory changes: CMakeLists.txt, mldsa_native_config.h, mldsa_native_backend.h, and importer.sh (extended to import x86_64 backend, process assembly with s2n-bignum macros, strip C-intrinsic operations, and rename files with mldsa_ prefix).

  2. 3f03ce6f — Clean import output: Solely the result of running GITHUB_SHA=b61e84f0c73d4ed612ffcaea4282a9d682de3f46 ./importer.sh --force. Contains no manual changes — anyone can reproduce the same output by running the script.

The importer now handles stripping C-intrinsic content from meta.h and arith_native_x86_64.h, adding MLD_INTERNAL_API to consts.c/consts.h, removing AArch64 references from the BCM file, and appending S2N_BN_SIZE_DIRECTIVE to assembly files (upstream mldsa-native doesn't define MLD_ASM_FN_SIZE unlike mlkem-native's MLK_ASM_FN_SIZE).

@jakemas jakemas force-pushed the mldsa-native-x86-backend branch from a3215fa to 8ab7c5b Compare April 29, 2026 17:13

@jakemas jakemas force-pushed the mldsa-native-x86-backend branch 2 times, most recently from e496204 to a47dd87 Compare May 14, 2026 03:29

Comment thread crypto/fipsmodule/ml_dsa/importer.sh Outdated
Comment on lines +115 to +120
if [[ "$(uname)" == "Darwin" ]]; then
  SED_I=(-i "")
else
  SED_I=(-i)
fi

Contributor

Unnecessary code movement

Contributor Author

Fixed — moved SED_I back to its original position (after the unifdef/BCM copy).
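
For context on why SED_I exists at all: BSD sed (macOS) requires an explicit backup suffix argument after -i, even if empty, while GNU sed does not. A minimal sketch of the same idea as a function is shown below. `sed_inplace` is a hypothetical name, and the importer itself uses the SED_I array instead.

```shell
# Hypothetical wrapper, equivalent in spirit to the SED_I array:
# BSD sed (macOS) needs an explicit empty backup suffix after -i, GNU sed does not.
sed_inplace() {
  if [ "$(uname)" = "Darwin" ]; then
    sed -i "" "$@"
  else
    sed -i "$@"
  fi
}

# Usage: same arguments as sed, with the file last.
demo=$(mktemp)
printf 'MLD_ASM_FN_SIZE(foo)\n' > "$demo"
sed_inplace 's/MLD_ASM_FN_SIZE(\([^)]*\))/S2N_BN_SIZE_DIRECTIVE(mldsa_\1)/' "$demo"
cat "$demo"
rm -f "$demo"
```

Like the original snippet, this assumes that `sed` on Darwin is the system BSD sed, not a GNU sed installed earlier in PATH.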

@jakemas jakemas force-pushed the mldsa-native-x86-backend branch from a47dd87 to f8ef9a4 Compare May 14, 2026 03:50
jakemas added 2 commits May 14, 2026 03:52
Add CMake support to compile mldsa-native x86_64 assembly files,
a custom mldsa_x86_64_meta.h declaring only the assembly-backed
native operations (NTT, INTT, nttunpack, pointwise,
polyvecl_pointwise_acc), and the importer script to pull them
from upstream.

Output of: GITHUB_SHA=1b47ba602b3220fb06380840fd516dde4243122e ./importer.sh --force

No manual changes — reproducible by running the above command.
Contributor Author

jakemas commented May 15, 2026

@hanno-becker anything outstanding here?

Comment thread crypto/fipsmodule/ml_dsa/mldsa/.clang-format
Comment thread crypto/fipsmodule/ml_dsa/importer.sh Outdated
mkdir -p $SRC/native/x86_64/src
# Backend API and specification assumed by mldsa-native frontend
cp $TMP/mldsa/src/native/api.h $SRC/native
# Backend header -- unused C-intrinsic declarations are harmless and left intact
Contributor

I've seen old compilers (specifically GCC 4.8) complain about unresolved external function references even in static inline functions that were never used. I believe this could still happen here if we don't remove the inline definitions for the intrinsics backend. But your CI does not seem to flag it, so up to you if this risk is tolerable.

Comment on lines +169 to +191
echo "Fixup x86_64 assembly backend to use s2n-bignum macros"
for file in $SRC/native/x86_64/src/*.S; do
  echo "Processing $file"
  tmp_file=$(mktemp)

  backend_define="MLD_ARITH_BACKEND_X86_64_DEFAULT"

  # Flatten multiline preprocessor directives, then process with unifdef
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\\\n/ /g' "$file" | \
    unifdef -D$backend_define -UMLD_CONFIG_MULTILEVEL_NO_SHARED -DMLD_CONFIG_MULTILEVEL_WITH_SHARED > "$tmp_file"
  mv "$tmp_file" "$file"

  # Replace common.h include and assembly macros
  s2n_header="_internal_s2n_bignum_x86_att.h"
  sed "${SED_I[@]}" "s/#include \"\.\.\/\.\.\/\.\.\/common\.h\"/#include \"$s2n_header\"/" "$file"

  func_name=$(grep -o '\.global MLD_ASM_NAMESPACE(\([^)]*\))' "$file" | sed 's/\.global MLD_ASM_NAMESPACE(\([^)]*\))/\1/')
  if [ -n "$func_name" ]; then
    sed "${SED_I[@]}" "s/\.global MLD_ASM_NAMESPACE($func_name)/ S2N_BN_SYM_VISIBILITY_DIRECTIVE(mldsa_$func_name)\n S2N_BN_SYM_PRIVACY_DIRECTIVE(mldsa_$func_name)/" "$file"
    sed "${SED_I[@]}" "s/MLD_ASM_FN_SYMBOL($func_name)/S2N_BN_SYMBOL(mldsa_$func_name):/" "$file"
    sed "${SED_I[@]}" "s/MLD_ASM_FN_SIZE($func_name)/S2N_BN_SIZE_DIRECTIVE(mldsa_$func_name)/" "$file"
  fi
done
Contributor

We should avoid this complexity if we can. I'll do some experiments on the x-native side to see if we can get rid of it.

Contributor

It's more difficult than I thought. Let's stick with this for now.
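
For readers unfamiliar with the two sed idioms in the loop under discussion, the sketch below isolates them on toy input: joining backslash-continued lines before unifdef sees them, and extracting the function name from the `.global MLD_ASM_NAMESPACE(...)` line. The function names and the sample symbol `ntt_avx2` are illustrative assumptions; the sed and grep expressions are the ones from the snippet above.

```shell
# Join backslash-continued lines: slurp the whole input into the pattern space
# (label a, append Next line, loop until the last line), then replace each
# backslash-newline pair with a single space.
flatten_continuations() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\\\n/ /g'
}

# Print the name inside ".global MLD_ASM_NAMESPACE(<name>)".
extract_asm_name() {
  grep -o '\.global MLD_ASM_NAMESPACE(\([^)]*\))' |
    sed 's/\.global MLD_ASM_NAMESPACE(\([^)]*\))/\1/'
}

# Toy inputs (ntt_avx2 is a made-up symbol name for illustration).
printf '#if defined(A) && \\\n    defined(B)\n#endif\n' | flatten_continuations
printf '.global MLD_ASM_NAMESPACE(ntt_avx2)\n' | extract_asm_name
```

The first call prints the `#if` condition on one line, which is what allows unifdef to evaluate it; the second prints `ntt_avx2`, the value the loop stores in func_name.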

Comment thread crypto/fipsmodule/ml_dsa/importer.sh Outdated
# Only consts.c (shared with the assembly backend) needs to be compiled.
echo "Strip C-intrinsic includes from mldsa_native_bcm.c"
BCM=$SRC/mldsa_native_bcm.c
sed "${SED_I[@]}" '/^#include "native\/x86_64\/src\/poly_caddq_avx2\.c"/d' "$BCM"
Contributor

This file does not exist anymore upstream I think.

Contributor Author

@jakemas jakemas May 22, 2026

Correct, it's now poly_caddq_avx2_asm.S (assembly with a verified proof). Removed this sed line and imported the assembly file directly in 3f72ba0.

Contributor

@hanno-becker hanno-becker left a comment

I have reviewed the importer and it looks mostly good, except for one sed applying to a file that doesn't exist.

I'm not sure though if this is the right time to import: We're excluding a single *.S file from the backend (caddq) for which we don't yet have the proof upstream, and we're importing a non-tagged commit.

I'd prefer for the PR to wait until caddq is proved upstream as well, and we have a tagged release.

Contributor Author

jakemas commented May 21, 2026

I have reviewed the importer and it looks mostly good, except for one sed applying to a file that doesn't exist.

I'm not sure though if this is the right time to import: We're excluding a single *.S file from the backend (caddq) for which we don't yet have the proof upstream, and we're importing a non-tagged commit.

I'd prefer for the PR to wait until caddq is proved upstream as well, and we have a tagged release.

I agree. Unfortunately, we are actively in the process of converting the AVX2 backend from C to assembly (.S), and another function is nearing completion in a PR now. I opened pq-code-package/mldsa-native#1068 to try to make the import logic even cleaner. If we can merge that upstream, version mldsa-native, and not merge any more AVX2 conversions while we import, then yes, I am good with waiting to include caddq.

Ubuntu added 2 commits May 22, 2026 17:39
Upstream mldsa-native now provides poly_caddq as verified assembly
(poly_caddq_avx2_asm.S) rather than C intrinsics. Import it alongside
the other proven assembly operations, add the MLD_USE_NATIVE_POLY_CADDQ
declaration to our custom meta header, and drop the sed that stripped
the old C-intrinsic include from the BCM.

Replace individual cp lines for each .S file with a single glob, and
collapse the per-file sed deletions into one pattern that removes all
x86_64 C-intrinsic .c includes except consts.c.

cp $TMP/mldsa/src/native/x86_64/src/arith_native_x86_64.h $SRC/native/x86_64/src
cp $TMP/mldsa/src/native/x86_64/src/consts.h $SRC/native/x86_64/src
cp $TMP/mldsa/src/native/x86_64/src/consts.c $SRC/native/x86_64/src
# NOTE: all imported .S files must have verified proofs in s2n-bignum.
Contributor Author

@jakemas jakemas May 23, 2026

If new .S functions are converted upstream without a HOL-Light proof landing at the same time, this import script will pick them up. This could be awkward if we run an import before the proofs land for new functions. We could add a manual check, but it's clunky, and anything we add would be temporary: over time all .S functions will be imported, since all will have proofs.

Upstream I'm writing the proof at the same time as the conversion, so this won't be an issue for upcoming uniform_rej for example.
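
For reference, the kind of manual check mentioned above could look roughly like the sketch below: an allowlist of assembly files with recorded proofs, and a function that fails when the glob picks up anything else. Everything here is hypothetical (the function name, the allowlist file format, and the file name hypothetical_new_avx2_asm.S); no such check is part of this PR.

```shell
# Hypothetical guard (not in importer.sh): fail if any imported .S file is
# missing from an allowlist of files known to have proofs.
# $1: directory of imported .S files; $2: allowlist file, one basename per line.
check_proven_asm() {
  status=0
  for f in "$1"/*.S; do
    [ -e "$f" ] || continue              # glob matched nothing
    name=$(basename "$f")
    if ! grep -qxF -- "$name" "$2"; then # exact, whole-line match
      echo "no recorded proof for $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Toy demonstration in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/poly_caddq_avx2_asm.S" "$demo_dir/hypothetical_new_avx2_asm.S"
printf 'poly_caddq_avx2_asm.S\n' > "$demo_dir/proven.txt"
if check_proven_asm "$demo_dir" "$demo_dir/proven.txt"; then
  echo "all imported assembly has a recorded proof"
else
  echo "import should be blocked"
fi
rm -rf "$demo_dir"
```

The allowlist would have to be maintained by hand, which is the clunkiness noted above, and it becomes redundant once every upstream .S file has a proof.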
