ML-DSA: import and enable x86_64 assembly backend from mldsa-native#3195
ML-DSA: import and enable x86_64 assembly backend from mldsa-native#3195jakemas wants to merge 9 commits into
Conversation
78c4678 to
fd05b29
Compare
|
Note on mapfile -O ${#SOURCE_FILES[@]} -t SOURCE_FILES < <(find crypto/fipsmodule/ml_dsa/mldsa/native/x86_64/src -name "*.S" -type f | sort -f)This is the same pattern as when ML-KEM's x86_64 backend was first added. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3195 +/- ##
=======================================
Coverage 78.15% 78.16%
=======================================
Files 689 692 +3
Lines 123678 123737 +59
Branches 17192 17196 +4
=======================================
+ Hits 96663 96717 +54
- Misses 26097 26100 +3
- Partials 918 920 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
WillChilds-Klein
left a comment
There was a problem hiding this comment.
Is the second commit solely the result of running the import script? i.e. does it contain any manual changes?
a34e245 to
a3215fa
Compare
|
Good question. The previous structure had manual edits mixed into the import commit. I've restructured into two clean commits:
The importer now handles stripping C-intrinsic content from |
a3215fa to
8ab7c5b
Compare
e496204 to
a47dd87
Compare
| if [[ "$(uname)" == "Darwin" ]]; then | ||
| SED_I=(-i "") | ||
| else | ||
| SED_I=(-i) | ||
| fi | ||
|
|
There was a problem hiding this comment.
Unnecessary code movement
There was a problem hiding this comment.
Fixed — moved SED_I back to its original position (after the unifdef/BCM copy).
a47dd87 to
f8ef9a4
Compare
Add CMake support to compile mldsa-native x86_64 assembly files, a custom mldsa_x86_64_meta.h declaring only the assembly-backed native operations (NTT, INTT, nttunpack, pointwise, polyvecl_pointwise_acc), and the importer script to pull them from upstream.
Output of: GITHUB_SHA=1b47ba602b3220fb06380840fd516dde4243122e ./importer.sh --force No manual changes — reproducible by running the above command.
|
@hanno-becker anything outstanding here? |
| mkdir -p $SRC/native/x86_64/src | ||
| # Backend API and specification assumed by mldsa-native frontend | ||
| cp $TMP/mldsa/src/native/api.h $SRC/native | ||
| # Backend header -- unused C-intrinsic declarations are harmless and left intact |
There was a problem hiding this comment.
I've seen old compilers (specifically GCC 4.8) complain about unresolved external function references even in static inline functions that were never used. I believe this could still happen here if we don't remove the inline definitions for the intrinsics backend. But your CI does not seem to flag it, so up to you if this risk is tolerable.
| echo "Fixup x86_64 assembly backend to use s2n-bignum macros" | ||
| for file in $SRC/native/x86_64/src/*.S; do | ||
| echo "Processing $file" | ||
| tmp_file=$(mktemp) | ||
|
|
||
| backend_define="MLD_ARITH_BACKEND_X86_64_DEFAULT" | ||
|
|
||
| # Flatten multiline preprocessor directives, then process with unifdef | ||
| sed -e ':a' -e 'N' -e '$!ba' -e 's/\\\n/ /g' "$file" | \ | ||
| unifdef -D$backend_define -UMLD_CONFIG_MULTILEVEL_NO_SHARED -DMLD_CONFIG_MULTILEVEL_WITH_SHARED > "$tmp_file" | ||
| mv "$tmp_file" "$file" | ||
|
|
||
| # Replace common.h include and assembly macros | ||
| s2n_header="_internal_s2n_bignum_x86_att.h" | ||
| sed "${SED_I[@]}" "s/#include \"\.\.\/\.\.\/\.\.\/common\.h\"/#include \"$s2n_header\"/" "$file" | ||
|
|
||
| func_name=$(grep -o '\.global MLD_ASM_NAMESPACE(\([^)]*\))' "$file" | sed 's/\.global MLD_ASM_NAMESPACE(\([^)]*\))/\1/') | ||
| if [ -n "$func_name" ]; then | ||
| sed "${SED_I[@]}" "s/\.global MLD_ASM_NAMESPACE($func_name)/ S2N_BN_SYM_VISIBILITY_DIRECTIVE(mldsa_$func_name)\n S2N_BN_SYM_PRIVACY_DIRECTIVE(mldsa_$func_name)/" "$file" | ||
| sed "${SED_I[@]}" "s/MLD_ASM_FN_SYMBOL($func_name)/S2N_BN_SYMBOL(mldsa_$func_name):/" "$file" | ||
| sed "${SED_I[@]}" "s/MLD_ASM_FN_SIZE($func_name)/S2N_BN_SIZE_DIRECTIVE(mldsa_$func_name)/" "$file" | ||
| fi | ||
| done |
There was a problem hiding this comment.
We should avoid this complexity if we can. I'll do some experiments on the x-native side to see if we can get rid of it.
There was a problem hiding this comment.
It's more difficult than I thought. Let's stick with this for now.
| # Only consts.c (shared with the assembly backend) needs to be compiled. | ||
| echo "Strip C-intrinsic includes from mldsa_native_bcm.c" | ||
| BCM=$SRC/mldsa_native_bcm.c | ||
| sed "${SED_I[@]}" '/^#include "native\/x86_64\/src\/poly_caddq_avx2\.c"/d' "$BCM" |
There was a problem hiding this comment.
This file does not exist anymore upstream I think.
There was a problem hiding this comment.
Correct, it's now poly_caddq_avx2_asm.S (assembly with a verified proof). Removed this sed line and imported the assembly file directly in 3f72ba0.
hanno-becker
left a comment
There was a problem hiding this comment.
I have reviewed the importer and it looks mostly good, except for one sed applying to a file that doesn't exist.
I'm not sure though if this is the right time to import: We're excluding a single *.S file from the backend (caddq) for which we don't yet have the proof upstream, and we're importing a non-tagged commit.
I'd prefer for the PR to wait until caddq is proved upstream as well, and we have a tagged release.
I agree, unfortunately, we are actively in the process of converting the AVX2 from c to S, there is another function nearing completing in PR now. I opened pq-code-package/mldsa-native#1068 to try to make the import logic even cleaner, if we can merge that upstream, version mldsa-native, not merge any more AVX2 conversions while we import, then yes, I am good with waiting to include cadddq. |
Upstream mldsa-native now provides poly_caddq as verified assembly (poly_caddq_avx2_asm.S) rather than C intrinsics. Import it alongside the other proven assembly operations, add the MLD_USE_NATIVE_POLY_CADDQ declaration to our custom meta header, and drop the sed that stripped the old C-intrinsic include from the BCM.
Replace individual cp lines for each .S file with a single glob, and collapse the per-file sed deletions into one pattern that removes all x86_64 C-intrinsic .c includes except consts.c.
| cp $TMP/mldsa/src/native/x86_64/src/arith_native_x86_64.h $SRC/native/x86_64/src | ||
| cp $TMP/mldsa/src/native/x86_64/src/consts.h $SRC/native/x86_64/src | ||
| cp $TMP/mldsa/src/native/x86_64/src/consts.c $SRC/native/x86_64/src | ||
| # NOTE: all imported .S files must have verified proofs in s2n-bignum. |
There was a problem hiding this comment.
If new .S functions are converted upstream without a hol-light proof at the same time, this import script will pick them up. This could be awkward if we run imports before the proofs land for new functions. We could add a manual check, but its clunky. anything we add will be temporary, as over time we will add all .S functions, as all will have proofs.
Upstream I'm writing the proof at the same time as the conversion, so this won't be an issue for upcoming uniform_rej for example.
Summary
.S) files with completed HOL-Light formal verification proofs are imported; C intrinsics and unverified assembly are excludedBenchmark
Measured on Intel Xeon Platinum 8175M @ 2.50GHz (EC2 c5.2xlarge), average of 4 runs (ops/sec, higher is better). "Before" is
bssl speed -filter MLDSAbuilt frommain; "After" is the same with this PR applied:Changes
Commit 1 — Manual/preparatory changes:
CMakeLists.txt— adds ML-DSA x86_64 assembly sources to BCM build viafile(GLOB ... CONFIGURE_DEPENDS "*.S")mldsa_native_config.h— enables native backend withMLD_CONFIG_USE_NATIVE_BACKEND_ARITHmldsa_native_backend.h— platform dispatcher (x86_64 + AVX2 only for now)mldsa_x86_64_meta.h— hand-maintained backend header outside the importedmldsa/tree, declaring only the assembly-backed operations we useimporter.sh— imports x86_64 backend and processes assembly with s2n-bignum macrosCommit 2 — Clean import output:
GITHUB_SHA=1b47ba602b3220fb06380840fd516dde4243122e ./importer.sh --forceFunctions accelerated
All imported functions have completed HOL-Light formal verification proofs:
mldsa_ntt.mlmldsa_intt.mlmldsa_nttunpack.mlmldsa_pointwise.mlmldsa_pointwise_acc_l{4,5,7}.mlmldsa_caddq.mlImporter simplifications (addressing review feedback)
The importer is now significantly simpler compared to the previous revision:
MLD_INTERNAL_APIsed — upstream now usesMLD_INTERNAL_DATA_DECLARATION/DEFINITION_avx2_asm.Snames (no collision with mlkem).Sfiles — no need to enumerate each assembly file individuallyconsts.c)MLD_ASM_FN_SIZE→S2N_BN_SIZE_DIRECTIVE(upstream now defines it)Call-outs
CRYPTO_is_AVX2_capable()— falls back to C on non-AVX2 systemsTesting
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.