Lowram: Share buffers with non-overlapping lifetimes in verify #1007

Merged: mkannwischer merged 1 commit into main from verify-buffer-sharing on May 4, 2026

Conversation

@mkannwischer
Contributor

mkannwischer commented Apr 1, 2026

@oqs-bot
Contributor

oqs-bot commented Apr 1, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| sign_signature_internal | ⚠️ | 21s | 14s | +50% |

Full Results (200 proofs)
Proof Status Current Previous Change
**TOTAL** 1655s 1843s -10.2%
sign_verify_internal 132s 211s -37%
rej_uniform_native 120s 130s -8%
polyvecl_pointwise_acc_montgomery_c 117s 138s -15%
mld_invntt_layer 89s 100s -11%
poly_pointwise_montgomery_c 89s 107s -17%
mld_ct_memcmp 70s 85s -18%
mld_attempt_signature_generation 53s 48s +10%
mld_ntt_layer 42s 45s -7%
fqmul 27s 31s -13%
polyvec_matrix_expand 26s 27s -4%
polyvec_matrix_pointwise_montgomery 24s 25s -4%
keccakf1600x4_permute_native 22s 22s +0%
sign_keypair_internal 22s 22s +0%
sign_signature_internal ⚠️ 21s 14s +50%
rej_uniform_c 19s 18s +6%
sign_pk_from_sk 18s 19s -5%
polyveck_chknorm 16s 20s -20%
rej_uniform 16s 17s -6%
mld_ntt_butterfly_block 15s 16s -6%
polyt0_unpack 15s 14s +7%
poly_chknorm_c 14s 17s -18%
poly_uniform_eta_4x 14s 15s -7%
polyz_unpack_c 13s 13s +0%
poly_add 12s 11s +9%
polyeta_unpack 12s 12s +0%
polyvec_matrix_pointwise_montgomery_yvec 12s 12s +0%
mld_check_pct 11s 9s +22%
poly_uniform_4x 11s 12s -8%
poly_power2round 9s 9s +0%
polyveck_use_hint 9s 8s +12%
keccak_absorb_once_x4 8s 8s +0%
keccakf1600_permute_native 8s 6s +33%
mld_compute_pack_z 8s 6s +33%
poly_invntt_tomont_c 8s 9s -11%
polyveck_decompose 8s 6s +33%
sign_open 8s 6s +33%
keccak_absorb 7s 6s +17%
keccakf1600_permute 7s 7s +0%
mld_ct_cmask_nonzero_u8 7s 3s +133%
polyveck_add 7s 5s +40%
polyveck_caddq 7s 3s +133%
mld_sample_s1_s2 6s 5s +20%
pointwise_acc_native_aarch64 6s 4s +50%
poly_ntt_native 6s 4s +50%
polyveck_pointwise_poly_montgomery 6s 7s -14%
polyvecl_chknorm 6s 5s +20%
sign 6s 6s +0%
keccak_squeezeblocks_x4 5s 3s +67%
mld_prepare_domain_separation_prefix 5s 2s +150%
nttunpack_native_x86_64 5s 5s +0%
pointwise_acc_native_x86_64 5s 5s +0%
poly_invntt_tomont 5s 4s +25%
poly_ntt 5s 3s +67%
poly_uniform_eta 5s 2s +150%
poly_use_hint_c 5s 4s +25%
polyeta_pack 5s 4s +25%
polyt0_pack 5s 3s +67%
polyvec_matrix_expand_serial 5s 5s +0%
polyveck_ntt 5s 4s +25%
polyveck_power2round 5s 4s +25%
polyveck_sub 5s 5s +0%
polyveck_unpack_t0 5s 4s +25%
polyvecl_ntt 5s 5s +0%
polyvecl_pack_eta 5s 7s -29%
polyvecl_unpack_z 5s 4s +25%
rej_eta_native 5s 2s +150%
sign_verify_pre_hash_shake256 5s 3s +67%
keccak_f1600_x1_native_aarch64_v84a 4s 3s +33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
keccak_f1600_x4_native_avx2 4s 3s +33%
keccak_squeeze 4s 4s +0%
keccakf1600_extract_bytes (big endian) 4s 2s +100%
mld_h 4s 5s -20%
mld_value_barrier_u8 4s 3s +33%
pack_pk 4s 4s +0%
pack_sig_h_poly 4s 2s +100%
pack_sk_rho_key_tr_s2_t0 4s 3s +33%
pointwise_native_x86_64 4s 3s +33%
poly_challenge 4s 3s +33%
poly_chknorm_native_aarch64 4s 3s +33%
poly_decompose 4s 2s +100%
poly_decompose_c 4s 4s +0%
poly_ntt_c 4s 3s +33%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_use_hint_native 4s 3s +33%
polyt1_unpack 4s 2s +100%
polyveck_invntt_tomont 4s 6s -33%
polyveck_pack_w1 4s 4s +0%
polyz_pack 4s 3s +33%
power2round 4s 7s -43%
shake128_finalize 4s 4s +0%
sig_unpack_hints 4s - new
sign_keypair 4s 3s +33%
sign_signature_extmu 4s 2s +100%
sign_signature_pre_hash_internal 4s 5s -20%
sign_signature_pre_hash_shake256 4s 6s -33%
sign_verify 4s 6s -33%
sk_s1hat_get_poly 4s 2s +100%
unpack_sk_t0hat 4s 5s -20%
caddq 3s 3s +0%
fqscale 3s 5s -40%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_finalize 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 3s +0%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_u32 3s 4s -25%
mld_sample_s1_s2_serial 3s 3s +0%
mld_value_barrier_u32 3s 2s +50%
montgomery_reduce 3s 6s -50%
pack_sig_c 3s 3s +0%
pack_sig_z 3s 4s -25%
poly_caddq_c 3s 4s -25%
poly_caddq_native 3s 4s -25%
poly_chknorm 3s 4s -25%
poly_decompose_native 3s 1s +200%
poly_make_hint 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_shiftl 3s 5s -40%
poly_uniform_gamma1_4x 3s 4s -25%
poly_use_hint 3s 2s +50%
polyveck_reduce 3s 5s -40%
polyveck_shiftl 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 2s +50%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_uniform_gamma1 3s 3s +0%
polyz_unpack 3s 2s +50%
polyz_unpack_19_native_aarch64 3s 3s +0%
reduce32 3s 2s +50%
rej_eta 3s 2s +50%
rej_eta_c 3s 3s +0%
shake128_squeeze 3s 1s +200%
shake256_init 3s 4s -25%
shake256x4_absorb_once 3s 4s -25%
sign_signature 3s 6s -50%
sign_verify_extmu 3s 4s -25%
sign_verify_pre_hash_internal 3s 6s -50%
sk_t0hat_get_poly 3s 4s -25%
sys_check_capability 3s 6s -50%
unpack_pk_t1 3s - new
unpack_sk 3s 4s -25%
unpack_sk_s2hat 3s 3s +0%
use_hint 3s 3s +0%
yvec_get_poly 3s 3s +0%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_init 2s 1s +100%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600x4_extract_bytes 2s 3s -33%
keccakf1600x4_permute 2s 3s -33%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_polymat_expand_entry 2s 4s -50%
mld_value_barrier_i64 2s 4s -50%
ntt_native_aarch64 2s 4s -50%
ntt_native_x86_64 2s 3s -33%
pack_sk_s1 2s 4s -50%
pointwise_native_aarch64 2s 2s +0%
poly_caddq 2s 3s -33%
poly_caddq_native_aarch64 2s 2s +0%
poly_chknorm_native 2s 3s -33%
poly_invntt_tomont_native 2s 2s +0%
poly_pointwise_montgomery 2s 2s +0%
poly_pointwise_montgomery_native 2s 3s -33%
poly_reduce 2s 3s -33%
poly_uniform 2s 4s -50%
poly_uniform_gamma1 2s 4s -50%
poly_use_hint_native_aarch64 2s 2s +0%
polyt1_pack 2s 9s -78%
polyvec_matrix_pointwise_montgomery_row 2s 3s -33%
polyveck_pack_eta 2s 4s -50%
polyveck_pack_t0 2s 5s -60%
polyveck_unpack_eta 2s 5s -60%
polyvecl_uniform_gamma1_serial 2s 2s +0%
polyvecl_unpack_eta 2s 3s -33%
polyw1_pack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 4s -50%
polyz_unpack_native 2s 2s +0%
shake128_absorb 2s 3s -33%
shake128_init 2s 3s -33%
shake128_release 2s 2s +0%
shake128x4_absorb_once 2s 3s -33%
shake128x4_squeezeblocks 2s 1s +100%
shake256 2s 2s +0%
shake256_release 2s 2s +0%
shake256_squeeze 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
sk_s2hat_get_poly 2s 3s -33%
unpack_sk_s1hat 2s 3s -33%
decompose 1s 2s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 3s -67%
keccakf1600x4_xor_bytes 1s 6s -83%
make_hint 1s 3s -67%
mld_ct_get_optblocker_i64 1s 2s -50%
mld_ct_sel_int32 1s 4s -75%
poly_sub 1s 2s -50%
shake256_absorb 1s 3s -67%
shake256_finalize 1s 3s -67%
yvec_init 1s 5s -80%
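
For reading the Change column: the values above are consistent with the relative difference between Current and Previous, shown as a signed percentage (whole percent per proof, one decimal for the total). The following is a minimal sketch under that assumption. The function names `change_percent` and `format_change` are hypothetical and not taken from the oqs-bot tooling, and the bot most likely computes from unrounded timings, so a recomputation from the displayed seconds can differ by a point.

```python
from typing import Optional


def change_percent(current_s: float, previous_s: float) -> float:
    """Relative change of the current runtime versus the previous one, in percent."""
    # Assumes previous_s is non-zero; every Previous value in the tables is at least 1s.
    return (current_s - previous_s) / previous_s * 100.0


def format_change(current_s: float, previous_s: Optional[float], digits: int = 0) -> str:
    """Render the change as a signed percentage, or 'new' if there is no previous run."""
    if previous_s is None:
        # Proofs without a previous measurement are listed as "new" in the tables.
        return "new"
    return f"{change_percent(current_s, previous_s):+.{digits}f}%"


if __name__ == "__main__":
    print(format_change(21, 14))                 # sign_signature_internal: +50%
    print(format_change(132, 211))               # sign_verify_internal: -37%
    print(format_change(1655, 1843, digits=1))   # TOTAL: -10.2%
    print(format_change(4, None))                # sig_unpack_hints: new
```

Note that short proofs swing by large percentages on one or two seconds of noise (for example 3s to 7s is +133%), so the absolute times are usually the more useful signal for them.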

@oqs-bot
Contributor

oqs-bot commented Apr 1, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| sign_verify_internal | ⚠️ | 272s | 125s | +118% |

Full Results (200 proofs)
Proof Status Current Previous Change
**TOTAL** 2617s 2488s +5.2%
polyvecl_pointwise_acc_montgomery_c 632s 622s +2%
sign_verify_internal ⚠️ 272s 125s +118%
polyvec_matrix_expand 135s 137s -1%
rej_uniform_native 134s 136s -1%
poly_pointwise_montgomery_c 113s 109s +4%
mld_invntt_layer 101s 104s -3%
mld_ct_memcmp 89s 86s +3%
mld_ntt_layer 49s 48s +2%
mld_attempt_signature_generation 42s 54s -22%
fqmul 31s 33s -6%
sign_keypair_internal 31s 31s +0%
polyvec_matrix_expand_serial 28s 30s -7%
sign_signature_internal 27s 33s -18%
keccakf1600x4_permute_native 25s 22s +14%
polyvec_matrix_pointwise_montgomery 22s 23s -4%
polyvec_matrix_pointwise_montgomery_yvec 22s 21s +5%
sign_pk_from_sk 22s 20s +10%
rej_uniform_c 19s 19s +0%
mld_ntt_butterfly_block 17s 18s -6%
polyt0_unpack 17s 16s +6%
polyveck_power2round 17s 18s -6%
rej_uniform 17s 20s -15%
poly_chknorm_c 16s 17s -6%
polyveck_decompose 15s 17s -12%
keccak_absorb_once_x4 12s 11s +9%
poly_uniform_eta_4x 12s 14s -14%
poly_add 11s 12s -8%
poly_power2round 11s 10s +10%
poly_uniform_4x 11s 13s -15%
polyveck_pointwise_poly_montgomery 11s 8s +38%
mld_check_pct 10s 10s +0%
polyveck_use_hint 10s 10s +0%
poly_invntt_tomont_c 9s 10s -10%
polyveck_add 9s 12s -25%
polyveck_shiftl 9s 9s +0%
polyz_unpack_c 8s 7s +14%
keccakf1600_permute_native 7s 6s +17%
mld_compute_pack_z 7s 9s -22%
pointwise_acc_native_aarch64 7s 7s +0%
pointwise_acc_native_x86_64 7s 7s +0%
poly_caddq_c 7s 4s +75%
polyveck_invntt_tomont 7s 8s -12%
polyveck_ntt 7s 6s +17%
polyveck_reduce 7s 7s +0%
polyveck_sub 7s 9s -22%
keccak_absorb 6s 9s -33%
keccakf1600_permute 6s 8s -25%
ntt_native_aarch64 6s 5s +20%
poly_ntt_c 6s 3s +100%
poly_uniform 6s 4s +50%
polyveck_chknorm 6s 5s +20%
polyvecl_chknorm 6s 9s -33%
polyvecl_ntt 6s 9s -33%
sign 6s 9s -33%
sign_open 6s 7s -14%
sign_signature_pre_hash_shake256 6s 3s +100%
sign_verify 6s 3s +100%
sys_check_capability 6s 3s +100%
keccakf1600x4_extract_bytes 5s 2s +150%
mld_sample_s1_s2 5s 4s +25%
mld_sample_s1_s2_serial 5s 7s -29%
pack_pk 5s 2s +150%
pointwise_native_aarch64 5s 4s +25%
poly_decompose 5s 6s -17%
poly_pointwise_montgomery_native 5s 4s +25%
poly_shiftl 5s 3s +67%
poly_uniform_gamma1 5s 3s +67%
polyt1_unpack 5s 5s +0%
polyveck_caddq 5s 7s -29%
polyveck_unpack_t0 5s 4s +25%
polyvecl_unpack_z 5s 3s +67%
polyz_pack 5s 3s +67%
polyz_unpack_17_native_aarch64 5s 3s +67%
polyz_unpack_19_native_aarch64 5s 2s +150%
polyz_unpack_native 5s 1s +400%
reduce32 5s 4s +25%
shake128x4_absorb_once 5s 4s +25%
sig_unpack_hints 5s - new
sign_signature 5s 3s +67%
unpack_sk 5s 3s +67%
caddq 4s 3s +33%
keccak_squeezeblocks_x4 4s 6s -33%
mld_ct_cmask_nonzero_u32 4s 3s +33%
mld_ct_get_optblocker_u8 4s 2s +100%
mld_prepare_domain_separation_prefix 4s 6s -33%
pack_sig_c 4s 4s +0%
pack_sig_h_poly 4s 3s +33%
pack_sig_z 4s 5s -20%
pack_sk_s1 4s 2s +100%
poly_challenge 4s 3s +33%
poly_decompose_c 4s 4s +0%
poly_invntt_tomont 4s 2s +100%
poly_ntt_native 4s 1s +300%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_sub 4s 3s +33%
poly_uniform_eta 4s 3s +33%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint_c 4s 5s -20%
poly_use_hint_native 4s 2s +100%
poly_use_hint_native_aarch64 4s 3s +33%
polyt0_pack 4s 5s -20%
polyvecl_pack_eta 4s 5s -20%
polyvecl_uniform_gamma1 4s 4s +0%
polyvecl_unpack_eta 4s 2s +100%
polyw1_pack 4s 3s +33%
shake128_finalize 4s 3s +33%
shake256x4_absorb_once 4s 2s +100%
sign_keypair 4s 3s +33%
sign_verify_extmu 4s 6s -33%
sign_verify_pre_hash_internal 4s 7s -43%
sign_verify_pre_hash_shake256 4s 5s -20%
sk_s2hat_get_poly 4s 3s +33%
unpack_sk_s2hat 4s 3s +33%
yvec_get_poly 4s 3s +33%
yvec_init 4s 2s +100%
fqscale 3s 2s +50%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_init 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_xor_bytes 3s 3s +0%
make_hint 3s 4s -25%
mld_ct_abs_i32 3s 1s +200%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u32 3s 3s +0%
mld_h 3s 2s +50%
mld_polymat_expand_entry 3s 2s +50%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u32 3s 1s +200%
montgomery_reduce 3s 2s +50%
ntt_native_x86_64 3s 3s +0%
pack_sk_rho_key_tr_s2_t0 3s 2s +50%
poly_chknorm_native 3s 2s +50%
poly_chknorm_native_aarch64 3s 4s -25%
poly_invntt_tomont_native 3s 3s +0%
poly_make_hint 3s 4s -25%
poly_ntt 3s 4s -25%
poly_reduce 3s 3s +0%
polyeta_unpack 3s 3s +0%
polyt1_pack 3s 3s +0%
polyveck_pack_w1 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 6s -50%
polyvecl_pointwise_acc_montgomery_native 3s 6s -50%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyz_unpack 3s 2s +50%
power2round 3s 2s +50%
rej_eta 3s 3s +0%
rej_eta_c 3s 4s -25%
rej_eta_native 3s 3s +0%
shake128_release 3s 2s +50%
shake128x4_squeezeblocks 3s 4s -25%
shake256 3s 3s +0%
shake256_init 3s 1s +200%
shake256_release 3s 2s +50%
shake256_squeeze 3s 3s +0%
shake256x4_squeezeblocks 3s 8s -62%
sign_signature_pre_hash_internal 3s 4s -25%
sk_s1hat_get_poly 3s 4s -25%
unpack_sk_s1hat 3s 3s +0%
unpack_sk_t0hat 3s 2s +50%
use_hint 3s 3s +0%
decompose 2s 6s -67%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 1s +100%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccak_finalize 2s 2s +0%
keccak_squeeze 2s 2s +0%
keccakf1600x4_permute 2s 1s +100%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_sel_int32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 4s -50%
mld_value_barrier_u8 2s 1s +100%
nttunpack_native_x86_64 2s 4s -50%
pointwise_native_x86_64 2s 2s +0%
poly_caddq 2s 3s -33%
poly_caddq_native 2s 3s -33%
poly_caddq_native_aarch64 2s 2s +0%
poly_chknorm 2s 2s +0%
poly_decompose_native 2s 4s -50%
poly_permute_bitrev_to_custom_optional_native 2s 5s -60%
poly_pointwise_montgomery 2s 4s -50%
poly_use_hint 2s 4s -50%
polyeta_pack 2s 4s -50%
polyveck_pack_eta 2s 5s -60%
polyveck_pack_t0 2s 4s -50%
polyveck_unpack_eta 2s 4s -50%
shake128_absorb 2s 2s +0%
shake128_squeeze 2s 2s +0%
shake256_absorb 2s 3s -33%
sign_signature_extmu 2s 3s -33%
unpack_pk_t1 2s - new
keccakf1600_extract_bytes (big endian) 1s 2s -50%
keccakf1600_xor_bytes 1s 3s -67%
polyvec_matrix_pointwise_montgomery_row 1s 5s -80%
shake128_init 1s 1s +0%
shake256_finalize 1s 2s -50%
sk_t0hat_get_poly 1s 3s -67%

@oqs-bot
Contributor

oqs-bot commented Apr 1, 2026

CBMC Results (ML-DSA-87)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| mld_attempt_signature_generation | ⚠️ | 118s | 62s | +90% |
| sign_verify_internal | ⚠️ | 388s | 253s | +53% |

Full Results (200 proofs)
Proof Status Current Previous Change
**TOTAL** 2590s 2375s +9.1%
sign_verify_internal ⚠️ 388s 253s +53%
polyvecl_pointwise_acc_montgomery_c 345s 345s +0%
polyvec_matrix_expand 172s 169s +2%
rej_uniform_native 127s 126s +1%
mld_attempt_signature_generation ⚠️ 118s 62s +90%
poly_pointwise_montgomery_c 101s 92s +10%
mld_invntt_layer 97s 90s +8%
mld_ct_memcmp 76s 73s +4%
polyvec_matrix_expand_serial 56s 56s +0%
sign_keypair_internal 50s 52s -4%
mld_ntt_layer 44s 45s -2%
sign_signature_internal 38s 36s +6%
sign_pk_from_sk 35s 34s +3%
polyveck_power2round 32s 34s -6%
polyvec_matrix_pointwise_montgomery 30s 29s +3%
fqmul 29s 27s +7%
keccakf1600x4_permute_native 24s 23s +4%
mld_ntt_butterfly_block 19s 16s +19%
polyvec_matrix_pointwise_montgomery_yvec 18s 19s -5%
poly_chknorm_c 17s 16s +6%
rej_uniform_c 17s 17s +0%
polyt0_unpack 16s 17s -6%
rej_uniform 16s 19s -16%
poly_uniform_eta_4x 15s 13s +15%
polyveck_ntt 12s 10s +20%
polyveck_pointwise_poly_montgomery 12s 9s +33%
poly_uniform_4x 11s 12s -8%
polyveck_decompose 11s 12s -8%
polyveck_use_hint 11s 11s +0%
keccak_absorb_once_x4 10s 9s +11%
poly_add 10s 12s -17%
polyeta_unpack 10s 12s -17%
polyveck_add 10s 11s -9%
polyveck_shiftl 10s 8s +25%
sign 10s 7s +43%
poly_power2round 9s 9s +0%
polyveck_sub 9s 8s +12%
sig_unpack_hints 8s - new
keccakf1600_permute_native 7s 7s +0%
mld_check_pct 7s 9s -22%
mld_compute_pack_z 7s 8s -12%
mld_sample_s1_s2 7s 6s +17%
pointwise_acc_native_x86_64 7s 8s -12%
poly_caddq_c 7s 4s +75%
poly_decompose_c 7s 5s +40%
poly_invntt_tomont_c 7s 8s -12%
polyveck_reduce 7s 7s +0%
keccak_absorb 6s 7s -14%
keccakf1600_permute 6s 7s -14%
keccakf1600_xor_bytes 6s 1s +500%
pointwise_native_aarch64 6s 2s +200%
polyz_unpack_c 6s 7s -14%
rej_eta_c 6s 4s +50%
fqscale 5s 5s +0%
nttunpack_native_x86_64 5s 3s +67%
pointwise_acc_native_aarch64 5s 5s +0%
poly_ntt 5s 3s +67%
poly_ntt_c 5s 2s +150%
poly_pointwise_montgomery_native 5s 3s +67%
poly_use_hint_native_aarch64 5s 2s +150%
polyt0_pack 5s 1s +400%
polyveck_caddq 5s 4s +25%
polyveck_chknorm 5s 7s -29%
polyveck_invntt_tomont 5s 5s +0%
polyveck_pack_w1 5s 4s +25%
polyveck_unpack_t0 5s 4s +25%
polyvecl_chknorm 5s 4s +25%
polyvecl_ntt 5s 5s +0%
shake256_absorb 5s 2s +150%
sign_keypair 5s 4s +25%
sign_open 5s 4s +25%
sign_signature_extmu 5s 6s -17%
sign_signature_pre_hash_internal 5s 4s +25%
sign_signature_pre_hash_shake256 5s 8s -38%
sign_verify_extmu 5s 4s +25%
sign_verify_pre_hash_internal 5s 5s +0%
yvec_init 5s 4s +25%
keccak_f1600_x1_native_aarch64_v84a 4s 3s +33%
keccak_squeezeblocks_x4 4s 4s +0%
mld_h 4s 4s +0%
mld_keccakf1600_extract_bytes 4s 1s +300%
mld_sample_s1_s2_serial 4s 8s -50%
pack_pk 4s 6s -33%
poly_caddq 4s 2s +100%
poly_challenge 4s 3s +33%
poly_chknorm 4s 2s +100%
poly_decompose 4s 5s -20%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_shiftl 4s 3s +33%
poly_uniform_gamma1 4s 6s -33%
poly_uniform_gamma1_4x 4s 5s -20%
poly_use_hint_native 4s 4s +0%
polyeta_pack 4s 3s +33%
polyt1_unpack 4s 2s +100%
polyvec_matrix_pointwise_montgomery_row 4s 1s +300%
polyvecl_pack_eta 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 3s +33%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_unpack_eta 4s 3s +33%
polyz_pack 4s 4s +0%
polyz_unpack_19_native_aarch64 4s 3s +33%
rej_eta_native 4s 6s -33%
shake128x4_absorb_once 4s 3s +33%
shake256 4s 4s +0%
sign_verify_pre_hash_shake256 4s 2s +100%
sk_t0hat_get_poly 4s 4s +0%
unpack_sk_s2hat 4s 3s +33%
unpack_sk_t0hat 4s 4s +0%
use_hint 4s 4s +0%
caddq 3s 6s -50%
decompose 3s 3s +0%
intt_native_x86_64 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_finalize 3s 2s +50%
keccak_squeeze 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_permute 3s 2s +50%
make_hint 3s 4s -25%
mld_ct_cmask_neg_i32 3s 1s +200%
mld_ct_get_optblocker_i64 3s 5s -40%
mld_ct_sel_int32 3s 3s +0%
mld_polymat_expand_entry 3s 5s -40%
pack_sig_h_poly 3s 3s +0%
pack_sk_s1 3s 5s -40%
pointwise_native_x86_64 3s 5s -40%
poly_caddq_native 3s 4s -25%
poly_chknorm_native 3s 2s +50%
poly_invntt_tomont_native 3s 4s -25%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_reduce 3s 2s +50%
poly_uniform 3s 4s -25%
poly_uniform_eta 3s 4s -25%
poly_use_hint_c 3s 6s -50%
polyveck_pack_eta 3s 2s +50%
polyveck_unpack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_unpack_z 3s 2s +50%
polyz_unpack 3s 3s +0%
polyz_unpack_17_native_aarch64 3s 4s -25%
polyz_unpack_native 3s 2s +50%
reduce32 3s 4s -25%
rej_eta 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_init 3s 3s +0%
shake128_release 3s 2s +50%
shake128_squeeze 3s 4s -25%
shake256_finalize 3s 2s +50%
shake256x4_absorb_once 3s 2s +50%
shake256x4_squeezeblocks 3s 3s +0%
sign_signature 3s 2s +50%
sk_s2hat_get_poly 3s 3s +0%
sys_check_capability 3s 5s -40%
unpack_pk_t1 3s - new
unpack_sk 3s 6s -50%
unpack_sk_s1hat 3s 5s -40%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccak_init 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 4s -50%
mld_ct_abs_i32 2s 3s -33%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_prepare_domain_separation_prefix 2s 3s -33%
mld_value_barrier_u32 2s 1s +100%
montgomery_reduce 2s 2s +0%
ntt_native_aarch64 2s 2s +0%
ntt_native_x86_64 2s 2s +0%
pack_sig_c 2s 3s -33%
pack_sig_z 2s 4s -50%
pack_sk_rho_key_tr_s2_t0 2s 5s -60%
poly_caddq_native_aarch64 2s 4s -50%
poly_chknorm_native_aarch64 2s 3s -33%
poly_decompose_native 2s 2s +0%
poly_invntt_tomont 2s 4s -50%
poly_make_hint 2s 2s +0%
poly_ntt_native 2s 3s -33%
poly_pointwise_montgomery 2s 3s -33%
poly_sub 2s 2s +0%
poly_use_hint 2s 2s +0%
polyt1_pack 2s 3s -33%
polyvecl_uniform_gamma1_serial 2s 3s -33%
polyw1_pack 2s 2s +0%
shake256_init 2s 3s -33%
shake256_release 2s 4s -50%
shake256_squeeze 2s 1s +100%
sign_verify 2s 5s -60%
sk_s1hat_get_poly 2s 5s -60%
yvec_get_poly 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 3s -67%
keccakf1600x4_xor_bytes 1s 2s -50%
mld_ct_cmask_nonzero_u8 1s 3s -67%
mld_ct_get_optblocker_u8 1s 2s -50%
mld_value_barrier_i64 1s 2s -50%
mld_value_barrier_u8 1s 1s +0%
polyveck_pack_t0 1s 3s -67%
power2round 1s 2s -50%
shake128_absorb 1s 2s -50%
shake128x4_squeezeblocks 1s 2s -50%

mkannwischer force-pushed the verify-buffer-sharing branch from b09a9aa to 86374eb on April 1, 2026 12:08
mkannwischer marked this pull request as ready for review on April 1, 2026 12:29
mkannwischer requested a review from a team as a code owner on April 1, 2026 12:29
mkannwischer force-pushed the verify-buffer-sharing branch 2 times, most recently from 79b7670 to f4f7b85, on April 3, 2026 13:34
mkannwischer force-pushed the verify-buffer-sharing branch 2 times, most recently from e090aec to eacc983, on April 8, 2026 06:58
mkannwischer force-pushed the verify-buffer-sharing branch from eacc983 to 545201d on April 29, 2026 06:44
mkannwischer marked this pull request as draft on April 29, 2026 06:44
mkannwischer force-pushed the verify-buffer-sharing branch from 545201d to 3fc9d89 on May 2, 2026 11:35

@oqs-bot
Contributor

oqs-bot commented May 2, 2026

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (200 proofs)
Proof Status Current Previous Change
**TOTAL** 1446s 1562s -7.4%
poly_pointwise_montgomery_c 164s 185s -11%
mld_invntt_layer 100s 107s -7%
rej_uniform_native 100s 112s -11%
mld_ct_memcmp 73s 78s -6%
polyvec_matrix_pointwise_montgomery_yvec 60s 65s -8%
mld_ntt_layer 43s 45s -4%
fqmul 28s 32s -12%
sign_verify_internal 27s 65s -58%
keccakf1600x4_permute_native 22s 23s -4%
rej_uniform_c 19s 21s -10%
mld_attempt_signature_generation 18s 24s -25%
rej_uniform 18s 21s -14%
polyeta_unpack 17s 21s -19%
mld_ntt_butterfly_block 16s 15s +7%
sign_keypair_internal 16s 15s +7%
poly_chknorm_c 15s 13s +15%
polyveck_use_hint 13s 15s -13%
sign_pk_from_sk 13s 16s -19%
mld_check_pct 12s 14s -14%
poly_add 12s 12s +0%
poly_uniform_eta_4x 12s 15s -20%
polyveck_pointwise_poly_montgomery 12s 9s +33%
keccak_absorb_once_x4 10s 9s +11%
polyt0_unpack 10s 12s -17%
poly_power2round 9s 7s +29%
polyveck_chknorm 9s 6s +50%
poly_invntt_tomont_c 8s 10s -20%
polyveck_add 8s 8s +0%
keccakf1600_permute_native 7s 6s +17%
mld_compute_pack_z 7s 6s +17%
pointwise_acc_native_x86_64 7s 8s -12%
poly_caddq_c 7s 8s -12%
poly_decompose_c 7s 7s +0%
poly_ntt_native 7s 2s +250%
polyveck_decompose 7s 7s +0%
keccak_absorb 6s 6s +0%
keccak_squeezeblocks_x4 6s 4s +50%
pack_sk_rho_key_tr_s2_t0 6s 4s +50%
pointwise_acc_native_aarch64 6s 5s +20%
poly_decompose_native 6s 3s +100%
poly_shiftl 6s 5s +20%
polyveck_power2round 6s 4s +50%
polyvecl_chknorm 6s 5s +20%
polyvecl_ntt 6s 7s -14%
polyz_unpack 6s 2s +200%
sig_unpack_hints 6s - new
keccakf1600_permute 5s 7s -29%
mld_ct_cmask_neg_i32 5s 4s +25%
mld_prepare_domain_separation_prefix 5s 5s +0%
mld_sample_s1_s2 5s 4s +25%
mld_sample_s1_s2_serial 5s 6s -17%
poly_challenge 5s 3s +67%
poly_pointwise_montgomery_native 5s 2s +150%
polyvec_matrix_pointwise_montgomery_row 5s 8s -38%
polyveck_reduce 5s 5s +0%
polyveck_shiftl 5s 4s +25%
polyw1_pack 5s 3s +67%
polyz_unpack_17_native_aarch64 5s 4s +25%
sign 5s 5s +0%
sign_signature_internal 5s 3s +67%
sign_verify_pre_hash_shake256 5s 2s +150%
use_hint 5s 2s +150%
caddq 4s 5s -20%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
mld_ct_cmask_nonzero_u32 4s 3s +33%
mld_value_barrier_u32 4s 4s +0%
ntt_native_aarch64 4s 3s +33%
nttunpack_native_x86_64 4s 5s -20%
poly_chknorm_native_aarch64 4s 3s +33%
poly_invntt_tomont_native 4s 4s +0%
poly_permute_bitrev_to_custom_optional_native 4s 4s +0%
poly_reduce 4s 5s -20%
poly_uniform_eta 4s 3s +33%
polyeta_pack 4s 3s +33%
polyveck_invntt_tomont 4s 6s -33%
polyveck_ntt 4s 5s -20%
polyveck_pack_t0 4s 4s +0%
polyveck_sub 4s 4s +0%
polyvecl_pack_eta 4s 2s +100%
polyvecl_pointwise_acc_montgomery 4s 2s +100%
polyvecl_pointwise_acc_montgomery_c 4s 2s +100%
polyvecl_uniform_gamma1_serial 4s 3s +33%
polyz_pack 4s 2s +100%
polyz_unpack_native 4s 2s +100%
power2round 4s 3s +33%
reduce32 4s 3s +33%
rej_eta 4s 3s +33%
rej_eta_c 4s 6s -33%
shake256_finalize 4s 2s +100%
shake256x4_squeezeblocks 4s 2s +100%
sign_keypair 4s 6s -33%
sign_signature_extmu 4s 3s +33%
sign_verify 4s 4s +0%
sign_verify_pre_hash_internal 4s 4s +0%
sys_check_capability 4s 2s +100%
unpack_pk_t1 4s - new
unpack_sk 4s 3s +33%
unpack_sk_s1hat 4s 3s +33%
fqscale 3s 1s +200%
intt_native_x86_64 3s 3s +0%
keccak_finalize 3s 2s +50%
keccak_squeeze 3s 2s +50%
keccakf1600x4_permute 3s 4s -25%
make_hint 3s 5s -40%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_polymat_expand_entry 3s 3s +0%
montgomery_reduce 3s 4s -25%
ntt_native_x86_64 3s 3s +0%
pack_pk 3s 2s +50%
pack_sig_c 3s 3s +0%
pack_sig_z 3s 3s +0%
poly_caddq_native_aarch64 3s 3s +0%
poly_chknorm_native 3s 3s +0%
poly_decompose 3s 3s +0%
poly_make_hint 3s 3s +0%
poly_ntt 3s 2s +50%
poly_ntt_c 3s 4s -25%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_pointwise_montgomery 3s 3s +0%
poly_uniform 3s 4s -25%
poly_uniform_gamma1 3s 3s +0%
poly_use_hint_native 3s 4s -25%
polyt0_pack 3s 3s +0%
polyt1_pack 3s 2s +50%
polyt1_unpack 3s 3s +0%
polyvec_matrix_expand 3s 2s +50%
polyvec_matrix_pointwise_montgomery 3s 2s +50%
polyveck_caddq 3s 6s -50%
polyveck_pack_eta 3s 5s -40%
polyveck_pack_w1 3s 3s +0%
polyveck_unpack_eta 3s 2s +50%
polyvecl_unpack_z 3s 2s +50%
polyz_unpack_19_native_aarch64 3s 3s +0%
polyz_unpack_c 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_finalize 3s 4s -25%
shake128_release 3s 2s +50%
shake128_squeeze 3s 3s +0%
shake128x4_absorb_once 3s 2s +50%
shake128x4_squeezeblocks 3s 2s +50%
shake256 3s 4s -25%
shake256_absorb 3s 5s -40%
shake256x4_absorb_once 3s 3s +0%
sign_signature 3s 5s -40%
sign_signature_pre_hash_shake256 3s 3s +0%
sk_s2hat_get_poly 3s 4s -25%
unpack_sk_s2hat 3s 5s -40%
decompose 2s 4s -50%
keccak_f1600_x1_native_aarch64 2s 1s +100%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_init 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_sel_int32 2s 2s +0%
mld_h 2s 4s -50%
mld_value_barrier_i64 2s 3s -33%
pack_sig_h_poly 2s 3s -33%
pack_sk_s1 2s 3s -33%
pointwise_native_aarch64 2s 2s +0%
poly_caddq 2s 2s +0%
poly_chknorm 2s 2s +0%
poly_invntt_tomont 2s 1s +100%
poly_sub 2s 6s -67%
poly_uniform_4x 2s 4s -50%
poly_use_hint 2s 3s -33%
poly_use_hint_c 2s 2s +0%
poly_use_hint_native_aarch64 2s 3s -33%
polyvec_matrix_expand_serial 2s 2s +0%
polyveck_unpack_t0 2s 2s +0%
polyvecl_uniform_gamma1 2s 4s -50%
polyvecl_unpack_eta 2s 4s -50%
rej_eta_native 2s 3s -33%
shake128_init 2s 3s -33%
shake256_init 2s 4s -50%
shake256_release 2s 4s -50%
sign_open 2s 5s -60%
sign_signature_pre_hash_internal 2s 5s -60%
sign_verify_extmu 2s 3s -33%
sk_s1hat_get_poly 2s 3s -33%
unpack_sk_t0hat 2s 3s -33%
yvec_get_poly 2s 2s +0%
yvec_init 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 2s -50%
keccak_f1600_x4_native_avx2 1s 2s -50%
keccakf1600x4_xor_bytes 1s 2s -50%
mld_ct_abs_i32 1s 4s -75%
mld_keccakf1600_extract_bytes 1s 3s -67%
mld_value_barrier_u8 1s 1s +0%
pointwise_native_x86_64 1s 4s -75%
poly_caddq_native 1s 1s +0%
poly_uniform_gamma1_4x 1s 2s -50%
polyvecl_pointwise_acc_montgomery_native 1s 3s -67%
shake256_squeeze 1s 1s +0%
sk_t0hat_get_poly 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented May 2, 2026

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (200 proofs)
Proof Status Current Previous Change
**TOTAL** 1503s 1737s -13.5%
poly_pointwise_montgomery_c 159s 201s -21%
rej_uniform_native 101s 116s -13%
mld_invntt_layer 98s 116s -16%
mld_ct_memcmp 72s 81s -11%
polyvec_matrix_pointwise_montgomery_yvec 71s 80s -11%
mld_ntt_layer 43s 45s -4%
mld_attempt_signature_generation 31s 30s +3%
fqmul 28s 30s -7%
sign_verify_internal 24s 125s -81%
sign_keypair_internal 21s 22s -5%
keccakf1600x4_permute_native 20s 23s -13%
rej_uniform 20s 20s +0%
polyveck_decompose 18s 18s +0%
rej_uniform_c 18s 19s -5%
mld_ntt_butterfly_block 17s 17s +0%
sign_pk_from_sk 17s 14s +21%
polyveck_power2round 14s 17s -18%
poly_chknorm_c 13s 14s -7%
poly_uniform_eta_4x 13s 11s +18%
polyvec_matrix_pointwise_montgomery_row 13s 15s -13%
mld_check_pct 12s 12s +0%
polyt0_unpack 12s 10s +20%
polyveck_add 12s 12s +0%
poly_add 10s 11s -9%
polyveck_chknorm 10s 7s +43%
keccak_absorb_once_x4 9s 8s +12%
poly_invntt_tomont_c 9s 8s +12%
polyveck_caddq 9s 8s +12%
polyveck_invntt_tomont 9s 8s +12%
polyveck_pointwise_poly_montgomery 9s 8s +12%
polyvecl_chknorm 9s 9s +0%
keccakf1600_permute_native 8s 9s -11%
poly_caddq_c 8s 9s -11%
polyveck_use_hint 8s 12s -33%
keccak_absorb 7s 9s -22%
keccakf1600_permute 7s 7s +0%
mld_compute_pack_z 7s 6s +17%
pointwise_acc_native_x86_64 7s 8s -12%
polyveck_sub 7s 6s +17%
sign 7s 9s -22%
sign_signature_internal 7s 6s +17%
sign_verify_pre_hash_shake256 7s 4s +75%
poly_power2round 6s 7s -14%
polyveck_ntt 6s 6s +0%
polyveck_unpack_t0 6s 3s +100%
polyvecl_ntt 6s 9s -33%
sign_signature_pre_hash_shake256 6s 4s +50%
sign_verify 6s 4s +50%
keccak_squeezeblocks_x4 5s 4s +25%
mld_prepare_domain_separation_prefix 5s 6s -17%
mld_sample_s1_s2_serial 5s 3s +67%
poly_invntt_tomont 5s 4s +25%
poly_uniform_eta 5s 4s +25%
polyt0_pack 5s 5s +0%
polyvec_matrix_pointwise_montgomery 5s 6s -17%
polyveck_reduce 5s 5s +0%
polyvecl_unpack_eta 5s 6s -17%
polyz_unpack_c 5s 4s +25%
sig_unpack_hints 5s - new
sk_s2hat_get_poly 5s 6s -17%
use_hint 5s 4s +25%
intt_native_x86_64 4s 5s -20%
keccak_f1600_x4_native_avx2 4s 3s +33%
mld_h 4s 5s -20%
mld_sample_s1_s2 4s 5s -20%
ntt_native_aarch64 4s 4s +0%
ntt_native_x86_64 4s 3s +33%
pack_sig_z 4s 1s +300%
pack_sk_rho_key_tr_s2_t0 4s 2s +100%
pack_sk_s1 4s 3s +33%
pointwise_acc_native_aarch64 4s 7s -43%
pointwise_native_aarch64 4s 4s +0%
poly_caddq_native 4s 3s +33%
poly_challenge 4s 6s -33%
poly_decompose 4s 2s +100%
poly_decompose_c 4s 5s -20%
poly_invntt_tomont_native 4s 5s -20%
poly_ntt_c 4s 4s +0%
poly_ntt_native 4s 3s +33%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_permute_bitrev_to_custom_optional_native 4s 3s +33%
poly_reduce 4s 3s +33%
poly_shiftl 4s 5s -20%
poly_use_hint 4s 3s +33%
poly_use_hint_c 4s 4s +0%
polyt1_unpack 4s 2s +100%
polyveck_pack_eta 4s 4s +0%
polyveck_shiftl 4s 5s -20%
polyvecl_pointwise_acc_montgomery_c 4s 3s +33%
polyz_unpack_native 4s 4s +0%
rej_eta_native 4s 4s +0%
shake128_finalize 4s 3s +33%
shake256_finalize 4s 2s +100%
shake256x4_absorb_once 4s 1s +300%
sign_open 4s 3s +33%
sign_signature 4s 8s -50%
sign_signature_pre_hash_internal 4s 5s -20%
sk_s1hat_get_poly 4s 2s +100%
sys_check_capability 4s 3s +33%
caddq 3s 2s +50%
decompose 3s 2s +50%
fqscale 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 3s +0%
keccak_squeeze 3s 4s -25%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_permute 3s 3s +0%
keccakf1600x4_xor_bytes 3s 2s +50%
make_hint 3s 3s +0%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_ct_sel_int32 3s 2s +50%
mld_polymat_expand_entry 3s 3s +0%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u32 3s 2s +50%
montgomery_reduce 3s 3s +0%
nttunpack_native_x86_64 3s 5s -40%
pack_pk 3s 5s -40%
pack_sig_h_poly 3s 2s +50%
poly_caddq_native_aarch64 3s 3s +0%
poly_chknorm 3s 6s -50%
poly_chknorm_native 3s 4s -25%
poly_decompose_native 3s 3s +0%
poly_make_hint 3s 3s +0%
poly_ntt 3s 2s +50%
poly_pointwise_montgomery 3s 5s -40%
poly_pointwise_montgomery_native 3s 3s +0%
poly_sub 3s 3s +0%
poly_uniform_4x 3s 4s -25%
poly_uniform_gamma1 3s 4s -25%
poly_use_hint_native 3s 5s -40%
polyeta_unpack 3s 3s +0%
polyvec_matrix_expand 3s 1s +200%
polyvec_matrix_expand_serial 3s 3s +0%
polyveck_pack_t0 3s 3s +0%
polyveck_pack_w1 3s 4s -25%
polyvecl_pack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 2s +50%
polyvecl_uniform_gamma1 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_z 3s 4s -25%
polyw1_pack 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 3s +0%
power2round 3s 3s +0%
rej_eta_c 3s 4s -25%
shake128_absorb 3s 4s -25%
shake256 3s 4s -25%
shake256_release 3s 2s +50%
shake256_squeeze 3s 1s +200%
shake256x4_squeezeblocks 3s 2s +50%
sign_keypair 3s 3s +0%
sign_signature_extmu 3s 4s -25%
sign_verify_extmu 3s 4s -25%
sign_verify_pre_hash_internal 3s 3s +0%
sk_t0hat_get_poly 3s 5s -40%
yvec_get_poly 3s 4s -25%
keccak_f1600_x1_native_aarch64 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_init 2s 4s -50%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 2s +0%
pack_sig_c 2s 3s -33%
poly_caddq 2s 3s -33%
poly_chknorm_native_aarch64 2s 4s -50%
poly_uniform_gamma1_4x 2s 7s -71%
poly_use_hint_native_aarch64 2s 2s +0%
polyeta_pack 2s 3s -33%
polyvecl_pointwise_acc_montgomery_native 2s 3s -33%
polyz_pack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 3s -33%
reduce32 2s 2s +0%
rej_eta 2s 3s -33%
shake128_squeeze 2s 3s -33%
shake128x4_squeezeblocks 2s 2s +0%
shake256_init 2s 3s -33%
unpack_pk_t1 2s - new
unpack_sk 2s 3s -33%
unpack_sk_s1hat 2s 1s +100%
unpack_sk_s2hat 2s 3s -33%
unpack_sk_t0hat 2s 2s +0%
yvec_init 2s 2s +0%
keccak_finalize 1s 3s -67%
keccakf1600_xor_bytes (big endian) 1s 2s -50%
keccakf1600x4_extract_bytes 1s 2s -50%
mld_value_barrier_u8 1s 2s -50%
pointwise_native_x86_64 1s 2s -50%
poly_uniform 1s 6s -83%
polyt1_pack 1s 2s -50%
polyveck_unpack_eta 1s 3s -67%
polyz_unpack 1s 2s -50%
shake128_init 1s 3s -67%
shake128_release 1s 4s -75%
shake128x4_absorb_once 1s 2s -50%
shake256_absorb 1s 3s -67%

@oqs-bot
Contributor

oqs-bot commented May 2, 2026

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (200 proofs)
Proof Status Current Previous Change
**TOTAL** 1605s 1808s -11.2%
poly_pointwise_montgomery_c 166s 169s -2%
polyvec_matrix_pointwise_montgomery_yvec 138s 140s -1%
rej_uniform_native 108s 105s +3%
mld_invntt_layer 106s 105s +1%
mld_ct_memcmp 71s 73s -3%
sign_verify_internal 43s 238s -82%
mld_ntt_layer 40s 43s -7%
sign_keypair_internal 34s 33s +3%
fqmul 27s 27s +0%
mld_attempt_signature_generation 27s 23s +17%
keccakf1600x4_permute_native 22s 22s +0%
rej_uniform 21s 24s -12%
sign_pk_from_sk 19s 18s +6%
rej_uniform_c 17s 17s +0%
mld_ntt_butterfly_block 16s 15s +7%
polyeta_unpack 16s 20s -20%
mld_check_pct 13s 12s +8%
poly_chknorm_c 13s 14s -7%
poly_uniform_eta_4x 12s 13s -8%
polyvec_matrix_pointwise_montgomery_row 12s 13s -8%
polyveck_decompose 12s 15s -20%
polyt0_unpack 11s 14s -21%
poly_add 10s 13s -23%
polyveck_ntt 10s 8s +25%
keccak_absorb_once_x4 9s 10s -10%
poly_power2round 9s 8s +12%
polyveck_add 9s 9s +0%
polyveck_pointwise_poly_montgomery 9s 9s +0%
polyveck_power2round 9s 8s +12%
keccakf1600_permute 8s 7s +14%
mld_sample_s1_s2 8s 6s +33%
poly_caddq_c 8s 8s +0%
poly_invntt_tomont_c 8s 9s -11%
polyveck_invntt_tomont 8s 9s -11%
polyvecl_ntt 8s 3s +167%
keccak_absorb 7s 7s +0%
pointwise_acc_native_aarch64 7s 8s -12%
polyveck_chknorm 7s 7s +0%
polyveck_shiftl 7s 7s +0%
polyveck_unpack_eta 7s 4s +75%
polyveck_use_hint 7s 8s -12%
keccak_squeezeblocks_x4 6s 5s +20%
keccakf1600_permute_native 6s 7s -14%
mld_compute_pack_z 6s 6s +0%
mld_ct_get_optblocker_u32 6s 3s +100%
mld_prepare_domain_separation_prefix 6s 3s +100%
mld_sample_s1_s2_serial 6s 6s +0%
pointwise_acc_native_x86_64 6s 6s +0%
polyvec_matrix_pointwise_montgomery 6s 7s -14%
polyveck_caddq 6s 5s +20%
polyveck_sub 6s 7s -14%
polyvecl_chknorm 6s 6s +0%
rej_eta_native 6s 4s +50%
shake128_release 6s 1s +500%
sign 6s 7s -14%
sign_signature_internal 6s 4s +50%
keccak_finalize 5s 2s +150%
mld_ct_cmask_nonzero_u32 5s 2s +150%
nttunpack_native_x86_64 5s 3s +67%
pack_sig_z 5s 4s +25%
pack_sk_rho_key_tr_s2_t0 5s 5s +0%
poly_decompose_c 5s 5s +0%
poly_shiftl 5s 7s -29%
poly_use_hint_c 5s 5s +0%
polyt1_unpack 5s 3s +67%
polyveck_pack_eta 5s 2s +150%
polyveck_reduce 5s 5s +0%
polyz_unpack_c 5s 3s +67%
sk_s2hat_get_poly 5s 3s +67%
caddq 4s 2s +100%
keccak_init 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 2s +100%
keccakf1600x4_xor_bytes 4s 4s +0%
mld_ct_sel_int32 4s 2s +100%
ntt_native_aarch64 4s 2s +100%
pack_sig_c 4s 3s +33%
poly_challenge 4s 4s +0%
poly_chknorm_native 4s 3s +33%
poly_invntt_tomont 4s 1s +300%
poly_invntt_tomont_native 4s 1s +300%
poly_uniform 4s 4s +0%
poly_uniform_4x 4s 4s +0%
poly_uniform_gamma1_4x 4s 2s +100%
polyeta_pack 4s 5s -20%
polyveck_pack_t0 4s 3s +33%
polyvecl_unpack_z 4s 2s +100%
rej_eta_c 4s 4s +0%
shake128_init 4s 4s +0%
sig_unpack_hints 4s - new
sign_signature 4s 2s +100%
sign_signature_pre_hash_internal 4s 4s +0%
sign_verify 4s 5s -20%
sign_verify_pre_hash_internal 4s 3s +33%
unpack_pk_t1 4s - new
unpack_sk_s2hat 4s 3s +33%
unpack_sk_t0hat 4s 4s +0%
use_hint 4s 4s +0%
fqscale 3s 1s +200%
keccak_f1600_x1_native_aarch64_v84a 3s 1s +200%
keccak_f1600_x4_native_aarch64_v84a 3s 4s -25%
make_hint 3s 4s -25%
mld_ct_abs_i32 3s 1s +200%
mld_h 3s 4s -25%
mld_polymat_expand_entry 3s 4s -25%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u32 3s 1s +200%
pack_pk 3s 2s +50%
pack_sig_h_poly 3s 3s +0%
pointwise_native_aarch64 3s 2s +50%
poly_caddq 3s 3s +0%
poly_chknorm 3s 1s +200%
poly_decompose 3s 2s +50%
poly_decompose_native 3s 3s +0%
poly_ntt_c 3s 5s -40%
poly_ntt_native 3s 3s +0%
poly_permute_bitrev_to_custom_optional 3s 2s +50%
poly_uniform_eta 3s 4s -25%
poly_use_hint_native 3s 4s -25%
polyvec_matrix_expand 3s 3s +0%
polyvec_matrix_expand_serial 3s 2s +50%
polyveck_pack_w1 3s 3s +0%
polyveck_unpack_t0 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 3s +0%
polyw1_pack 3s 1s +200%
polyz_pack 3s 5s -40%
polyz_unpack_19_native_aarch64 3s 3s +0%
reduce32 3s 4s -25%
shake128_squeeze 3s 2s +50%
shake256_finalize 3s 5s -40%
shake256_squeeze 3s 3s +0%
sign_keypair 3s 4s -25%
sign_signature_pre_hash_shake256 3s 4s -25%
sign_verify_extmu 3s 7s -57%
sign_verify_pre_hash_shake256 3s 6s -50%
sk_s1hat_get_poly 3s 2s +50%
sk_t0hat_get_poly 3s 3s +0%
unpack_sk 3s 2s +50%
yvec_init 3s 4s -25%
decompose 2s 2s +0%
intt_native_x86_64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_squeeze 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_xor_bytes 2s 4s -50%
keccakf1600x4_permute 2s 3s -33%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 3s -33%
ntt_native_x86_64 2s 2s +0%
pack_sk_s1 2s 2s +0%
pointwise_native_x86_64 2s 3s -33%
poly_caddq_native 2s 5s -60%
poly_caddq_native_aarch64 2s 2s +0%
poly_chknorm_native_aarch64 2s 2s +0%
poly_make_hint 2s 3s -33%
poly_ntt 2s 2s +0%
poly_permute_bitrev_to_custom_optional_native 2s 3s -33%
poly_pointwise_montgomery 2s 3s -33%
poly_reduce 2s 4s -50%
poly_sub 2s 2s +0%
poly_use_hint 2s 2s +0%
poly_use_hint_native_aarch64 2s 3s -33%
polyt0_pack 2s 5s -60%
polyt1_pack 2s 1s +100%
polyvecl_pack_eta 2s 3s -33%
polyvecl_pointwise_acc_montgomery_c 2s 3s -33%
polyvecl_uniform_gamma1 2s 2s +0%
polyz_unpack 2s 1s +100%
polyz_unpack_17_native_aarch64 2s 4s -50%
polyz_unpack_native 2s 5s -60%
power2round 2s 2s +0%
rej_eta 2s 3s -33%
shake128x4_absorb_once 2s 2s +0%
shake256 2s 1s +100%
shake256_absorb 2s 3s -33%
shake256_init 2s 4s -50%
shake256_release 2s 1s +100%
shake256x4_absorb_once 2s 3s -33%
sign_open 2s 2s +0%
sign_signature_extmu 2s 2s +0%
sys_check_capability 2s 2s +0%
unpack_sk_s1hat 2s 4s -50%
yvec_get_poly 2s 2s +0%
keccak_f1600_x1_native_aarch64 1s 3s -67%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 1s +0%
keccakf1600x4_extract_bytes 1s 4s -75%
mld_ct_cmask_nonzero_u8 1s 2s -50%
mld_keccakf1600_extract_bytes 1s 3s -67%
poly_pointwise_montgomery_native 1s 2s -50%
poly_uniform_gamma1 1s 4s -75%
shake128_absorb 1s 3s -67%
shake128_finalize 1s 3s -67%
shake128x4_squeezeblocks 1s 4s -75%
shake256x4_squeezeblocks 1s 2s -50%
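The "Change" column in the CBMC tables above appears to be the relative difference between the current and previous proof runtime. A minimal sketch of that arithmetic follows, assuming the bot computes `(current - previous) / previous`; the function names are hypothetical and not taken from the bot's source. Because the displayed runtimes are already rounded to whole seconds, recomputed values can differ from the bot's output in borderline cases.

```python
from typing import Optional


def percent_change(current_s: float, previous_s: float) -> float:
    """Relative change of the current runtime against the previous one, in percent."""
    if previous_s <= 0:
        raise ValueError("previous runtime must be positive")
    return (current_s - previous_s) / previous_s * 100.0


def format_change(current_s: float, previous_s: Optional[float]) -> str:
    """Render the change the way the tables show it: signed percent, or 'new'."""
    if previous_s is None:
        # Proofs without a previous measurement are listed as "new".
        return "new"
    return f"{percent_change(current_s, previous_s):+.0f}%"


if __name__ == "__main__":
    # sign_verify_internal in the ML-DSA-87 REDUCE-RAM table: 43s now, 238s before.
    print(format_change(43, 238))
    # unpack_pk_t1 has no previous value in that table.
    print(format_change(4, None))
```

Under this assumption, the `sign_verify_internal` row (43s against 238s) works out to roughly -82%, which matches the table.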

@mkannwischer mkannwischer force-pushed the verify-buffer-sharing branch 2 times, most recently from bb6f745 to ea14d62 Compare May 2, 2026 13:15
@mkannwischer mkannwischer marked this pull request as ready for review May 2, 2026 13:16
@mkannwischer mkannwischer force-pushed the verify-buffer-sharing branch from ea14d62 to 88e1324 Compare May 3, 2026 00:36
Comment thread mldsa/src/sign.c
Comment thread mldsa/src/sign.c Outdated
Comment thread mldsa/src/sign.c Outdated
@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: 740c4bb Previous: 339782e Ratio
ML-DSA-44 keypair 310360 cycles 329877 cycles 0.94
ML-DSA-44 sign 1176137 cycles 1224108 cycles 0.96
ML-DSA-44 verify 337367 cycles 348071 cycles 0.97
ML-DSA-65 keypair 560473 cycles 565199 cycles 0.99
ML-DSA-65 sign 1907791 cycles 1930855 cycles 0.99
ML-DSA-65 verify 535703 cycles 545481 cycles 0.98
ML-DSA-87 keypair 872107 cycles 851474 cycles 1.02
ML-DSA-87 sign 2487117 cycles 2383353 cycles 1.04
ML-DSA-87 verify 878306 cycles 873058 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 740c4bb Previous: 339782e Ratio
ML-DSA-87 verify 385265 cycles 373790 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 740c4bb Previous: 339782e Ratio
ML-DSA-87 sign 2487117 cycles 2383353 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton3 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 740c4bb Previous: 339782e Ratio
ML-DSA-87 verify 429474 cycles 402939 cycles 1.07

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: 740c4bb Previous: 339782e Ratio
ML-DSA-44 keypair 459483 cycles 459447 cycles 1.00
ML-DSA-44 sign 2133292 cycles 2132638 cycles 1.00
ML-DSA-44 verify 544400 cycles 547432 cycles 0.99
ML-DSA-65 keypair 772643 cycles 771917 cycles 1.00
ML-DSA-65 sign 3477028 cycles 3470907 cycles 1.00
ML-DSA-65 verify 847003 cycles 850045 cycles 1.00
ML-DSA-87 keypair 1248446 cycles 1247649 cycles 1.00
ML-DSA-87 sign 4301115 cycles 4322692 cycles 1.00
ML-DSA-87 verify 1358980 cycles 1368274 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Mac Mini (M1, 2020) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 17733c0 Previous: 339782e Ratio
ML-DSA-65 verify 82838 cycles 79828 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
Benchmark suite Current: 17733c0 Previous: 339782e Ratio
ML-DSA-44 keypair 212266 cycles 225432 cycles 0.94
ML-DSA-44 sign 599827 cycles 631252 cycles 0.95
ML-DSA-44 verify 212113 cycles 222128 cycles 0.95
ML-DSA-65 keypair 383924 cycles 391237 cycles 0.98
ML-DSA-65 sign 1006349 cycles 1008674 cycles 1.00
ML-DSA-65 verify 370840 cycles 370666 cycles 1.00
ML-DSA-87 keypair 650184 cycles 665032 cycles 0.98
ML-DSA-87 sign 1361289 cycles 1398865 cycles 0.97
ML-DSA-87 verify 628257 cycles 636893 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: 17733c0 Previous: 339782e Ratio
ML-DSA-44 keypair 309376 cycles 329877 cycles 0.94
ML-DSA-44 sign 1190695 cycles 1224108 cycles 0.97
ML-DSA-44 verify 330999 cycles 348071 cycles 0.95
ML-DSA-65 keypair 571739 cycles 565199 cycles 1.01
ML-DSA-65 sign 1937242 cycles 1930855 cycles 1.00
ML-DSA-65 verify 558280 cycles 545481 cycles 1.02
ML-DSA-87 keypair 876657 cycles 851474 cycles 1.03
ML-DSA-87 sign 2435835 cycles 2383353 cycles 1.02
ML-DSA-87 verify 906876 cycles 873058 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 17733c0 Previous: 339782e Ratio
ML-DSA-87 verify 906876 cycles 873058 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: 17733c0 Previous: 339782e Ratio
ML-DSA-44 keypair 460079 cycles 459447 cycles 1.00
ML-DSA-44 sign 2137097 cycles 2132638 cycles 1.00
ML-DSA-44 verify 545716 cycles 547432 cycles 1.00
ML-DSA-65 keypair 772851 cycles 771917 cycles 1.00
ML-DSA-65 sign 3477329 cycles 3470907 cycles 1.00
ML-DSA-65 verify 848550 cycles 850045 cycles 1.00
ML-DSA-87 keypair 1248048 cycles 1247649 cycles 1.00
ML-DSA-87 sign 4309673 cycles 4322692 cycles 1.00
ML-DSA-87 verify 1357941 cycles 1368274 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-87 verify 149579 cycles 128257 cycles 1.17

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-87 verify 385420 cycles 373790 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton3 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-87 verify 429522 cycles 402939 cycles 1.07

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-44 keypair 224513 cycles 225432 cycles 1.00
ML-DSA-44 sign 653904 cycles 631252 cycles 1.04
ML-DSA-44 verify 228026 cycles 222128 cycles 1.03
ML-DSA-65 keypair 401530 cycles 391237 cycles 1.03
ML-DSA-65 sign 1041844 cycles 1008674 cycles 1.03
ML-DSA-65 verify 376898 cycles 370666 cycles 1.02
ML-DSA-87 keypair 644118 cycles 665032 cycles 0.97
ML-DSA-87 sign 1352613 cycles 1398865 cycles 0.97
ML-DSA-87 verify 626459 cycles 636893 cycles 0.98

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-44 sign 653904 cycles 631252 cycles 1.04
ML-DSA-65 sign 1041844 cycles 1008674 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.
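The alert rule quoted in these comments (a result is flagged once it is worse than the previous one by more than the 1.03 threshold) can be sketched as follows. This is an illustration only: it assumes the ratio is `current / previous` on cycle counts and that the comparison uses the unrounded ratio, which is consistent with the rows in this thread but is not taken from the github-action-benchmark source. The function names are hypothetical.

```python
ALERT_THRESHOLD = 1.03  # threshold quoted in the alert comments


def ratio(current_cycles: int, previous_cycles: int) -> float:
    """Ratio of current to previous cycle count; above 1.0 means slower."""
    return current_cycles / previous_cycles


def is_regression(current_cycles: int, previous_cycles: int,
                  threshold: float = ALERT_THRESHOLD) -> bool:
    """Assumed rule: flag a benchmark when the unrounded ratio exceeds the threshold."""
    return ratio(current_cycles, previous_cycles) > threshold


if __name__ == "__main__":
    # Rows from the Cortex-A72 (opt) results at b50e8d6 in this thread.
    rows = {
        "ML-DSA-65 sign": (1041844, 1008674),     # listed in the alert
        "ML-DSA-65 keypair": (401530, 391237),    # displayed as 1.03, not in the alert
    }
    for name, (cur, prev) in rows.items():
        print(name, round(ratio(cur, prev), 2), is_regression(cur, prev))
```

Both rows are displayed with a ratio of 1.03, but only the first exceeds 1.03 before rounding, which would explain why only that row appears in the alert.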

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-44 keypair 300757 cycles 329877 cycles 0.91
ML-DSA-44 sign 1154134 cycles 1224108 cycles 0.94
ML-DSA-44 verify 328761 cycles 348071 cycles 0.94
ML-DSA-65 keypair 565075 cycles 565199 cycles 1.00
ML-DSA-65 sign 1922224 cycles 1930855 cycles 1.00
ML-DSA-65 verify 532483 cycles 545481 cycles 0.98
ML-DSA-87 keypair 851416 cycles 851474 cycles 1.00
ML-DSA-87 sign 2424125 cycles 2383353 cycles 1.02
ML-DSA-87 verify 883409 cycles 873058 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: b50e8d6 Previous: 339782e Ratio
ML-DSA-44 keypair 458880 cycles 459447 cycles 1.00
ML-DSA-44 sign 2129039 cycles 2132638 cycles 1.00
ML-DSA-44 verify 544470 cycles 547432 cycles 0.99
ML-DSA-65 keypair 772136 cycles 771917 cycles 1.00
ML-DSA-65 sign 3473186 cycles 3470907 cycles 1.00
ML-DSA-65 verify 847275 cycles 850045 cycles 1.00
ML-DSA-87 keypair 1246443 cycles 1247649 cycles 1.00
ML-DSA-87 sign 4300514 cycles 4322692 cycles 0.99
ML-DSA-87 verify 1359094 cycles 1368274 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer
Contributor Author

I tried digging into why there is a substantial performance regression for x86_64 in this PR, but I cannot quite figure it out. When I run this PR on a c7i.metal-24xl, I get about the same performance as on main, most of the time. Verify on x86 Sapphire Rapids seems to behave bimodally: most runs land in a fast band, while ~6-12% land in a slow band that is 10-18% slower. As far as I can tell, this holds on main as well as on this branch, at similar rates. It seems to correlate with where stack buffers happen to land mod 4096, which varies per run because of ASLR.
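One way to quantify the bimodal behaviour described above is to split repeated measurements into a fast and a slow band. The sketch below is a hypothetical helper, not part of the repository's benchmark tooling. It assumes most runs are fast, so the median sits in the fast band, and it uses an arbitrary 5% cut-off (`slow_factor`) that would need tuning against real data. The sample numbers are synthetic.

```python
from statistics import median
from typing import List, Tuple


def split_bands(cycles: List[int], slow_factor: float = 1.05) -> Tuple[List[int], List[int]]:
    """Split runs into (fast, slow) relative to the median cycle count.

    Assumes the majority of runs are fast, so the median is a fast-band value.
    """
    reference = median(cycles)
    cutoff = reference * slow_factor
    fast = [c for c in cycles if c <= cutoff]
    slow = [c for c in cycles if c > cutoff]
    return fast, slow


def slow_fraction(cycles: List[int], slow_factor: float = 1.05) -> float:
    """Fraction of runs that fall into the slow band."""
    _, slow = split_bands(cycles, slow_factor)
    return len(slow) / len(cycles)


if __name__ == "__main__":
    # Synthetic example: nine fast runs and one run about 15% slower.
    runs = [128_000] * 9 + [147_000]
    print(f"slow runs: {slow_fraction(runs):.0%}")
```

Comparing the slow fraction between main and the branch over many runs would show whether the rate of slow runs actually differs.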

If I align w1 to 4096 bytes, the performance regression goes away in CI, but I don't know how that can help us with fixing it.

I tried disabling ASLR (see #1091). That indeed eliminates the variance for me, and it definitely also impacts performance in CI, but it does not make the performance regression in this PR go away. It also does not let me reproduce the performance regression locally.

My initial hypothesis was that this is x86's 4 KiB store-buffer aliasing: when a load has the same low-12 address bits as a recently issued store, x86_64 flags a false dependency and delays the load. That model fits the basic shape: stack addresses vary with ASLR while the rodata low-12 bits are fixed within the binary, so you'd expect the alias to fire when the stack happens to land at a low-12 offset that overlaps a hot rodata access. perf record -e ld_blocks.address_alias is consistent with that. But when I tried to confirm by shifting the suspected victims, things didn't fall into place.
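The aliasing hypothesis above can be illustrated with a small address-arithmetic sketch. It is a simplified model: it only compares the low 12 bits of two start addresses, whereas real hardware considers the byte ranges being accessed. All addresses are made up for illustration, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096
LOW12_MASK = PAGE_SIZE - 1  # 0xFFF


def low12(addr: int) -> int:
    """Offset of an address within its 4 KiB page (the low 12 bits)."""
    return addr & LOW12_MASK


def may_4k_alias(store_addr: int, load_addr: int) -> bool:
    """True if two distinct addresses share their low 12 bits (simplified model)."""
    return store_addr != load_addr and low12(store_addr) == low12(load_addr)


def align_up(addr: int, alignment: int = PAGE_SIZE) -> int:
    """Round an address up to the next multiple of `alignment` (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    rodata_load = 0x5555_0000_2A40      # hypothetical constant-table address
    stack_store_a = 0x7FFD_1000_0A40    # hypothetical stack slot, same low 12 bits
    stack_store_b = 0x7FFD_1000_0A80    # same slot after a 64-byte shift

    print(may_4k_alias(stack_store_a, rodata_load))
    print(may_4k_alias(stack_store_b, rodata_load))
    # A 4096-byte-aligned buffer always starts at page offset 0, regardless of ASLR.
    print(hex(low12(align_up(stack_store_a))))
```

In this model, forcing a buffer to a 4096-byte boundary pins its page offset, which removes the run-to-run variation in whether it collides with a fixed rodata offset. Whether that is the mechanism behind the CI observation is not established in this thread.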

My conclusion is that this PR is likely not introducing the regression. Still, we should investigate why ASLR has such a massive impact on our performance; it really shouldn't. #1091's benchmarks show a speed-up for our native implementations just from disabling ASLR, suggesting that some of our assembly routines suffer poor performance in the case of unlucky stack placement due to ASLR.

@hanno-becker, can we go ahead and merge this PR and investigate this separately?

@hanno-becker
Contributor

@mkannwischer Thank you for the analysis. I think it gives sufficient confidence that this is a pre-existing issue, not something introduced in this PR. We should analyze it independently. Can you open an issue?

Contributor

@hanno-becker hanno-becker left a comment

Approving under the assumption that we will investigate the performance degradation on x86 with some urgency.

@mkannwischer mkannwischer merged commit 4ba9580 into main May 4, 2026
873 checks passed
@mkannwischer mkannwischer deleted the verify-buffer-sharing branch May 4, 2026 08:27

3 participants