[TEST] x86_64: 32-byte align stack scratch in rej_uniform and keccak_f1600_x4_avx2#1682

Draft
mkannwischer wants to merge 1 commit into main from bench-stack-align

@mkannwischer

…4_avx2

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
@mkannwischer added the benchmark label (this PR should be benchmarked in CI) on May 8, 2026
@oqs-bot oqs-bot left a comment

Mac Mini (M1, 2020) benchmarks

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 12319 cycles | 12319 cycles | 1 |
| ML-KEM-512 encaps | 14997 cycles | 14998 cycles | 1.00 |
| ML-KEM-512 decaps | 19550 cycles | 19550 cycles | 1 |
| ML-KEM-768 keypair | 21263 cycles | 21264 cycles | 1.00 |
| ML-KEM-768 encaps | 23871 cycles | 23869 cycles | 1.00 |
| ML-KEM-768 decaps | 30415 cycles | 30412 cycles | 1.00 |
| ML-KEM-1024 keypair | 30327 cycles | 30328 cycles | 1.00 |
| ML-KEM-1024 encaps | 34574 cycles | 34572 cycles | 1.00 |
| ML-KEM-1024 decaps | 44190 cycles | 44191 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

ppc64le (POWER10) benchmarks

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 59099 cycles | 59381 cycles | 1.00 |
| ML-KEM-512 encaps | 71861 cycles | 72089 cycles | 1.00 |
| ML-KEM-512 decaps | 91610 cycles | 91897 cycles | 1.00 |
| ML-KEM-768 keypair | 98389 cycles | 99066 cycles | 0.99 |
| ML-KEM-768 encaps | 114715 cycles | 115473 cycles | 0.99 |
| ML-KEM-768 decaps | 140352 cycles | 141058 cycles | 0.99 |
| ML-KEM-1024 keypair | 148840 cycles | 148874 cycles | 1.00 |
| ML-KEM-1024 encaps | 167867 cycles | 167437 cycles | 1.00 |
| ML-KEM-1024 decaps | 198739 cycles | 198787 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 28268 cycles | 28220 cycles | 1.00 |
| ML-KEM-512 encaps | 34110 cycles | 34106 cycles | 1.00 |
| ML-KEM-512 decaps | 44365 cycles | 44333 cycles | 1.00 |
| ML-KEM-768 keypair | 47685 cycles | 47614 cycles | 1.00 |
| ML-KEM-768 encaps | 53901 cycles | 53939 cycles | 1.00 |
| ML-KEM-768 decaps | 68353 cycles | 68365 cycles | 1.00 |
| ML-KEM-1024 keypair | 70249 cycles | 70253 cycles | 1.00 |
| ML-KEM-1024 encaps | 78724 cycles | 78729 cycles | 1.00 |
| ML-KEM-1024 decaps | 98421 cycles | 98443 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 59768 cycles | 59768 cycles | 1 |
| ML-KEM-512 encaps | 67518 cycles | 67522 cycles | 1.00 |
| ML-KEM-512 decaps | 86122 cycles | 86164 cycles | 1.00 |
| ML-KEM-768 keypair | 97418 cycles | 97432 cycles | 1.00 |
| ML-KEM-768 encaps | 110932 cycles | 111015 cycles | 1.00 |
| ML-KEM-768 decaps | 137595 cycles | 138432 cycles | 0.99 |
| ML-KEM-1024 keypair | 155075 cycles | 154655 cycles | 1.00 |
| ML-KEM-1024 encaps | 172548 cycles | 171560 cycles | 1.01 |
| ML-KEM-1024 decaps | 209918 cycles | 208191 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 11707 cycles | 12024 cycles | 0.97 |
| ML-KEM-512 encaps | 13320 cycles | 13792 cycles | 0.97 |
| ML-KEM-512 decaps | 17493 cycles | 17799 cycles | 0.98 |
| ML-KEM-768 keypair | 20188 cycles | 21058 cycles | 0.96 |
| ML-KEM-768 encaps | 21749 cycles | 21954 cycles | 0.99 |
| ML-KEM-768 decaps | 28137 cycles | 27947 cycles | 1.01 |
| ML-KEM-1024 keypair | 28902 cycles | 29875 cycles | 0.97 |
| ML-KEM-1024 encaps | 30948 cycles | 31692 cycles | 0.98 |
| ML-KEM-1024 decaps | 38541 cycles | 39396 cycles | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 13945 cycles | 14250 cycles | 0.98 |
| ML-KEM-512 encaps | 15670 cycles | 15974 cycles | 0.98 |
| ML-KEM-512 decaps | 21232 cycles | 21545 cycles | 0.99 |
| ML-KEM-768 keypair | 23720 cycles | 24733 cycles | 0.96 |
| ML-KEM-768 encaps | 25126 cycles | 25462 cycles | 0.99 |
| ML-KEM-768 decaps | 32990 cycles | 33345 cycles | 0.99 |
| ML-KEM-1024 keypair | 33326 cycles | 37143 cycles | 0.90 |
| ML-KEM-1024 encaps | 35786 cycles | 36842 cycles | 0.97 |
| ML-KEM-1024 decaps | 46364 cycles | 46735 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 28318 cycles | 28217 cycles | 1.00 |
| ML-KEM-512 encaps | 36669 cycles | 36694 cycles | 1.00 |
| ML-KEM-512 decaps | 45169 cycles | 45259 cycles | 1.00 |
| ML-KEM-768 keypair | 46242 cycles | 46276 cycles | 1.00 |
| ML-KEM-768 encaps | 55823 cycles | 55744 cycles | 1.00 |
| ML-KEM-768 decaps | 69894 cycles | 69803 cycles | 1.00 |
| ML-KEM-1024 keypair | 70399 cycles | 70421 cycles | 1.00 |
| ML-KEM-1024 encaps | 82321 cycles | 82529 cycles | 1.00 |
| ML-KEM-1024 decaps | 99243 cycles | 99421 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 12666 cycles | 12779 cycles | 0.99 |
| ML-KEM-512 encaps | 14191 cycles | 14273 cycles | 0.99 |
| ML-KEM-512 decaps | 19073 cycles | 19121 cycles | 1.00 |
| ML-KEM-768 keypair | 21876 cycles | 22408 cycles | 0.98 |
| ML-KEM-768 encaps | 22932 cycles | 23053 cycles | 0.99 |
| ML-KEM-768 decaps | 29943 cycles | 30058 cycles | 1.00 |
| ML-KEM-1024 keypair | 30723 cycles | 32987 cycles | 0.93 |
| ML-KEM-1024 encaps | 32776 cycles | 33034 cycles | 0.99 |
| ML-KEM-1024 decaps | 42174 cycles | 42393 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 17456 cycles | 17543 cycles | 1.00 |
| ML-KEM-512 encaps | 19789 cycles | 19953 cycles | 0.99 |
| ML-KEM-512 decaps | 26302 cycles | 26452 cycles | 0.99 |
| ML-KEM-768 keypair | 30052 cycles | 31153 cycles | 0.96 |
| ML-KEM-768 encaps | 31008 cycles | 31870 cycles | 0.97 |
| ML-KEM-768 decaps | 41352 cycles | 41554 cycles | 1.00 |
| ML-KEM-1024 keypair | 42287 cycles | 43949 cycles | 0.96 |
| ML-KEM-1024 encaps | 45802 cycles | 45348 cycles | 1.01 |
| ML-KEM-1024 decaps | 60641 cycles | 58193 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-1024 decaps | 60641 cycles | 58193 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.
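
Editorial note: the alert above compares the ratio of current to previous cycle counts against the stated threshold of 1.03. The following is a minimal sketch of that check, assuming the ratio is computed as current divided by previous and that an alert fires when it is strictly greater than the threshold. The function and variable names are hypothetical and are not taken from github-action-benchmark.

```python
# Hypothetical sketch of the regression check described in the alert.
# Assumption: ratio = current / previous, alert when ratio > threshold.
ALERT_THRESHOLD = 1.03  # threshold quoted in the alert


def regression_ratio(current_cycles: int, previous_cycles: int) -> float:
    """Return current/previous; values above 1 mean the new commit is slower."""
    return current_cycles / previous_cycles


def is_regression(current_cycles: int, previous_cycles: int,
                  threshold: float = ALERT_THRESHOLD) -> bool:
    """Flag a result whose slowdown ratio exceeds the threshold."""
    return regression_ratio(current_cycles, previous_cycles) > threshold


# Two rows from the Intel Xeon 3rd gen (c6i) table: (current, previous) cycles.
results = {
    "ML-KEM-1024 encaps": (45802, 45348),
    "ML-KEM-1024 decaps": (60641, 58193),
}

flagged = [name for name, (cur, prev) in results.items()
           if is_regression(cur, prev)]
print(flagged)
```

Under these assumptions only the ML-KEM-1024 decaps row is flagged, which matches the single row listed in the alert.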

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 40221 cycles | 40252 cycles | 1.00 |
| ML-KEM-512 encaps | 48434 cycles | 48424 cycles | 1.00 |
| ML-KEM-512 decaps | 62625 cycles | 62618 cycles | 1.00 |
| ML-KEM-768 keypair | 63911 cycles | 63706 cycles | 1.00 |
| ML-KEM-768 encaps | 74924 cycles | 75085 cycles | 1.00 |
| ML-KEM-768 decaps | 93591 cycles | 93596 cycles | 1.00 |
| ML-KEM-1024 keypair | 95291 cycles | 95445 cycles | 1.00 |
| ML-KEM-1024 encaps | 109405 cycles | 109672 cycles | 1.00 |
| ML-KEM-1024 decaps | 132237 cycles | 132483 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 36599 cycles | 36609 cycles | 1.00 |
| ML-KEM-512 encaps | 43086 cycles | 43073 cycles | 1.00 |
| ML-KEM-512 decaps | 55714 cycles | 55711 cycles | 1.00 |
| ML-KEM-768 keypair | 58634 cycles | 58682 cycles | 1.00 |
| ML-KEM-768 encaps | 67519 cycles | 67521 cycles | 1.00 |
| ML-KEM-768 decaps | 84506 cycles | 84450 cycles | 1.00 |
| ML-KEM-1024 keypair | 88990 cycles | 88997 cycles | 1.00 |
| ML-KEM-1024 encaps | 99191 cycles | 99192 cycles | 1.00 |
| ML-KEM-1024 decaps | 120576 cycles | 120712 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 17674 cycles | 17644 cycles | 1.00 |
| ML-KEM-512 encaps | 20600 cycles | 20601 cycles | 1.00 |
| ML-KEM-512 decaps | 27088 cycles | 27077 cycles | 1.00 |
| ML-KEM-768 keypair | 29919 cycles | 29899 cycles | 1.00 |
| ML-KEM-768 encaps | 32728 cycles | 32770 cycles | 1.00 |
| ML-KEM-768 decaps | 41992 cycles | 41964 cycles | 1.00 |
| ML-KEM-1024 keypair | 43726 cycles | 43737 cycles | 1.00 |
| ML-KEM-1024 encaps | 48783 cycles | 48724 cycles | 1.00 |
| ML-KEM-1024 decaps | 61375 cycles | 61394 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 45730 cycles | 45760 cycles | 1.00 |
| ML-KEM-512 encaps | 54307 cycles | 54386 cycles | 1.00 |
| ML-KEM-512 decaps | 69783 cycles | 69833 cycles | 1.00 |
| ML-KEM-768 keypair | 74158 cycles | 74214 cycles | 1.00 |
| ML-KEM-768 encaps | 86001 cycles | 86131 cycles | 1.00 |
| ML-KEM-768 decaps | 106669 cycles | 106676 cycles | 1.00 |
| ML-KEM-1024 keypair | 112035 cycles | 112265 cycles | 1.00 |
| ML-KEM-1024 encaps | 124570 cycles | 124792 cycles | 1.00 |
| ML-KEM-1024 decaps | 150493 cycles | 150765 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 18657 cycles | 18641 cycles | 1.00 |
| ML-KEM-512 encaps | 21879 cycles | 21876 cycles | 1.00 |
| ML-KEM-512 decaps | 28879 cycles | 28864 cycles | 1.00 |
| ML-KEM-768 keypair | 31590 cycles | 31540 cycles | 1.00 |
| ML-KEM-768 encaps | 34744 cycles | 34772 cycles | 1.00 |
| ML-KEM-768 decaps | 44829 cycles | 44778 cycles | 1.00 |
| ML-KEM-1024 keypair | 46074 cycles | 46077 cycles | 1.00 |
| ML-KEM-1024 encaps | 51509 cycles | 51491 cycles | 1.00 |
| ML-KEM-1024 decaps | 65018 cycles | 65024 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 35450 cycles | 35412 cycles | 1.00 |
| ML-KEM-512 encaps | 40089 cycles | 40110 cycles | 1.00 |
| ML-KEM-512 decaps | 51097 cycles | 51135 cycles | 1.00 |
| ML-KEM-768 keypair | 56740 cycles | 56668 cycles | 1.00 |
| ML-KEM-768 encaps | 64546 cycles | 65154 cycles | 0.99 |
| ML-KEM-768 decaps | 79370 cycles | 79301 cycles | 1.00 |
| ML-KEM-1024 keypair | 87849 cycles | 87865 cycles | 1.00 |
| ML-KEM-1024 encaps | 97111 cycles | 96879 cycles | 1.00 |
| ML-KEM-1024 decaps | 115955 cycles | 115831 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 38932 cycles | 38888 cycles | 1.00 |
| ML-KEM-512 encaps | 44527 cycles | 44594 cycles | 1.00 |
| ML-KEM-512 decaps | 56593 cycles | 56672 cycles | 1.00 |
| ML-KEM-768 keypair | 62331 cycles | 62287 cycles | 1.00 |
| ML-KEM-768 encaps | 71055 cycles | 72308 cycles | 0.98 |
| ML-KEM-768 decaps | 87343 cycles | 87694 cycles | 1.00 |
| ML-KEM-1024 keypair | 96210 cycles | 96151 cycles | 1.00 |
| ML-KEM-1024 encaps | 106366 cycles | 106136 cycles | 1.00 |
| ML-KEM-1024 decaps | 126790 cycles | 126586 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 50755 cycles | 51363 cycles | 0.99 |
| ML-KEM-512 encaps | 58520 cycles | 59331 cycles | 0.99 |
| ML-KEM-512 decaps | 74875 cycles | 75552 cycles | 0.99 |
| ML-KEM-768 keypair | 85665 cycles | 86501 cycles | 0.99 |
| ML-KEM-768 encaps | 93646 cycles | 94937 cycles | 0.99 |
| ML-KEM-768 decaps | 117947 cycles | 118639 cycles | 0.99 |
| ML-KEM-1024 keypair | 130412 cycles | 131021 cycles | 1.00 |
| ML-KEM-1024 encaps | 142041 cycles | 144288 cycles | 0.98 |
| ML-KEM-1024 decaps | 174697 cycles | 174365 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 28265 cycles | 28269 cycles | 1.00 |
| ML-KEM-512 encaps | 34166 cycles | 34121 cycles | 1.00 |
| ML-KEM-512 decaps | 44400 cycles | 44377 cycles | 1.00 |
| ML-KEM-768 keypair | 47653 cycles | 47670 cycles | 1.00 |
| ML-KEM-768 encaps | 53998 cycles | 53906 cycles | 1.00 |
| ML-KEM-768 decaps | 68424 cycles | 68360 cycles | 1.00 |
| ML-KEM-1024 keypair | 70367 cycles | 70258 cycles | 1.00 |
| ML-KEM-1024 encaps | 78755 cycles | 78748 cycles | 1.00 |
| ML-KEM-1024 decaps | 98551 cycles | 98442 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 59139 cycles | 59143 cycles | 1.00 |
| ML-KEM-512 encaps | 68601 cycles | 68640 cycles | 1.00 |
| ML-KEM-512 decaps | 87321 cycles | 87348 cycles | 1.00 |
| ML-KEM-768 keypair | 95328 cycles | 95292 cycles | 1.00 |
| ML-KEM-768 encaps | 110308 cycles | 109839 cycles | 1.00 |
| ML-KEM-768 decaps | 134544 cycles | 134315 cycles | 1.00 |
| ML-KEM-1024 keypair | 148015 cycles | 147969 cycles | 1.00 |
| ML-KEM-1024 encaps | 163853 cycles | 163829 cycles | 1.00 |
| ML-KEM-1024 decaps | 195602 cycles | 195492 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks

| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|-|-|-|-|
| ML-KEM-512 keypair | 155506 cycles | 155502 cycles | 1.00 |
| ML-KEM-512 encaps | 163404 cycles | 163410 cycles | 1.00 |
| ML-KEM-512 decaps | 206694 cycles | 206683 cycles | 1.00 |
| ML-KEM-768 keypair | 249899 cycles | 249923 cycles | 1.00 |
| ML-KEM-768 encaps | 270415 cycles | 270402 cycles | 1.00 |
| ML-KEM-768 decaps | 332822 cycles | 332795 cycles | 1.00 |
| ML-KEM-1024 keypair | 395733 cycles | 395710 cycles | 1.00 |
| ML-KEM-1024 encaps | 422748 cycles | 422751 cycles | 1.00 |
| ML-KEM-1024 decaps | 506154 cycles | 506349 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Labels

benchmark: this PR should be benchmarked in CI

2 participants