[TEST] x86_64: 32-byte align stack scratch in rej_uniform and keccak_f1600_x4_avx2 #1682
mkannwischer wants to merge 1 commit into
…4_avx2 Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
oqs-bot
left a comment
Mac Mini (M1, 2020) benchmarks
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12319 cycles | 12319 cycles | 1 |
| ML-KEM-512 encaps | 14997 cycles | 14998 cycles | 1.00 |
| ML-KEM-512 decaps | 19550 cycles | 19550 cycles | 1 |
| ML-KEM-768 keypair | 21263 cycles | 21264 cycles | 1.00 |
| ML-KEM-768 encaps | 23871 cycles | 23869 cycles | 1.00 |
| ML-KEM-768 decaps | 30415 cycles | 30412 cycles | 1.00 |
| ML-KEM-1024 keypair | 30327 cycles | 30328 cycles | 1.00 |
| ML-KEM-1024 encaps | 34574 cycles | 34572 cycles | 1.00 |
| ML-KEM-1024 decaps | 44190 cycles | 44191 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
ppc64le (POWER10) benchmarks
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59099 cycles | 59381 cycles | 1.00 |
| ML-KEM-512 encaps | 71861 cycles | 72089 cycles | 1.00 |
| ML-KEM-512 decaps | 91610 cycles | 91897 cycles | 1.00 |
| ML-KEM-768 keypair | 98389 cycles | 99066 cycles | 0.99 |
| ML-KEM-768 encaps | 114715 cycles | 115473 cycles | 0.99 |
| ML-KEM-768 decaps | 140352 cycles | 141058 cycles | 0.99 |
| ML-KEM-1024 keypair | 148840 cycles | 148874 cycles | 1.00 |
| ML-KEM-1024 encaps | 167867 cycles | 167437 cycles | 1.00 |
| ML-KEM-1024 decaps | 198739 cycles | 198787 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28268 cycles | 28220 cycles | 1.00 |
| ML-KEM-512 encaps | 34110 cycles | 34106 cycles | 1.00 |
| ML-KEM-512 decaps | 44365 cycles | 44333 cycles | 1.00 |
| ML-KEM-768 keypair | 47685 cycles | 47614 cycles | 1.00 |
| ML-KEM-768 encaps | 53901 cycles | 53939 cycles | 1.00 |
| ML-KEM-768 decaps | 68353 cycles | 68365 cycles | 1.00 |
| ML-KEM-1024 keypair | 70249 cycles | 70253 cycles | 1.00 |
| ML-KEM-1024 encaps | 78724 cycles | 78729 cycles | 1.00 |
| ML-KEM-1024 decaps | 98421 cycles | 98443 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Arm Cortex-A55 (Snapdragon 888) benchmarks
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59768 cycles | 59768 cycles | 1 |
| ML-KEM-512 encaps | 67518 cycles | 67522 cycles | 1.00 |
| ML-KEM-512 decaps | 86122 cycles | 86164 cycles | 1.00 |
| ML-KEM-768 keypair | 97418 cycles | 97432 cycles | 1.00 |
| ML-KEM-768 encaps | 110932 cycles | 111015 cycles | 1.00 |
| ML-KEM-768 decaps | 137595 cycles | 138432 cycles | 0.99 |
| ML-KEM-1024 keypair | 155075 cycles | 154655 cycles | 1.00 |
| ML-KEM-1024 encaps | 172548 cycles | 171560 cycles | 1.01 |
| ML-KEM-1024 decaps | 209918 cycles | 208191 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 11707 cycles | 12024 cycles | 0.97 |
| ML-KEM-512 encaps | 13320 cycles | 13792 cycles | 0.97 |
| ML-KEM-512 decaps | 17493 cycles | 17799 cycles | 0.98 |
| ML-KEM-768 keypair | 20188 cycles | 21058 cycles | 0.96 |
| ML-KEM-768 encaps | 21749 cycles | 21954 cycles | 0.99 |
| ML-KEM-768 decaps | 28137 cycles | 27947 cycles | 1.01 |
| ML-KEM-1024 keypair | 28902 cycles | 29875 cycles | 0.97 |
| ML-KEM-1024 encaps | 30948 cycles | 31692 cycles | 0.98 |
| ML-KEM-1024 decaps | 38541 cycles | 39396 cycles | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13945 cycles | 14250 cycles | 0.98 |
| ML-KEM-512 encaps | 15670 cycles | 15974 cycles | 0.98 |
| ML-KEM-512 decaps | 21232 cycles | 21545 cycles | 0.99 |
| ML-KEM-768 keypair | 23720 cycles | 24733 cycles | 0.96 |
| ML-KEM-768 encaps | 25126 cycles | 25462 cycles | 0.99 |
| ML-KEM-768 decaps | 32990 cycles | 33345 cycles | 0.99 |
| ML-KEM-1024 keypair | 33326 cycles | 37143 cycles | 0.90 |
| ML-KEM-1024 encaps | 35786 cycles | 36842 cycles | 0.97 |
| ML-KEM-1024 decaps | 46364 cycles | 46735 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28318 cycles | 28217 cycles | 1.00 |
| ML-KEM-512 encaps | 36669 cycles | 36694 cycles | 1.00 |
| ML-KEM-512 decaps | 45169 cycles | 45259 cycles | 1.00 |
| ML-KEM-768 keypair | 46242 cycles | 46276 cycles | 1.00 |
| ML-KEM-768 encaps | 55823 cycles | 55744 cycles | 1.00 |
| ML-KEM-768 decaps | 69894 cycles | 69803 cycles | 1.00 |
| ML-KEM-1024 keypair | 70399 cycles | 70421 cycles | 1.00 |
| ML-KEM-1024 encaps | 82321 cycles | 82529 cycles | 1.00 |
| ML-KEM-1024 decaps | 99243 cycles | 99421 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12666 cycles | 12779 cycles | 0.99 |
| ML-KEM-512 encaps | 14191 cycles | 14273 cycles | 0.99 |
| ML-KEM-512 decaps | 19073 cycles | 19121 cycles | 1.00 |
| ML-KEM-768 keypair | 21876 cycles | 22408 cycles | 0.98 |
| ML-KEM-768 encaps | 22932 cycles | 23053 cycles | 0.99 |
| ML-KEM-768 decaps | 29943 cycles | 30058 cycles | 1.00 |
| ML-KEM-1024 keypair | 30723 cycles | 32987 cycles | 0.93 |
| ML-KEM-1024 encaps | 32776 cycles | 33034 cycles | 0.99 |
| ML-KEM-1024 decaps | 42174 cycles | 42393 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17456 cycles | 17543 cycles | 1.00 |
| ML-KEM-512 encaps | 19789 cycles | 19953 cycles | 0.99 |
| ML-KEM-512 decaps | 26302 cycles | 26452 cycles | 0.99 |
| ML-KEM-768 keypair | 30052 cycles | 31153 cycles | 0.96 |
| ML-KEM-768 encaps | 31008 cycles | 31870 cycles | 0.97 |
| ML-KEM-768 decaps | 41352 cycles | 41554 cycles | 1.00 |
| ML-KEM-1024 keypair | 42287 cycles | 43949 cycles | 0.96 |
| ML-KEM-1024 encaps | 45802 cycles | 45348 cycles | 1.01 |
| ML-KEM-1024 decaps | 60641 cycles | 58193 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-1024 decaps | 60641 cycles | 58193 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
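The alert above can be reproduced by hand from the two cycle counts. The following is a minimal sketch, assuming the Ratio column is current cycles divided by previous cycles and that a ratio above the quoted threshold of 1.03 triggers the alert; the function name `regression_ratio` is hypothetical and this is not the code github-action-benchmark itself runs.

```python
# Sketch only: recompute the ratio reported in the performance alert.
# Assumption: ratio = current / previous, where more cycles means slower.

ALERT_THRESHOLD = 1.03  # threshold quoted in the alert text


def regression_ratio(current_cycles: int, previous_cycles: int) -> float:
    """Return current/previous; values above 1 mean the current commit is slower."""
    return current_cycles / previous_cycles


# ML-KEM-1024 decaps on Intel Xeon 3rd gen (c6i), taken from the alert table.
ratio = regression_ratio(60641, 58193)
is_regression = ratio > ALERT_THRESHOLD

print(f"ratio={ratio:.2f} exceeds_threshold={is_regression}")
```

Under the same assumption, the ML-KEM-1024 encaps row on that machine (45802 vs. 45348 cycles) gives a ratio of about 1.01 and stays below the threshold.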
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 40221 cycles | 40252 cycles | 1.00 |
| ML-KEM-512 encaps | 48434 cycles | 48424 cycles | 1.00 |
| ML-KEM-512 decaps | 62625 cycles | 62618 cycles | 1.00 |
| ML-KEM-768 keypair | 63911 cycles | 63706 cycles | 1.00 |
| ML-KEM-768 encaps | 74924 cycles | 75085 cycles | 1.00 |
| ML-KEM-768 decaps | 93591 cycles | 93596 cycles | 1.00 |
| ML-KEM-1024 keypair | 95291 cycles | 95445 cycles | 1.00 |
| ML-KEM-1024 encaps | 109405 cycles | 109672 cycles | 1.00 |
| ML-KEM-1024 decaps | 132237 cycles | 132483 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 36599 cycles | 36609 cycles | 1.00 |
| ML-KEM-512 encaps | 43086 cycles | 43073 cycles | 1.00 |
| ML-KEM-512 decaps | 55714 cycles | 55711 cycles | 1.00 |
| ML-KEM-768 keypair | 58634 cycles | 58682 cycles | 1.00 |
| ML-KEM-768 encaps | 67519 cycles | 67521 cycles | 1.00 |
| ML-KEM-768 decaps | 84506 cycles | 84450 cycles | 1.00 |
| ML-KEM-1024 keypair | 88990 cycles | 88997 cycles | 1.00 |
| ML-KEM-1024 encaps | 99191 cycles | 99192 cycles | 1.00 |
| ML-KEM-1024 decaps | 120576 cycles | 120712 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton4
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17674 cycles | 17644 cycles | 1.00 |
| ML-KEM-512 encaps | 20600 cycles | 20601 cycles | 1.00 |
| ML-KEM-512 decaps | 27088 cycles | 27077 cycles | 1.00 |
| ML-KEM-768 keypair | 29919 cycles | 29899 cycles | 1.00 |
| ML-KEM-768 encaps | 32728 cycles | 32770 cycles | 1.00 |
| ML-KEM-768 decaps | 41992 cycles | 41964 cycles | 1.00 |
| ML-KEM-1024 keypair | 43726 cycles | 43737 cycles | 1.00 |
| ML-KEM-1024 encaps | 48783 cycles | 48724 cycles | 1.00 |
| ML-KEM-1024 decaps | 61375 cycles | 61394 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 45730 cycles | 45760 cycles | 1.00 |
| ML-KEM-512 encaps | 54307 cycles | 54386 cycles | 1.00 |
| ML-KEM-512 decaps | 69783 cycles | 69833 cycles | 1.00 |
| ML-KEM-768 keypair | 74158 cycles | 74214 cycles | 1.00 |
| ML-KEM-768 encaps | 86001 cycles | 86131 cycles | 1.00 |
| ML-KEM-768 decaps | 106669 cycles | 106676 cycles | 1.00 |
| ML-KEM-1024 keypair | 112035 cycles | 112265 cycles | 1.00 |
| ML-KEM-1024 encaps | 124570 cycles | 124792 cycles | 1.00 |
| ML-KEM-1024 decaps | 150493 cycles | 150765 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton3
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18657 cycles | 18641 cycles | 1.00 |
| ML-KEM-512 encaps | 21879 cycles | 21876 cycles | 1.00 |
| ML-KEM-512 decaps | 28879 cycles | 28864 cycles | 1.00 |
| ML-KEM-768 keypair | 31590 cycles | 31540 cycles | 1.00 |
| ML-KEM-768 encaps | 34744 cycles | 34772 cycles | 1.00 |
| ML-KEM-768 decaps | 44829 cycles | 44778 cycles | 1.00 |
| ML-KEM-1024 keypair | 46074 cycles | 46077 cycles | 1.00 |
| ML-KEM-1024 encaps | 51509 cycles | 51491 cycles | 1.00 |
| ML-KEM-1024 decaps | 65018 cycles | 65024 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 35450 cycles | 35412 cycles | 1.00 |
| ML-KEM-512 encaps | 40089 cycles | 40110 cycles | 1.00 |
| ML-KEM-512 decaps | 51097 cycles | 51135 cycles | 1.00 |
| ML-KEM-768 keypair | 56740 cycles | 56668 cycles | 1.00 |
| ML-KEM-768 encaps | 64546 cycles | 65154 cycles | 0.99 |
| ML-KEM-768 decaps | 79370 cycles | 79301 cycles | 1.00 |
| ML-KEM-1024 keypair | 87849 cycles | 87865 cycles | 1.00 |
| ML-KEM-1024 encaps | 97111 cycles | 96879 cycles | 1.00 |
| ML-KEM-1024 decaps | 115955 cycles | 115831 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 38932 cycles | 38888 cycles | 1.00 |
| ML-KEM-512 encaps | 44527 cycles | 44594 cycles | 1.00 |
| ML-KEM-512 decaps | 56593 cycles | 56672 cycles | 1.00 |
| ML-KEM-768 keypair | 62331 cycles | 62287 cycles | 1.00 |
| ML-KEM-768 encaps | 71055 cycles | 72308 cycles | 0.98 |
| ML-KEM-768 decaps | 87343 cycles | 87694 cycles | 1.00 |
| ML-KEM-1024 keypair | 96210 cycles | 96151 cycles | 1.00 |
| ML-KEM-1024 encaps | 106366 cycles | 106136 cycles | 1.00 |
| ML-KEM-1024 decaps | 126790 cycles | 126586 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 50755 cycles | 51363 cycles | 0.99 |
| ML-KEM-512 encaps | 58520 cycles | 59331 cycles | 0.99 |
| ML-KEM-512 decaps | 74875 cycles | 75552 cycles | 0.99 |
| ML-KEM-768 keypair | 85665 cycles | 86501 cycles | 0.99 |
| ML-KEM-768 encaps | 93646 cycles | 94937 cycles | 0.99 |
| ML-KEM-768 decaps | 117947 cycles | 118639 cycles | 0.99 |
| ML-KEM-1024 keypair | 130412 cycles | 131021 cycles | 1.00 |
| ML-KEM-1024 encaps | 142041 cycles | 144288 cycles | 0.98 |
| ML-KEM-1024 decaps | 174697 cycles | 174365 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton2
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28265 cycles | 28269 cycles | 1.00 |
| ML-KEM-512 encaps | 34166 cycles | 34121 cycles | 1.00 |
| ML-KEM-512 decaps | 44400 cycles | 44377 cycles | 1.00 |
| ML-KEM-768 keypair | 47653 cycles | 47670 cycles | 1.00 |
| ML-KEM-768 encaps | 53998 cycles | 53906 cycles | 1.00 |
| ML-KEM-768 decaps | 68424 cycles | 68360 cycles | 1.00 |
| ML-KEM-1024 keypair | 70367 cycles | 70258 cycles | 1.00 |
| ML-KEM-1024 encaps | 78755 cycles | 78748 cycles | 1.00 |
| ML-KEM-1024 decaps | 98551 cycles | 98442 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59139 cycles | 59143 cycles | 1.00 |
| ML-KEM-512 encaps | 68601 cycles | 68640 cycles | 1.00 |
| ML-KEM-512 decaps | 87321 cycles | 87348 cycles | 1.00 |
| ML-KEM-768 keypair | 95328 cycles | 95292 cycles | 1.00 |
| ML-KEM-768 encaps | 110308 cycles | 109839 cycles | 1.00 |
| ML-KEM-768 decaps | 134544 cycles | 134315 cycles | 1.00 |
| ML-KEM-1024 keypair | 148015 cycles | 147969 cycles | 1.00 |
| ML-KEM-1024 encaps | 163853 cycles | 163829 cycles | 1.00 |
| ML-KEM-1024 decaps | 195602 cycles | 195492 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
SpacemiT K1 8 (Banana Pi F3) benchmarks
| Benchmark suite | Current: 3b7bde4 | Previous: 2bf8e59 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155506 cycles | 155502 cycles | 1.00 |
| ML-KEM-512 encaps | 163404 cycles | 163410 cycles | 1.00 |
| ML-KEM-512 decaps | 206694 cycles | 206683 cycles | 1.00 |
| ML-KEM-768 keypair | 249899 cycles | 249923 cycles | 1.00 |
| ML-KEM-768 encaps | 270415 cycles | 270402 cycles | 1.00 |
| ML-KEM-768 decaps | 332822 cycles | 332795 cycles | 1.00 |
| ML-KEM-1024 keypair | 395733 cycles | 395710 cycles | 1.00 |
| ML-KEM-1024 encaps | 422748 cycles | 422751 cycles | 1.00 |
| ML-KEM-1024 decaps | 506154 cycles | 506349 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.