x86_64: 32-byte align Keccak x4 AVX2 stack frame #1124
For better performance, align the stack to 32 bytes in the AVX2 x4 Keccak backend implementation.

Update the HOL-Light proofs accordingly. Unfortunately, no existing tactic for extending the 'core' proofs to the 'subroutine' proofs can handle the pattern we use, so we fall back to some ad-hoc tactics. Once s2n-bignum adds support for stack alignment to its automation, these can hopefully be removed.

We also extend scripts/cfify to track the CFA register. A `mov %rsp, %REG` re-anchors the CFA on REG, and subsequent modifications to RSP do not require CFI directives. We handle this by conditioning the subq/addq -> cfi_adjust_cfa_offset rules on the operand being the current CFA register.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
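The CFA-tracking rule described above can be illustrated with a minimal Python sketch. This is not the code in scripts/cfify: the function name `annotate_cfi`, the regular expressions, and the choice to emit `.cfi_def_cfa_register` on the re-anchoring `mov` are all assumptions made for illustration. It only shows the idea that `subq`/`addq` produce a `.cfi_adjust_cfa_offset` directive when their destination is the current CFA register, and produce nothing once the CFA has been re-anchored on another register.

```python
import re

# Hypothetical patterns for AT&T-syntax instructions; the real script may differ.
MOV_FROM_RSP = re.compile(r"^\s*movq?\s+%rsp\s*,\s*%(\w+)\s*$")
ADJUST = re.compile(r"^\s*(subq|addq)\s+\$(\d+)\s*,\s*%(\w+)\s*$")


def annotate_cfi(lines):
    """Return the lines with CFI directives added, tracking the CFA register."""
    cfa_reg = "rsp"  # assumption: on entry, the CFA is defined relative to RSP
    out = []
    for line in lines:
        out.append(line)
        mov = MOV_FROM_RSP.match(line)
        if mov and cfa_reg == "rsp":
            # Copying RSP into REG re-anchors the CFA on REG.
            cfa_reg = mov.group(1)
            out.append(f".cfi_def_cfa_register %{cfa_reg}")
            continue
        adj = ADJUST.match(line)
        if adj and adj.group(3) == cfa_reg:
            # Only adjustments to the current CFA register need a directive.
            op, imm = adj.group(1), int(adj.group(2))
            delta = imm if op == "subq" else -imm
            out.append(f".cfi_adjust_cfa_offset {delta}")
    return out


asm = [
    "subq $8, %rsp",    # CFA still anchored on RSP: directive emitted
    "movq %rsp, %r11",  # re-anchor the CFA on r11
    "andq $-32, %rsp",  # 32-byte alignment of RSP: no directive needed
    "subq $800, %rsp",  # RSP is no longer the CFA register: no directive
]
annotated = annotate_cfi(asm)
print("\n".join(annotated))
```

In this sketch the register `r11` and the frame size `800` are arbitrary; the point is that the alignment `andq` and the later `subq` on RSP pass through without annotation because the CFA is anchored on the saved copy of RSP.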
CBMC Results (ML-DSA-44, REDUCE-RAM): Full Results (199 proofs)
CBMC Results (ML-DSA-87, REDUCE-RAM): Full Results (199 proofs)
CBMC Results (ML-DSA-65, REDUCE-RAM): Full Results (199 proofs)
CBMC Results (ML-DSA-87): Full Results (199 proofs)
CBMC Results (ML-DSA-44): Full Results (199 proofs)
CBMC Results (ML-DSA-65): Full Results (199 proofs)
Mac Mini (M1, 2020) benchmarks (opt)

| Benchmark suite | Current: 6c13542 | Previous: ec0cdd4 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46535 cycles | 46506 cycles | 1.00 |
| ML-DSA-44 sign | 131058 cycles | 131078 cycles | 1.00 |
| ML-DSA-44 verify | 47345 cycles | 47320 cycles | 1.00 |
| ML-DSA-65 keypair | 81706 cycles | 81693 cycles | 1.00 |
| ML-DSA-65 sign | 215431 cycles | 215418 cycles | 1.00 |
| ML-DSA-65 verify | 79324 cycles | 79309 cycles | 1.00 |
| ML-DSA-87 keypair | 132411 cycles | 132409 cycles | 1.00 |
| ML-DSA-87 sign | 277428 cycles | 277534 cycles | 1.00 |
| ML-DSA-87 verify | 134236 cycles | 134235 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112746 cycles | 112740 cycles | 1.00 |
| ML-DSA-44 sign | 400901 cycles | 400842 cycles | 1.00 |
| ML-DSA-44 verify | 120128 cycles | 120086 cycles | 1.00 |
| ML-DSA-65 keypair | 192883 cycles | 192877 cycles | 1.00 |
| ML-DSA-65 sign | 649925 cycles | 649964 cycles | 1.00 |
| ML-DSA-65 verify | 192956 cycles | 192956 cycles | 1.00 |
| ML-DSA-87 keypair | 318782 cycles | 318775 cycles | 1.00 |
| ML-DSA-87 sign | 828588 cycles | 828851 cycles | 1.00 |
| ML-DSA-87 verify | 326650 cycles | 326654 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 43768 cycles | 45391 cycles | 0.96 |
| ML-DSA-44 sign | 133272 cycles | 136216 cycles | 0.98 |
| ML-DSA-44 verify | 45772 cycles | 47321 cycles | 0.97 |
| ML-DSA-65 keypair | 76387 cycles | 78853 cycles | 0.97 |
| ML-DSA-65 sign | 218619 cycles | 223148 cycles | 0.98 |
| ML-DSA-65 verify | 76486 cycles | 77818 cycles | 0.98 |
| ML-DSA-87 keypair | 124993 cycles | 126383 cycles | 0.99 |
| ML-DSA-87 sign | 277386 cycles | 280003 cycles | 0.99 |
| ML-DSA-87 verify | 122348 cycles | 124086 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94221 cycles | 94250 cycles | 1.00 |
| ML-DSA-44 sign | 329262 cycles | 329671 cycles | 1.00 |
| ML-DSA-44 verify | 98725 cycles | 98848 cycles | 1.00 |
| ML-DSA-65 keypair | 161876 cycles | 161842 cycles | 1.00 |
| ML-DSA-65 sign | 539009 cycles | 538466 cycles | 1.00 |
| ML-DSA-65 verify | 160676 cycles | 160405 cycles | 1.00 |
| ML-DSA-87 keypair | 264143 cycles | 264153 cycles | 1.00 |
| ML-DSA-87 sign | 694078 cycles | 694626 cycles | 1.00 |
| ML-DSA-87 verify | 265819 cycles | 265814 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 820432 cycles | 820764 cycles | 1.00 |
| ML-DSA-44 sign | 3222145 cycles | 3222715 cycles | 1.00 |
| ML-DSA-44 verify | 917121 cycles | 917496 cycles | 1.00 |
| ML-DSA-65 keypair | 1391448 cycles | 1391037 cycles | 1.00 |
| ML-DSA-65 sign | 5243597 cycles | 5232579 cycles | 1.00 |
| ML-DSA-65 verify | 1466874 cycles | 1464573 cycles | 1.00 |
| ML-DSA-87 keypair | 2298784 cycles | 2299791 cycles | 1.00 |
| ML-DSA-87 sign | 6610495 cycles | 6616416 cycles | 1.00 |
| ML-DSA-87 verify | 2406162 cycles | 2407836 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 55563 cycles | 58246 cycles | 0.95 |
| ML-DSA-44 sign | 165532 cycles | 168182 cycles | 0.98 |
| ML-DSA-44 verify | 58065 cycles | 58487 cycles | 0.99 |
| ML-DSA-65 keypair | 96320 cycles | 96874 cycles | 0.99 |
| ML-DSA-65 sign | 267696 cycles | 271854 cycles | 0.98 |
| ML-DSA-65 verify | 96597 cycles | 97497 cycles | 0.99 |
| ML-DSA-87 keypair | 155692 cycles | 164997 cycles | 0.94 |
| ML-DSA-87 sign | 328179 cycles | 339783 cycles | 0.97 |
| ML-DSA-87 verify | 151858 cycles | 154373 cycles | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.
Graviton2

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112606 cycles | 112584 cycles | 1.00 |
| ML-DSA-44 sign | 355233 cycles | 355058 cycles | 1.00 |
| ML-DSA-44 verify | 117577 cycles | 117447 cycles | 1.00 |
| ML-DSA-65 keypair | 194491 cycles | 194602 cycles | 1.00 |
| ML-DSA-65 sign | 585379 cycles | 585206 cycles | 1.00 |
| ML-DSA-65 verify | 193298 cycles | 193231 cycles | 1.00 |
| ML-DSA-87 keypair | 321166 cycles | 321492 cycles | 1.00 |
| ML-DSA-87 sign | 749549 cycles | 750302 cycles | 1.00 |
| ML-DSA-87 verify | 318315 cycles | 318520 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 133850 cycles | 133824 cycles | 1.00 |
| ML-DSA-44 sign | 522666 cycles | 523429 cycles | 1.00 |
| ML-DSA-44 verify | 146999 cycles | 147203 cycles | 1.00 |
| ML-DSA-65 keypair | 223918 cycles | 224049 cycles | 1.00 |
| ML-DSA-65 sign | 853033 cycles | 850408 cycles | 1.00 |
| ML-DSA-65 verify | 233445 cycles | 233150 cycles | 1.00 |
| ML-DSA-87 keypair | 371837 cycles | 373095 cycles | 1.00 |
| ML-DSA-87 sign | 1073791 cycles | 1075287 cycles | 1.00 |
| ML-DSA-87 verify | 384334 cycles | 385379 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46880 cycles | 47240 cycles | 0.99 |
| ML-DSA-44 sign | 144149 cycles | 146136 cycles | 0.99 |
| ML-DSA-44 verify | 49896 cycles | 50648 cycles | 0.99 |
| ML-DSA-65 keypair | 82602 cycles | 83469 cycles | 0.99 |
| ML-DSA-65 sign | 229918 cycles | 230072 cycles | 1.00 |
| ML-DSA-65 verify | 83172 cycles | 83481 cycles | 1.00 |
| ML-DSA-87 keypair | 130909 cycles | 132078 cycles | 0.99 |
| ML-DSA-87 sign | 280817 cycles | 283164 cycles | 0.99 |
| ML-DSA-87 verify | 128828 cycles | 130195 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.
Graviton4

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67437 cycles | 67230 cycles | 1.00 |
| ML-DSA-44 sign | 201387 cycles | 201509 cycles | 1.00 |
| ML-DSA-44 verify | 70315 cycles | 70229 cycles | 1.00 |
| ML-DSA-65 keypair | 119444 cycles | 119440 cycles | 1.00 |
| ML-DSA-65 sign | 328182 cycles | 328213 cycles | 1.00 |
| ML-DSA-65 verify | 116781 cycles | 116941 cycles | 1.00 |
| ML-DSA-87 keypair | 196729 cycles | 196651 cycles | 1.00 |
| ML-DSA-87 sign | 425010 cycles | 424774 cycles | 1.00 |
| ML-DSA-87 verify | 193266 cycles | 193028 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 61922 cycles | 62711 cycles | 0.99 |
| ML-DSA-44 sign | 191461 cycles | 193467 cycles | 0.99 |
| ML-DSA-44 verify | 66255 cycles | 67225 cycles | 0.99 |
| ML-DSA-65 keypair | 108162 cycles | 112590 cycles | 0.96 |
| ML-DSA-65 sign | 314656 cycles | 319856 cycles | 0.98 |
| ML-DSA-65 verify | 109273 cycles | 112272 cycles | 0.97 |
| ML-DSA-87 keypair | 172432 cycles | 172667 cycles | 1.00 |
| ML-DSA-87 sign | 383790 cycles | 385139 cycles | 1.00 |
| ML-DSA-87 verify | 172188 cycles | 172209 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 118781 cycles | 119396 cycles | 0.99 |
| ML-DSA-44 sign | 448947 cycles | 446583 cycles | 1.01 |
| ML-DSA-44 verify | 128855 cycles | 128844 cycles | 1.00 |
| ML-DSA-65 keypair | 201858 cycles | 202970 cycles | 0.99 |
| ML-DSA-65 sign | 719584 cycles | 719324 cycles | 1.00 |
| ML-DSA-65 verify | 208524 cycles | 207674 cycles | 1.00 |
| ML-DSA-87 keypair | 333642 cycles | 335008 cycles | 1.00 |
| ML-DSA-87 sign | 914941 cycles | 918169 cycles | 1.00 |
| ML-DSA-87 verify | 341970 cycles | 342415 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment

⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-87 verify | 178158 cycles | 172209 cycles | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213751 cycles | 212528 cycles | 1.01 |
| ML-DSA-44 sign | 758285 cycles | 758331 cycles | 1.00 |
| ML-DSA-44 verify | 230133 cycles | 229988 cycles | 1.00 |
| ML-DSA-65 keypair | 378479 cycles | 378899 cycles | 1.00 |
| ML-DSA-65 sign | 1241045 cycles | 1241831 cycles | 1.00 |
| ML-DSA-65 verify | 372957 cycles | 372984 cycles | 1.00 |
| ML-DSA-87 keypair | 604331 cycles | 603666 cycles | 1.00 |
| ML-DSA-87 sign | 1582990 cycles | 1581558 cycles | 1.00 |
| ML-DSA-87 verify | 618690 cycles | 618334 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128404 cycles | 128472 cycles | 1.00 |
| ML-DSA-44 sign | 445324 cycles | 444913 cycles | 1.00 |
| ML-DSA-44 verify | 136666 cycles | 136570 cycles | 1.00 |
| ML-DSA-65 keypair | 220459 cycles | 220085 cycles | 1.00 |
| ML-DSA-65 sign | 718170 cycles | 718725 cycles | 1.00 |
| ML-DSA-65 verify | 220838 cycles | 221154 cycles | 1.00 |
| ML-DSA-87 keypair | 365445 cycles | 365468 cycles | 1.00 |
| ML-DSA-87 sign | 918949 cycles | 917811 cycles | 1.00 |
| ML-DSA-87 verify | 371017 cycles | 371454 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 150916 cycles | 153976 cycles | 0.98 |
| ML-DSA-44 sign | 545488 cycles | 559057 cycles | 0.98 |
| ML-DSA-44 verify | 163644 cycles | 166793 cycles | 0.98 |
| ML-DSA-65 keypair | 255567 cycles | 256142 cycles | 1.00 |
| ML-DSA-65 sign | 887481 cycles | 889069 cycles | 1.00 |
| ML-DSA-65 verify | 262294 cycles | 263111 cycles | 1.00 |
| ML-DSA-87 keypair | 425948 cycles | 426566 cycles | 1.00 |
| ML-DSA-87 sign | 1144091 cycles | 1150462 cycles | 0.99 |
| ML-DSA-87 verify | 440785 cycles | 439448 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112466 cycles | 112471 cycles | 1.00 |
| ML-DSA-44 sign | 354625 cycles | 354319 cycles | 1.00 |
| ML-DSA-44 verify | 117059 cycles | 117101 cycles | 1.00 |
| ML-DSA-65 keypair | 194532 cycles | 194670 cycles | 1.00 |
| ML-DSA-65 sign | 584264 cycles | 584352 cycles | 1.00 |
| ML-DSA-65 verify | 193237 cycles | 193010 cycles | 1.00 |
| ML-DSA-87 keypair | 320621 cycles | 321273 cycles | 1.00 |
| ML-DSA-87 sign | 748906 cycles | 749948 cycles | 1.00 |
| ML-DSA-87 verify | 317880 cycles | 318693 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Graviton3

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71572 cycles | 71489 cycles | 1.00 |
| ML-DSA-44 sign | 211568 cycles | 211342 cycles | 1.00 |
| ML-DSA-44 verify | 74829 cycles | 74924 cycles | 1.00 |
| ML-DSA-65 keypair | 125984 cycles | 125905 cycles | 1.00 |
| ML-DSA-65 sign | 347604 cycles | 347998 cycles | 1.00 |
| ML-DSA-65 verify | 123873 cycles | 124044 cycles | 1.00 |
| ML-DSA-87 keypair | 206240 cycles | 206671 cycles | 1.00 |
| ML-DSA-87 sign | 443131 cycles | 447427 cycles | 0.99 |
| ML-DSA-87 verify | 204465 cycles | 204120 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 137889 cycles | 137948 cycles | 1.00 |
| ML-DSA-44 sign | 482110 cycles | 481676 cycles | 1.00 |
| ML-DSA-44 verify | 148803 cycles | 148659 cycles | 1.00 |
| ML-DSA-65 keypair | 241031 cycles | 240730 cycles | 1.00 |
| ML-DSA-65 sign | 785016 cycles | 784965 cycles | 1.00 |
| ML-DSA-65 verify | 240613 cycles | 241049 cycles | 1.00 |
| ML-DSA-87 keypair | 395048 cycles | 395084 cycles | 1.00 |
| ML-DSA-87 sign | 1005956 cycles | 1004845 cycles | 1.00 |
| ML-DSA-87 verify | 402645 cycles | 403184 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212371 cycles | 212425 cycles | 1.00 |
| ML-DSA-44 sign | 756804 cycles | 756359 cycles | 1.00 |
| ML-DSA-44 verify | 229234 cycles | 229083 cycles | 1.00 |
| ML-DSA-65 keypair | 378769 cycles | 378500 cycles | 1.00 |
| ML-DSA-65 sign | 1240394 cycles | 1240209 cycles | 1.00 |
| ML-DSA-65 verify | 371908 cycles | 371886 cycles | 1.00 |
| ML-DSA-87 keypair | 602950 cycles | 602448 cycles | 1.00 |
| ML-DSA-87 sign | 1580764 cycles | 1580047 cycles | 1.00 |
| ML-DSA-87 verify | 619634 cycles | 618603 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 271362 cycles | 266927 cycles | 1.02 |
| ML-DSA-44 sign | 803267 cycles | 797012 cycles | 1.01 |
| ML-DSA-44 verify | 272085 cycles | 268934 cycles | 1.01 |
| ML-DSA-65 keypair | 465548 cycles | 462006 cycles | 1.01 |
| ML-DSA-65 sign | 1346356 cycles | 1325140 cycles | 1.02 |
| ML-DSA-65 verify | 452908 cycles | 447051 cycles | 1.01 |
| ML-DSA-87 keypair | 796067 cycles | 789145 cycles | 1.01 |
| ML-DSA-87 sign | 1814243 cycles | 1808930 cycles | 1.00 |
| ML-DSA-87 verify | 774654 cycles | 768055 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 458869 cycles | 456127 cycles | 1.01 |
| ML-DSA-44 sign | 2126957 cycles | 2116401 cycles | 1.00 |
| ML-DSA-44 verify | 552019 cycles | 548507 cycles | 1.01 |
| ML-DSA-65 keypair | 770502 cycles | 766953 cycles | 1.00 |
| ML-DSA-65 sign | 3470118 cycles | 3454812 cycles | 1.00 |
| ML-DSA-65 verify | 855533 cycles | 851822 cycles | 1.00 |
| ML-DSA-87 keypair | 1257950 cycles | 1239858 cycles | 1.01 |
| ML-DSA-87 sign | 4350714 cycles | 4301107 cycles | 1.01 |
| ML-DSA-87 verify | 1376864 cycles | 1364841 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 220320 cycles | 225607 cycles | 0.98 |
| ML-DSA-44 sign | 629787 cycles | 613159 cycles | 1.03 |
| ML-DSA-44 verify | 218806 cycles | 221594 cycles | 0.99 |
| ML-DSA-65 keypair | 381297 cycles | 390520 cycles | 0.98 |
| ML-DSA-65 sign | 981110 cycles | 1011825 cycles | 0.97 |
| ML-DSA-65 verify | 362378 cycles | 366404 cycles | 0.99 |
| ML-DSA-87 keypair | 643194 cycles | 647780 cycles | 0.99 |
| ML-DSA-87 sign | 1333643 cycles | 1330441 cycles | 1.00 |
| ML-DSA-87 verify | 622584 cycles | 623332 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 672194 cycles | 647780 cycles | 1.04 |
| ML-DSA-87 sign | 1388591 cycles | 1330441 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 311539 cycles | 313248 cycles | 0.99 |
| ML-DSA-44 sign | 1218975 cycles | 1165859 cycles | 1.05 |
| ML-DSA-44 verify | 357865 cycles | 340574 cycles | 1.05 |
| ML-DSA-65 keypair | 577066 cycles | 577106 cycles | 1.00 |
| ML-DSA-65 sign | 1946955 cycles | 1997418 cycles | 0.97 |
| ML-DSA-65 verify | 545442 cycles | 555863 cycles | 0.98 |
| ML-DSA-87 keypair | 865804 cycles | 881765 cycles | 0.98 |
| ML-DSA-87 sign | 2425492 cycles | 2508825 cycles | 0.97 |
| ML-DSA-87 verify | 893414 cycles | 921818 cycles | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.
mkannwischer left a comment
Thanks @hanno-becker. LGTM.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 6c13542 | Previous: cbc80f6 | Ratio |
|---|---|---|---|
| ML-DSA-44 sign | 1218975 cycles | 1165859 cycles | 1.05 |
| ML-DSA-44 verify | 357865 cycles | 340574 cycles | 1.05 |

This comment was automatically generated by workflow using github-action-benchmark.