Conversation
CodSpeed WallTime Performance ReportMerging #147 will not alter performanceComparing
|
|
@xrvdg You spend some time optimizing Skyscraper. Do you have any ideas we could try here? |
xrvdg
left a comment
There was a problem hiding this comment.
In the other skyscraper implementation we tuned the output range of the montgomery multiplication (modmul) to only reduce once the addition is done. Looks like we can do the opposite here: modmul does a full reduction and that leaves space within the u256 for the additions.
| r = rc_a + addmod(mulmod(mulmod(l, l, P), SIGMA_INV, P), r, P); | ||
| l = rc_b + addmod(mulmod(mulmod(r, r, P), SIGMA_INV, P), l, P); |
There was a problem hiding this comment.
I think these addmod are safe to be replaced by regular adds. After mulmod there is 4.29p space left before a u256 overflows. Afterwards it directly gets fed into the mulmod in sss_reduce_l which in turn reduces it.
| l = addmod( | ||
| rc_b, | ||
| addmod(mulmod(mulmod(r, r, P), SIGMA_INV, P), l, P), | ||
| P | ||
| ); |
There was a problem hiding this comment.
Same here, only the outer addmod should be required.
| unchecked { | ||
| r = rc_a + addmod(mulmod(mulmod(l, l, P), SIGMA_INV, P), r, P); | ||
| } |
There was a problem hiding this comment.
Based on this line it looks like r doesn't have to be reduced for bb so it should be fine to drop the addmod here. Same reasoning as above.
@philsippl asked how Skyscraper would perform onchain. Turns out it's about 10x better than Posseidon. This is useful for Merkle trees that are efficient to handle on-chain and in circuit.