Add secp256k1 AVX2/AVX-512 field multiplication proof of concept #32
Closed
This PoC demonstrates 4-way parallel secp256k1 field multiplication using AVX2 and 8-way parallel multiplication using AVX-512 IFMA instructions.

Results on AVX2:
- Scalar (4x sequential): 43.20 M mul/sec
- AVX2 (4-way parallel): 105.83 M mul/sec
- Speedup: 2.45x

Includes a GitHub Actions workflow to benchmark AVX-512 IFMA on cloud runners.
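To make the "4-way parallel" idea concrete, here is a minimal scalar model of a limb-sliced layout. It is a sketch, not the code in this PR. The 10x26-bit limb split and the names `fe26`, `fe26x4`, `fe26x4_pack`, `fe26x4_unpack` and `fe26x4_add` are all assumptions chosen for illustration. Each row `v[i]` stands in for one 256-bit AVX2 register whose four 64-bit lanes hold limb `i` of four independent field elements.

```c
#include <assert.h>
#include <stdint.h>

#define FE_LIMBS 10 /* assumption: 10 limbs of 26 bits each */
#define FE_LANES 4  /* four independent field elements per vector */

/* One field element in radix 2^26: value = sum of n[i] << (26 * i). */
typedef struct { uint32_t n[FE_LIMBS]; } fe26;

/* Four elements, limb-sliced: v[i][k] is limb i of element k, so row i
 * corresponds to one 256-bit register holding four 64-bit lanes. */
typedef struct { uint64_t v[FE_LIMBS][FE_LANES]; } fe26x4;

/* Transpose four separate elements into the limb-sliced form. */
static void fe26x4_pack(fe26x4 *r, const fe26 in[FE_LANES]) {
    for (int i = 0; i < FE_LIMBS; i++) {
        for (int k = 0; k < FE_LANES; k++) {
            assert(in[k].n[i] < (UINT32_C(1) << 26)); /* limbs must be reduced */
            r->v[i][k] = in[k].n[i];
        }
    }
}

/* Transpose back; assumes every limb still fits in 32 bits. */
static void fe26x4_unpack(fe26 out[FE_LANES], const fe26x4 *a) {
    for (int i = 0; i < FE_LIMBS; i++) {
        for (int k = 0; k < FE_LANES; k++) {
            out[k].n[i] = (uint32_t)a->v[i][k];
        }
    }
}

/* Lane-wise addition with no carry step. Each inner loop over k models
 * what a single 4-lane vector add would do for limb i. */
static void fe26x4_add(fe26x4 *r, const fe26x4 *a, const fe26x4 *b) {
    for (int i = 0; i < FE_LIMBS; i++) {
        for (int k = 0; k < FE_LANES; k++) {
            r->v[i][k] = a->v[i][k] + b->v[i][k];
        }
    }
}
```

In this model the pack and unpack transposes are pure overhead, so they only pay off when several field operations run between them.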
- Add field_ops_avx2.h: 4-way parallel field add/sub/neg
- Add group.h: Jacobian point structures and generator G
- Add group_avx2.h: 4-way parallel point doubling and addition
- Add bench_point.c: Point addition benchmark

Local results (Apple Silicon via Rosetta):
- Scalar: 16.83 M additions/sec
- AVX2: 11.60 M additions/sec (0.69x - needs optimization)

The AVX2 point addition is currently slower due to:
1. Simplified field multiplication (ignores carries)
2. Memory layout overhead for limb-slicing
3. Need for proper 128-bit intermediate handling

Next steps: Optimize field multiplication with proper carry propagation.
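The carry propagation named as the next step can be sketched in portable C. This is an illustration under assumptions, not the PR's implementation. It assumes a radix-2^26 layout with 10 limbs held in 64-bit lanes, and `fe26x4_carry` is a hypothetical name. The reduction relies on the secp256k1 prime p = 2^256 - 2^32 - 977, which gives 2^256 ≡ 2^32 + 977 (mod p).

```c
#include <assert.h>
#include <stdint.h>

#define N_LIMBS 10 /* assumption: 10 limbs of 26 bits each */
#define N_LANES 4  /* four field elements processed side by side */
#define LIMB_MASK UINT64_C(0x3FFFFFF) /* low 26 bits */

/* One weak carry pass over four limb-sliced elements. t[i][k] is limb i of
 * element k. Each inner loop over k models one 4-lane vector operation. */
static void fe26x4_carry(uint64_t t[N_LIMBS][N_LANES]) {
    for (int i = 0; i < N_LIMBS; i++) {
        for (int k = 0; k < N_LANES; k++) {
            assert(t[i][k] < (UINT64_C(1) << 62)); /* headroom so adds cannot wrap */
        }
    }

    /* Limb 9 starts at bit 234, so its bits from position 22 upward are
     * multiples of 2^256. Fold them back using 2^256 = 2^32 + 977 (mod p). */
    for (int k = 0; k < N_LANES; k++) {
        uint64_t x = t[9][k] >> 22;
        t[9][k] &= UINT64_C(0x3FFFFF);
        t[0][k] += x * 977; /* the 977 term goes to limb 0 */
        t[1][k] += x << 6;  /* 2^32 = 2^6 * 2^26 goes to limb 1 */
    }

    /* Ripple the excess of each limb into the next one. */
    for (int i = 0; i + 1 < N_LIMBS; i++) {
        for (int k = 0; k < N_LANES; k++) {
            t[i + 1][k] += t[i][k] >> 26;
            t[i][k] &= LIMB_MASK;
        }
    }
}
```

A single pass like this leaves the value only weakly reduced: limb 9 can again exceed 22 bits after the ripple, so a full normalization would need another fold and a final conditional subtraction of p. Under the assumed layout, products of two limbs below 2^26 stay below 2^52 and a sum of ten of them stays below 2^56, which is one way to keep intermediates inside 64-bit lanes rather than needing 128-bit handling.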
secp256k1 AVX2/AVX-512 Proof of Concept
This PR explored using AVX2 SIMD instructions to accelerate secp256k1 field multiplication for EOA address mining.
Benchmark Results
Field Multiplication (GitHub Actions - AMD EPYC 7763)
Point Addition (Local - Apple Silicon via Rosetta)
Key Findings
AVX2 field multiplication shows excellent speedup (3.66x on AMD EPYC)
AVX2 point addition is slower than scalar in this PoC due to:
AVX-512 IFMA not available on GitHub Actions runners (AMD EPYC doesn't have it)
Technical Approach
vpmuludq compatibility
Files Created
Conclusion
While AVX2 shows promising speedup for individual field operations, achieving end-to-end speedup for point operations requires:
For EOA mining, CUDA/GPU implementation is likely more practical since GPUs can run thousands of parallel point operations vs 4-8 with SIMD.
This PR is closed as an exploratory PoC. The code remains in the branch for reference.