Description
Feature or enhancement
Proposal:
We consolidated much of our bytes -> hexadecimal string code into one place, Python/pystrhex.c, a while back. It still uses the traditional scalar logic of iterating over the bytes and converting their nibbles, as is natural. Now that it's all in one place, we can do better.
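For reference, the scalar nibble logic described above looks roughly like the following. This is a minimal sketch, not the actual contents of Python/pystrhex.c; the function name `hexlify_scalar` and its signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not the real pystrhex.c code.
 * Writes 2*len lowercase hex digits plus a trailing NUL into out,
 * so out must have room for 2*len + 1 bytes. */
static void hexlify_scalar(const unsigned char *src, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[src[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[src[i] & 0x0f];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```

Each input byte costs two table lookups and two stores, which is the per-byte work a 16-bytes-at-a-time approach aims to amortize.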
x86_64 and arm64 (aarch64) are guaranteed to have SSE2 and NEON respectively, both of which can process 16 bytes at once. Modern compilers, starting with clang eons ago and more recently gcc 12 (2022), abstract the operation we need into a nice function, so we do not even need to write CPU-specific code directly for this use case. Maintainability win!
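To make the idea concrete, here is a rough sketch of a portable 16-bytes-at-a-time conversion. It is written under assumptions and is not the code from the PR: it assumes the compiler function alluded to above is `__builtin_shufflevector` (long available in Clang, added to GCC in version 12) together with GCC/Clang `vector_size` vector types. The names `v16u8`, `splat`, `nibble_to_hex` and `hexlify_simd_sketch` are hypothetical. When the builtin is unavailable, the sketch falls back to the scalar loop alone.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

#ifdef HAVE_SHUFFLEVECTOR
/* GCC/Clang vector extension: 16 lanes of uint8_t (assumed available). */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Broadcast one byte to all 16 lanes. */
static v16u8 splat(uint8_t x)
{
    v16u8 v;
    for (int i = 0; i < 16; i++)
        v[i] = x;
    return v;
}

/* Map nibble values 0..15 to ASCII '0'..'9', 'a'..'f' without a table. */
static v16u8 nibble_to_hex(v16u8 n)
{
    /* The comparison yields 0xFF in lanes where n > 9, else 0x00. */
    v16u8 is_letter = (v16u8)(n > splat(9));
    return n + splat('0') + (is_letter & splat('a' - '0' - 10));
}
#endif

/* Hypothetical entry point: writes 2*len hex digits to out, no NUL. */
static void hexlify_simd_sketch(const unsigned char *src, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i = 0;
#ifdef HAVE_SHUFFLEVECTOR
    for (; i + 16 <= len; i += 16) {
        v16u8 in;
        memcpy(&in, src + i, 16);                /* unaligned-safe load */
        v16u8 hi = nibble_to_hex(in >> splat(4));
        v16u8 lo = nibble_to_hex(in & splat(0x0f));
        /* Interleave: hi0 lo0 hi1 lo1 ... (indices 16..31 select from lo). */
        v16u8 out0 = __builtin_shufflevector(hi, lo,
            0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23);
        v16u8 out1 = __builtin_shufflevector(hi, lo,
            8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31);
        memcpy(out + 2 * i, &out0, 16);
        memcpy(out + 2 * i + 16, &out1, 16);
    }
#endif
    /* Scalar tail (and full fallback when the builtin is missing). */
    for (; i < len; i++) {
        out[2 * i]     = digits[src[i] >> 4];
        out[2 * i + 1] = digits[src[i] & 0x0f];
    }
}
```

The compiler is left to lower the vector operations to SSE2 or NEON instructions, so the source contains no architecture-specific intrinsics. Inputs shorter than 16 bytes never enter the vector loop, so any benefit only appears once the data is at least one full chunk long.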
Will it be worthwhile? It turns out the answer is yes (see PR). The gain is minor for something as small as a baseline md5.hexdigest() (16 bytes) but clear on larger data such as sha256.hexdigest() and sha512.hexdigest(). At those sizes it is common to see 1.5-3x faster hex conversion. Realistically, I doubt many applications convert binary data larger than those common practical cases into hex, but if they do it can be >10x faster (as measured at 4K).
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere