Speed up bytes.hex() and related pystrhex.c users using SIMD

# Feature or enhancement

### Proposal:

We consolidated much of our bytes -> hexadecimal string code into one place, Python/pystrhex.c, a while back. It is still written as a traditional scalar loop that iterates over the bytes and converts their nibbles one at a time, as is natural. Now that it's all in one place, we can do better.
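
For reference, the scalar approach described above looks roughly like the following. This is a minimal sketch, not the actual pystrhex.c code: the name `hex_scalar` and its signature are hypothetical, and separators and other options are left out.

```c
#include <stddef.h>

/* Hypothetical helper for illustration, not the pystrhex.c implementation.
   Writes 2 * len lowercase hex digits to `out` (no NUL terminator). */
static void
hex_scalar(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0f];  /* low nibble */
    }
}
```

Each input byte costs two table lookups and two single-byte stores, which is the per-byte work a 16-byte-wide vector path would batch.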

x86_64 and arm64 (aarch64) are guaranteed to have SSE2 and NEON respectively, both of which can process 16 bytes at once. Modern compilers, starting with clang eons ago and more recently gcc 12 (2022), abstract the operation we need into a nice function, so we do not even need to write CPU-specific code directly for this use case. Maintainability win!
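
To make the idea concrete, here is a rough sketch of what a portable 16-bytes-at-a-time conversion could look like. It assumes the function alluded to above is `__builtin_shufflevector`, which clang has long provided and gcc added in version 12; the issue text does not name it, so treat that as an assumption. The names `hex_simd`, `hex_tail`, `splat`, `nibbles_to_ascii` and the macro `HEX_HAVE_SHUFFLEVECTOR` are hypothetical and are not taken from the PR.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: the vector path is only enabled where the builtin exists. */
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 12)
#  define HEX_HAVE_SHUFFLEVECTOR 1   /* hypothetical macro name */
#endif

/* Scalar path: handles the tail, or everything if vectors are unavailable. */
static void
hex_tail(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0f];
    }
}

#ifdef HEX_HAVE_SHUFFLEVECTOR
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* Broadcast one byte to all 16 lanes. */
static inline v16u8
splat(unsigned char x)
{
    return (v16u8){x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x};
}

/* Map 16 nibbles (0..15) to ASCII '0'..'9' or 'a'..'f'. */
static inline v16u8
nibbles_to_ascii(v16u8 n)
{
    /* Lanes holding 10..15 compare true (all bits set), others are zero. */
    v16u8 is_letter = (v16u8)(n > splat(9));
    return n + splat('0') + (is_letter & splat('a' - '0' - 10));
}
#endif

/* Writes 2 * len lowercase hex digits to `out` (no NUL terminator). */
static void
hex_simd(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0;
#ifdef HEX_HAVE_SHUFFLEVECTOR
    for (; i + 16 <= len; i += 16) {
        v16u8 v;
        memcpy(&v, in + i, 16);                     /* unaligned load */
        v16u8 hi = nibbles_to_ascii(v >> splat(4));
        v16u8 lo = nibbles_to_ascii(v & splat(0x0f));
        /* Interleave hi/lo digits: indices 0..15 pick from hi, 16..31 from lo. */
        v16u8 first = __builtin_shufflevector(
            hi, lo, 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23);
        v16u8 second = __builtin_shufflevector(
            hi, lo, 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31);
        memcpy(out + 2 * i, &first, 16);
        memcpy(out + 2 * i + 16, &second, 16);
    }
#endif
    hex_tail(in + i, len - i, out + 2 * i);
}
```

Nothing here names SSE2 or NEON directly; the compiler is left to pick the instructions for the target, which is the maintainability argument. Inputs shorter than 16 bytes fall straight through to the scalar tail.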

Will it be worthwhile? It turns out the answer is yes (see PR). The gain is minor on something as small as a baseline md5.hexdigest() (16 bytes) but is clear on larger data such as sha256.hexdigest() and sha512.hexdigest(). At those sizes it is common to see 1.5-3x faster hex conversion. Realistically I doubt many applications convert binary data larger than those common practical cases into hex, but when they do it can be >10x faster (as measured at 4K).

### Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere.


### Linked PRs
* gh-143991

Speed up bytes.hex() and related pystrhex.c users using SIMD #144015
