Skip to content

Conversation

@yumeyao
Copy link

@yumeyao yumeyao commented Jul 24, 2025

x86[-64] doesn't have integer saturating arithmetic instructions (thus slow if not vectorized), since all x86-64 CPUs support SSE2, we can use SSE2 as a baseline implementation.

This implmentation is taken from clang's optimization result, and gcc/msvc can't optimize it this way, see here for a comparison on godbolt.

It also contains a minor fix to fix minimal gcc version to compile (without globally enabling SSE4.1/AVX2 but use the target GCC extension). I think the old value GCC 4.7.1 was there because AVX2 support is added in GCC 4.7, but starting from GCC 4.9, it is now possible to call x86 intrinsics from select functions in a file that are tagged with the corresponding target attribute without having to compile the entire file with the -mxxx option..

Technically GCC 4.7 and 4.8 don't have the target feature in x86 intrinsic headers and don't allow including per-instruction-extension-set header directly, code like below in <?mmintrin.h> is only available since GCC 4.9.

#ifndef __AVX2__
#pragma GCC push_options
#pragma GCC target("avx2")
#define __DISABLE_AVX2__
#endif /* __AVX2__ */

// The contents

#ifdef __DISABLE_AVX2__
#undef __DISABLE_AVX2__
#pragma GCC pop_options
#endif /* __DISABLE_AVX2__ */

@ip7z
Copy link
Owner

ip7z commented Jul 25, 2025

That LzFind_SaturSub code can be important only for 4 GB dictionary in LZMA .
We can use 4 GB dictionary, if we have more than 40 GB of RAM.
If we have 40 GB of RAM, then processor porobably supports SSE41.
So SSE2 branch is not too important.

@yumeyao
Copy link
Author

yumeyao commented Jul 25, 2025

So SSE2 branch is not too important.

true, I was not optimizing it for performance, actually I ran into the compile issue with an ancient compiler and looked into this file, then fixed it, and did the optimization as a favor.

Well, say, this could save ~10-ish bytes in binary code :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants