SSE2 implementation for LzFind_SaturSub() #140

yumeyao · 2025-07-24T07:30:09Z

x86[-64] doesn't have integer saturating arithmetic instructions (thus slow if not vectorized), since all x86-64 CPUs support SSE2, we can use SSE2 as a baseline implementation.

This implmentation is taken from clang's optimization result, and gcc/msvc can't optimize it this way, see here for a comparison on godbolt.

It also contains a minor fix to fix minimal gcc version to compile (without globally enabling SSE4.1/AVX2 but use the target GCC extension). I think the old value GCC 4.7.1 was there because AVX2 support is added in GCC 4.7, but starting from GCC 4.9, it is now possible to call x86 intrinsics from select functions in a file that are tagged with the corresponding target attribute without having to compile the entire file with the -mxxx option..

Technically GCC 4.7 and 4.8 don't have the target feature in x86 intrinsic headers and don't allow including per-instruction-extension-set header directly, code like below in <?mmintrin.h> is only available since GCC 4.9.

#ifndef __AVX2__
#pragma GCC push_options
#pragma GCC target("avx2")
#define __DISABLE_AVX2__
#endif /* __AVX2__ */

// The contents

#ifdef __DISABLE_AVX2__
#undef __DISABLE_AVX2__
#pragma GCC pop_options
#endif /* __DISABLE_AVX2__ */

ip7z · 2025-07-25T07:17:17Z

That LzFind_SaturSub code can be important only for 4 GB dictionary in LZMA .
We can use 4 GB dictionary, if we have more than 40 GB of RAM.
If we have 40 GB of RAM, then processor porobably supports SSE41.
So SSE2 branch is not too important.

yumeyao · 2025-07-25T08:13:01Z

So SSE2 branch is not too important.

true, I was not optimizing it for performance, actually I ran into the compile issue with an ancient compiler and looked into this file, then fixed it, and did the optimization as a favor.

Well, say, this could save ~10-ish bytes in binary code :)

SSE2 implementation for LzFind_SaturSub()

e7b7080

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SSE2 implementation for LzFind_SaturSub() #140

SSE2 implementation for LzFind_SaturSub() #140

Uh oh!

yumeyao commented Jul 24, 2025 •

edited

Loading

Uh oh!

ip7z commented Jul 25, 2025 •

edited

Loading

Uh oh!

yumeyao commented Jul 25, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SSE2 implementation for LzFind_SaturSub() #140

Are you sure you want to change the base?

SSE2 implementation for LzFind_SaturSub() #140

Uh oh!

Conversation

yumeyao commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ip7z commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yumeyao commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yumeyao commented Jul 24, 2025 •

edited

Loading

ip7z commented Jul 25, 2025 •

edited

Loading

yumeyao commented Jul 25, 2025 •

edited

Loading