Use LLVM intrinsics for madd intrinsics
#1985
Conversation
Can you add something like this:

```rust
/// This is a trick used in the adler32 algorithm to get a widening addition. The
/// multiplication by 1 is trivial, but must not be optimized out because then the
/// vpmaddwd instruction is no longer selected.
#[target_feature(enable = "avx2")]
#[cfg_attr(test, assert_instr(vpmaddwd))]
unsafe fn _mm256_madd_epi16_mul_one(mad: __m256i) -> __m256i {
    let one_v = _mm256_set1_epi16(1);
    _mm256_madd_epi16(mad, one_v)
}
```

So that we remember in the future why we can't just use that generic implementation?
I have added the following comments:

````rust
// It's a trick used in the Adler-32 algorithm to perform a widening addition.
//
// ```rust
// #[target_feature(enable = "sse2")]
// unsafe fn widening_add(mad: __m128i) -> __m128i {
//     _mm_madd_epi16(mad, _mm_set1_epi16(1))
// }
// ```
//
// If we implement this using generic vector intrinsics, the optimizer
// will eliminate this pattern, and `pmaddwd` will no longer be emitted.
// For this reason, we use x86 intrinsics.
````
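To make the widening-add trick above concrete, here is a minimal runnable sketch (names like `widening_add_lanes` are illustrative, not from the patch) showing that `_mm_madd_epi16` against a vector of ones sums each adjacent pair of `i16` lanes into an `i32` lane:

```rust
// Demonstrates the widening-add trick: pmaddwd with a multiplier of 1
// acts as a pairwise widening horizontal add.
#[cfg(target_arch = "x86_64")]
fn widening_add_lanes(vals: [i16; 8]) -> [i32; 4] {
    use std::arch::x86_64::*;
    // SAFETY: SSE2 is part of the x86_64 baseline target.
    unsafe {
        let mad = _mm_loadu_si128(vals.as_ptr() as *const __m128i);
        // Multiplying every lane by 1 must not be optimized away,
        // otherwise pmaddwd is no longer selected.
        let r = _mm_madd_epi16(mad, _mm_set1_epi16(1));
        std::mem::transmute(r)
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        // Lanes (1,2), (3,4), (5,6), (7,8) are summed pairwise.
        let lanes = widening_add_lanes([1, 2, 3, 4, 5, 6, 7, 8]);
        assert_eq!(lanes, [3, 7, 11, 15]);
        println!("{:?}", lanes);
    }
}
```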
````diff
-#[rustc_const_unstable(feature = "stdarch_const_x86", issue = "149298")]
-pub const fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
-    unsafe {
-        let r: i32x8 = simd_mul(simd_cast(a.as_i16x8()), simd_cast(b.as_i16x8()));
-        let even: i32x4 = simd_shuffle!(r, r, [0, 2, 4, 6]);
-        let odd: i32x4 = simd_shuffle!(r, r, [1, 3, 5, 7]);
-        simd_add(even, odd).as_m128i()
-    }
-}
+pub fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
+    // It's a trick used in the Adler-32 algorithm to perform a widening addition.
+    //
+    // ```rust
+    // #[target_feature(enable = "sse2")]
+    // unsafe fn widening_add(mad: __m128i) -> __m128i {
+    //     _mm_madd_epi16(mad, _mm_set1_epi16(1))
+    // }
+    // ```
+    //
+    // If we implement this using generic vector intrinsics, the optimizer
+    // will eliminate this pattern, and `pmaddwd` will no longer be emitted.
+    // For this reason, we use x86 intrinsics.
+    unsafe { transmute(pmaddwd(a.as_i16x8(), b.as_i16x8())) }
+}
````
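For reference, the removed const body and `pmaddwd` compute the same thing: multiply corresponding `i16` lanes into `i32`, then add each adjacent even/odd pair. A scalar model of that semantics (the function name `madd_epi16_scalar` is illustrative, not from the patch):

```rust
// Scalar model of pmaddwd: widen each i16 product to i32, then add
// adjacent (even, odd) pairs into four i32 results.
fn madd_epi16_scalar(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = (a[2 * i] as i32) * (b[2 * i] as i32)
            + (a[2 * i + 1] as i32) * (b[2 * i + 1] as i32);
    }
    out
}

fn main() {
    // With b = all ones this is exactly the widening pairwise add.
    let a = [1, 2, 3, 4, 5, 6, 7, 8];
    assert_eq!(madd_epi16_scalar(a, [1; 8]), [3, 7, 11, 15]);
    // Widening matters: i16::MAX * i16::MAX + i16::MAX * i16::MAX
    // would overflow i16 but fits in i32.
    let m = [i16::MAX; 8];
    assert_eq!(madd_epi16_scalar(m, m)[0], 2 * (i16::MAX as i32).pow(2));
    println!("ok");
}
```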
We could consider using [`const_eval_select`](https://doc.rust-lang.org/std/intrinsics/fn.const_eval_select.html) so we don't lose all of the const stuff. Up to @sayantn though, I don't have full context on what we'd like to be `const fn` currently.
`#[target_feature]` functions do not implement the `Fn` traits, while `const_eval_select` requires `FnOnce`. So this does not seem feasible.
We can still do it: the inner function that gets invoked doesn't need to have the correct target features if it is marked `#[inline]` (`#[inline(always)]` will probably be better), see Godbolt. But I don't think it is that important to make this const right now. I think a better approach would be to fix the LLVM bug and make this truly const in the future, but this should be enough for the patch.
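The pattern described here can be sketched on stable Rust without `const_eval_select` itself (which is nightly-only): an `#[inline(always)]` inner function that carries no `target_feature` of its own is inlined into a feature-enabled caller, and instruction selection still happens in the caller's context. All names below (`madd_inner`, `madd`, `demo`) are illustrative, not from the patch:

```rust
#[cfg(target_arch = "x86_64")]
mod sketch {
    use std::arch::x86_64::*;

    // Inner helper with no target_feature attribute. Because it is
    // #[inline(always)], it is inlined into the feature-enabled caller,
    // where LLVM can still select pmaddwd.
    #[inline(always)]
    fn madd_inner(a: __m128i, b: __m128i) -> __m128i {
        // SAFETY: SSE2 is part of the x86_64 baseline target.
        unsafe { _mm_madd_epi16(a, b) }
    }

    // Outer function carrying the target feature, playing the role the
    // runtime arm of const_eval_select would.
    #[target_feature(enable = "sse2")]
    unsafe fn madd(a: __m128i, b: __m128i) -> __m128i {
        madd_inner(a, b)
    }

    pub fn demo() -> [i32; 4] {
        // SAFETY: SSE2 is part of the x86_64 baseline target.
        unsafe {
            let a = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
            let r = madd(a, _mm_set1_epi16(1));
            std::mem::transmute(r)
        }
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        // Adjacent i16 pairs summed into i32 lanes.
        assert_eq!(sketch::demo(), [3, 7, 11, 15]);
        println!("ok");
    }
}
```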
The Miri test suite might start failing now, since the implementation of these intrinsics was removed. cc @RalfJung
rust-lang/rust#150560
r? @sayantn