
Conversation

usamoi (Contributor) commented Jan 1, 2026

folkertdev (Contributor) commented:

Can you add something like this?

    /// This is a trick used in the adler32 algorithm to get a widening addition. The
    /// multiplication by 1 is trivial, but must not be optimized out because then the vpmaddwd
    /// instruction is no longer selected.
    #[target_feature(enable = "avx2")]
    #[cfg_attr(test, assert_instr(vpmaddwd))]
    unsafe fn _mm256_madd_epi16_mul_one(mad: __m256i) -> __m256i {
        let one_v = _mm256_set1_epi16(1);
        _mm256_madd_epi16(mad, one_v)
    }

So that we remember in the future why we can't just use that generic implementation?

usamoi (Contributor, Author) commented Jan 1, 2026

I have added the following comments:

// It's a trick used in the Adler-32 algorithm to perform a widening addition.
//
// ```rust
// #[target_feature(enable = "sse2")]
// unsafe fn widening_add(mad: __m128i) -> __m128i {
//     _mm_madd_epi16(mad, _mm_set1_epi16(1))
// }
// ```
//
// If we implement this using generic vector intrinsics, the optimizer
// will eliminate this pattern, and `pmaddwd` will no longer be emitted.
// For this reason, we use x86 intrinsics.
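For reference, here is a small standalone demonstration of the pairwise widening reduction that `pmaddwd` (via `_mm_madd_epi16`) provides; this is an illustrative sketch, not part of the patch, and runs on x86_64 where SSE2 is baseline:

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::*;
    unsafe {
        // Eight i16 lanes.
        let v = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
        // Multiplying by an all-ones vector makes pmaddwd reduce adjacent
        // i16 pairs into four widened i32 sums -- the mul-by-1 trick the
        // comment above describes.
        let sums = _mm_madd_epi16(v, _mm_set1_epi16(1));
        let mut out = [0i32; 4];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sums);
        println!("{:?}", out); // [3, 7, 11, 15]
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```

This is exactly the shape Adler-32 implementations rely on, which is why the mul-by-one must survive optimization.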

Comment on lines 213 to 227:

Removed (the generic `const` implementation):

    #[rustc_const_unstable(feature = "stdarch_const_x86", issue = "149298")]
    pub const fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
        unsafe {
            let r: i32x8 = simd_mul(simd_cast(a.as_i16x8()), simd_cast(b.as_i16x8()));
            let even: i32x4 = simd_shuffle!(r, r, [0, 2, 4, 6]);
            let odd: i32x4 = simd_shuffle!(r, r, [1, 3, 5, 7]);
            simd_add(even, odd).as_m128i()
        }
    }

Added (the x86-intrinsic implementation):

    pub fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
        // It's a trick used in the Adler-32 algorithm to perform a widening addition.
        //
        // ```rust
        // #[target_feature(enable = "sse2")]
        // unsafe fn widening_add(mad: __m128i) -> __m128i {
        //     _mm_madd_epi16(mad, _mm_set1_epi16(1))
        // }
        // ```
        //
        // If we implement this using generic vector intrinsics, the optimizer
        // will eliminate this pattern, and `pmaddwd` will no longer be emitted.
        // For this reason, we use x86 intrinsics.
        unsafe { transmute(pmaddwd(a.as_i16x8(), b.as_i16x8())) }
    }
Contributor:
We could consider using https://doc.rust-lang.org/std/intrinsics/fn.const_eval_select.html so we don't lose all of the const stuff. Up to @sayantn though, I don't have full context on what we'd currently like to be `const fn`.

Contributor (Author):
`#[target_feature]` functions do not implement the `Fn` traits, while `const_eval_select` requires its arguments to implement `FnOnce`. So this does not seem feasible.

Contributor:
We can still do it: the inner function that gets invoked doesn't need to carry the target features if it is marked `#[inline]` (`#[inline(always)]` is probably better); see Godbolt. But I don't think making this const is that important right now. A better approach would be to fix the LLVM bug and make this truly const in the future; this should be enough for the patch.
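The shape being suggested here could look roughly like the following. This is a nightly-only sketch using the unstable `const_eval_select` intrinsic; the helper names `ct` and `rt` are illustrative, not code from the actual patch:

    #[rustc_const_unstable(feature = "stdarch_const_x86", issue = "149298")]
    pub const fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
        // Compile-time path: the generic vector version. No instruction
        // selection happens during const evaluation, so losing `pmaddwd`
        // is not a concern here.
        const fn ct(a: __m128i, b: __m128i) -> __m128i {
            unsafe {
                let r: i32x8 = simd_mul(simd_cast(a.as_i16x8()), simd_cast(b.as_i16x8()));
                let even: i32x4 = simd_shuffle!(r, r, [0, 2, 4, 6]);
                let odd: i32x4 = simd_shuffle!(r, r, [1, 3, 5, 7]);
                simd_add(even, odd).as_m128i()
            }
        }
        // Runtime path: a plain #[inline(always)] fn with no
        // #[target_feature], so it implements FnOnce and can be passed
        // to const_eval_select.
        #[inline(always)]
        fn rt(a: __m128i, b: __m128i) -> __m128i {
            unsafe { transmute(pmaddwd(a.as_i16x8(), b.as_i16x8())) }
        }
        core::intrinsics::const_eval_select((a, b), ct, rt)
    }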

sayantn added this pull request to the merge queue Jan 2, 2026

sayantn (Contributor) commented Jan 2, 2026

The Miri test suite might start failing now, since the generic implementations of these intrinsics were removed. cc @RalfJung

Merged via the queue into rust-lang:main with commit 7de6f04 Jan 2, 2026
75 checks passed