Skip to content

Fix horizontal fusion missed due to duplicate inputs not being eliminated#2445

Merged
athas merged 3 commits intomasterfrom
copilot/fix-horizontal-fusion-issue
May 5, 2026
Merged

Fix horizontal fusion missed due to duplicate inputs not being eliminated#2445
athas merged 3 commits intomasterfrom
copilot/fix-horizontal-fusion-issue

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 5, 2026

After horizontal fusion, the combined screma can end up with duplicate entries in its input list when two SOACs share the same input array. Without deduplication, the fused screma looks structurally "heavier" than it should, causing the fusion pass to miss subsequent fusion opportunities in the same iteration.

Changes

  • src/Futhark/Optimise/Fusion/Screma.hs: Call dedupInput at the end of fuseSuperScrema, applied to inp_r and lam1 before the SuperScrema is returned. This is where duplicates are actually introduced (inp_r = inp_p <> inp_c_real), and deduplicating here removes extra lambda parameters by replacing them with let-binding aliases of the canonical parameter.

  • tests/fusion/horizontal-fusion-shared-input.fut: Regression test for the pattern from the issue — two outer maps over a shared array W, each computing a dot product against different matrices, followed by an elementwise combination:

def dotprod [n] (xs: [n]i32) (ys: [n]i32) : i32 =
  i32.sum (map2 (i32.*) xs ys)

entry main W A WH =
  let W_TA  = map (\Wr -> map (\Ac  -> dotprod Wr Ac)  (transpose A))  W
  let W_TWH = map (\Wr -> map (\WHc -> dotprod Wr WHc) (transpose WH)) W
  let H_update = map2 (map2 (i32./)) W_TA W_TWH
  in H_update

Without the fix the total Screma count in the SOACS pipeline is 4; with it, the two innermost reductions fuse horizontally and the count drops to 3.

Copilot AI linked an issue May 5, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Fix horizontal fusion issue in Futhark program Fix horizontal fusion missed due to duplicate inputs not being eliminated May 5, 2026
Copilot finished work on behalf of athas May 5, 2026 14:34
Copilot AI requested a review from athas May 5, 2026 14:34
@athas athas requested a review from WilliamDue May 5, 2026 14:39
Copy link
Copy Markdown
Contributor

@WilliamDue WilliamDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem can be solved more cleanly and safely by doing deduplication of inp_r in fuseSuperScrema maybe at the end of fuseSuperScrema or by applying a another function afterwards.

Comment thread src/Futhark/Optimise/Fusion/Screma.hs Outdated
form <- tryIdentityPost form'
pure (new_inp, form, new_out)
form'' <- tryIdentityPost form'
let (deduped_inp, deduped_form) = dedupInput new_inp form''
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably this is probably not most sane way to solve the problem actually, The deduplication should probably happen at the end of fuseSuperScrema, I think am pretty sure the duplciates are introduces in the definition of inp_r.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 86d4568. The dedupInput call is now at the end of fuseSuperScrema, applied to inp_r and lam1 (using a lightweight ScremaForm wrapper since dedupInput operates on that type). The fuseScrema function is back to its original simpler form.

@WilliamDue
Copy link
Copy Markdown
Contributor

I am quickly gonna try it out to see if I am correct.

@WilliamDue
Copy link
Copy Markdown
Contributor

Actually @athas do you mind if @ copilot to do it?

@athas
Copy link
Copy Markdown
Member

athas commented May 5, 2026

Knock yourself out.

@WilliamDue
Copy link
Copy Markdown
Contributor

@copilot Your current change does deduplication at the end of the construction of the fused Screma, I think it it is possible to only do the deduplication at the construction of the the fused SuperScrema in fuseSuperScrema, can you change the code such that it matches my vision or tell me if you believe I am wrong.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 5, 2026

@copilot Your current change does deduplication at the end of the construction of the fused Screma, I think it it is possible to only do the deduplication at the construction of the the fused SuperScr...

Changed in 86d4568 — deduplication now happens at the end of fuseSuperScrema, right before returning the SuperScrema, where inp_r = inp_p <> inp_c_real introduces the duplicates. fuseScrema is restored to its original form.

Copilot AI requested a review from WilliamDue May 5, 2026 16:17
@athas athas marked this pull request as ready for review May 5, 2026 17:30
@athas
Copy link
Copy Markdown
Member

athas commented May 5, 2026

@WilliamDue Looks like it works. Is this in greater accordance with your vision?

@WilliamDue
Copy link
Copy Markdown
Contributor

Yes I believe it is correct, maybe a bit hacky using dedupInput but correct I am sure.

@athas athas merged commit 5307884 into master May 5, 2026
28 of 31 checks passed
@athas athas deleted the copilot/fix-horizontal-fusion-issue branch May 5, 2026 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

This program does not fuse as it should

3 participants