Expand pass #740

Draft
ramonwirsch wants to merge 3 commits into main from expand-pass

@ramonwirsch

Member

Reworked the expansion pipeline into a single MathExpansionPass.

Warning: the pass currently does not clear the analysis cache between expands. Each expand call is therefore responsible for invalidating any cached analysis that becomes outdated for other library nodes once it has expanded itself.

 * Collects all math nodes and attempts to expand them in reverse execution order, to minimize the effect of each expand on other analyses.
 ! Does not clear any analysis, which makes it fragile. This should not matter for the current expand() implementations, but it is slightly dangerous until we have removed the need for ScopeAnalysis or can clear analyses more granularly.
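
The strategy described above can be sketched roughly as follows. This is a toy model in Python, not the actual pass: `Node`, `AnalysisCache`, `expand`, and `run_expansion_pass` are hypothetical names invented for illustration, and the real expand calls operate on the library's own data structures rather than a flat list.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical stand-in for a node, kept in execution order."""
    name: str
    is_math: bool = False


@dataclass
class AnalysisCache:
    """Hypothetical analysis cache; the pass itself never clears it."""
    entries: dict = field(default_factory=dict)

    def invalidate(self, key):
        self.entries.pop(key, None)


def expand(nodes, node, cache):
    """Toy expand: replace one math node by two plain nodes in place.

    Mirrors the contract from the description: the expand call itself
    must drop any cached analysis that it made stale.
    """
    idx = nodes.index(node)
    nodes[idx:idx + 1] = [Node(node.name + "_init"), Node(node.name + "_compute")]
    cache.invalidate(node.name)
    return True


def run_expansion_pass(nodes, cache):
    """Collect all math nodes up front, then expand them last-to-first."""
    math_nodes = [n for n in nodes if n.is_math]
    expanded = []
    for node in reversed(math_nodes):
        if expand(nodes, node, cache):
            expanded.append(node.name)
    return expanded
```

In this model, expanding the last math node first means the positions of earlier nodes are untouched when their turn comes. The fragility is also visible: if `expand` forgot the `cache.invalidate` call, the stale entry would silently survive, because `run_expansion_pass` never clears anything.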
@daisytuner

daisytuner Bot commented Jun 8, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# resnet18_torch         19.85 s     -0.40%      N/A         3787.49 J   -3.96%      
# resnet18_docc_none     17.96 s     -0.27%      N/A         4575.06 J   -3.77%      
# resnet18_docc_sequential 17.57 s     +1.21%      N/A         4460.60 J   -2.32%      
# resnet18_docc_openmp   23.62 s     -0.75%      N/A         6696.29 J   -4.00%      
# resnet18_docc_cuda     5.49 s      -0.56%      N/A         1013.57 J   -4.47%      

@daisytuner

daisytuner Bot commented Jun 8, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.32 s      +1.18%      N/A         134.26 J    +0.48%      
# adi_omp                2.53 s      -1.55%      N/A         315.31 J    -2.16%      
# adi_cuda               3.64 s      -2.11%      N/A         366.08 J    -2.58%      
# adi_seq_tuning         2.59 s      +1.18%      N/A         241.49 J    +0.16%      
# atax_numpy             2.15 s      -0.45%      N/A         227.71 J    -1.15%      
# atax_omp               3.13 s      +1.11%      N/A         404.67 J    +1.61%      
# atax_cuda              4.28 s      -0.17%      N/A         450.22 J    -0.71%      
# atax_seq_tuning        2.94 s      +2.01%      N/A         293.25 J    +0.08%      
# gemm_numpy             1.22 s      +0.26%      N/A         197.19 J    -0.25%      
# gemm_omp               1.16 s      +0.16%      N/A         170.10 J    +0.05%      
# gemm_cuda              10.67 s     -0.53%      N/A         1039.48 J   -1.17%      
# gemm_seq_tuning        1.16 s      +0.24%      N/A         169.31 J    -0.42%      
# gesummv_numpy          1.74 s      -0.43%      N/A         251.84 J    -1.08%      
# gesummv_omp            2.09 s      -4.61%      N/A         344.35 J    -6.01%      
# gesummv_cuda           5.28 s      -0.67%      N/A         734.41 J    -2.21%      
# gesummv_seq_tuning     5.31 s      -1.74%      N/A         681.63 J    -1.33%      
# gemver_numpy           1.08 s      -1.04%      N/A         168.31 J    -1.86%      
# gemver_omp             944.97 ms   -0.89%      N/A         126.74 J    -1.87%      
# gemver_cuda            2.53 s      -0.53%      N/A         273.29 J    -1.16%      
# gemver_seq_tuning      1.70 s      -1.05%      N/A         145.88 J    -2.25%      
# k2mm_numpy             1.19 s      -0.71%      N/A         198.44 J    -1.20%      
# k2mm_omp               3.58 s      -1.24%      N/A         662.24 J    -3.60%      
# k2mm_cuda              12.73 s     +0.02%      N/A         1236.17 J   -0.75%      
# k2mm_seq_tuning        3.00 s      -1.97%      N/A         405.61 J    -1.90%      
# k3mm_numpy             1.02 s      -0.41%      N/A         183.57 J    -1.10%      
# k3mm_omp               5.61 s      +0.16%      N/A         961.56 J    +0.36%      
# k3mm_cuda              18.38 s     -0.82%      N/A         1776.55 J   -1.51%      
# k3mm_seq_tuning        4.96 s      +0.37%      N/A         697.70 J    -1.37%      
# mvt_numpy              2.43 s      -0.44%      N/A         254.68 J    -1.12%      
# mvt_omp                2.78 s      -0.44%      N/A         295.35 J    -1.10%      
# mvt_cuda               3.44 s      -0.28%      N/A         358.95 J    -1.03%      
# mvt_seq_tuning         2.79 s      +0.23%      N/A         295.73 J    -0.56%      
# symm_numpy             781.37 ms   -0.47%      N/A         82.18 J     -1.06%      
# symm_omp               1.02 s      -0.40%      N/A         123.68 J    -2.16%      
# symm_seq_tuning        1.88 s      +1.71%      N/A         155.42 J    -0.23%      
# syr2k_numpy            878.78 ms   +0.66%      N/A         91.41 J     +0.12%      
# syr2k_omp              935.37 ms   -2.78%      N/A         106.90 J    -2.70%      
# syr2k_cuda             1.61 s      +0.36%      N/A         171.40 J    -0.48%      
# syr2k_seq_tuning       932.00 ms   -1.15%      N/A         106.50 J    -1.50%      
# syrk_numpy             768.69 ms   -0.56%      N/A         80.83 J     -1.23%      
# syrk_omp               785.18 ms   +0.37%      N/A         92.15 J     -0.57%      
# syrk_cuda              1.41 s      -1.23%      N/A         153.07 J    -1.63%      
# syrk_seq_tuning        783.85 ms   -0.20%      N/A         92.17 J     -0.69%      
# trmm_numpy             890.48 ms   +1.72%      N/A         92.49 J     +0.74%      
# trmm_omp               754.34 ms   -0.37%      N/A         96.23 J     -1.38%      
# trmm_seq_tuning        1.63 s      -4.75%      N/A         134.52 J    -3.85%      

@ramonwirsch

Member Author

This cannot work without larger rewrites.
Expanding a math node can currently emit further MathNodes, which requires recursive expansion. However, we do not keep any metadata about what an expansion produced (the newly created blocks or sequence members that would have to be scanned recursively).
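
One way to picture the missing metadata is an expand call that reports which nodes it created, so a driver can rescan only those. The sketch below is a toy model in Python under that assumption: `ExpansionResult`, `expand`, `expand_recursively`, and the node names are all hypothetical and are not part of the existing code base.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a sequence member."""
    name: str
    is_math: bool = False


@dataclass
class ExpansionResult:
    """Hypothetical record of what a single expand produced."""
    success: bool
    new_nodes: list = field(default_factory=list)


def expand(node):
    """Toy expand: 'softmax' emits another math node, 'reduce'."""
    if node.name == "softmax":
        return ExpansionResult(True, [Node("reduce", is_math=True), Node("div")])
    if node.name == "reduce":
        return ExpansionResult(True, [Node("loop")])
    return ExpansionResult(False)


def expand_recursively(nodes):
    """Expand math nodes until none are left, rescanning only new nodes."""
    result = []
    worklist = list(reversed(nodes))  # used as a stack; pop() follows execution order
    while worklist:
        node = worklist.pop()
        if not node.is_math:
            result.append(node)
            continue
        outcome = expand(node)
        if not outcome.success:
            result.append(node)  # leave unexpandable math nodes in place
            continue
        # Push the replacements back so nested math nodes get expanded too.
        worklist.extend(reversed(outcome.new_nodes))
    return result
```

Here the returned `new_nodes` list plays the role of the metadata: without it, the driver would have to rescan the whole structure after every expand to find math nodes that were newly emitted.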
