
simplifies GPU transformations #764

Open
lukastruemper wants to merge 4 commits into main from gpu-transformations

@lukastruemper

Contributor

No description provided.

@daisytuner

daisytuner Bot commented Jun 14, 2026


Daisytuner Report - mlir_torch_models (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# resnet18_torch         19.90 s     -0.06%      N/A         3998.00 J   +1.49%      
# resnet18_docc_none     17.92 s     -0.36%      N/A         4750.79 J   -0.12%      
# resnet18_docc_sequential 17.59 s   +0.02%      N/A         4657.03 J   +0.55%      
# resnet18_docc_openmp   23.57 s     +0.11%      N/A         6946.06 J   +0.51%      
# resnet18_docc_cuda     5.50 s      +1.99%      N/A         1071.28 J   +1.60%      

@daisytuner

daisytuner Bot commented Jun 14, 2026


Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.31 s      +0.73%      N/A         137.78 J    +2.87%      
# adi_omp                2.74 s      +2.98%      N/A         356.63 J    +4.33%      
# adi_cuda               3.74 s      +1.74%      N/A         387.13 J    +3.88%      
# adi_seq_tuning         2.59 s      +1.67%      N/A         251.88 J    +3.57%      
# atax_numpy             2.14 s      -0.38%      N/A         233.10 J    +1.88%      
# atax_omp               3.13 s      +1.21%      N/A         413.02 J    +3.43%      
# atax_cuda              4.28 s      +0.03%      N/A         460.96 J    +2.04%      
# atax_seq_tuning        2.86 s      -1.05%      N/A         296.94 J    +1.84%      
# gemm_numpy             1.24 s      +2.51%      N/A         203.43 J    +3.42%      
# gemm_omp               1.17 s      +1.13%      N/A         173.69 J    +2.35%      
# gemm_cuda              10.73 s     +0.14%      N/A         1077.75 J   +2.59%      
# gemm_seq_tuning        1.16 s      +0.79%      N/A         171.70 J    +1.36%      
# gesummv_numpy          1.76 s      +0.28%      N/A         259.76 J    +1.37%      
# gesummv_omp            2.18 s      +1.78%      N/A         360.34 J    +3.10%      
# gesummv_cuda           5.48 s      +3.32%      N/A         757.17 J    +2.20%      
# gesummv_seq_tuning     4.77 s      +2.19%      N/A         625.78 J    +1.63%      
# gemver_numpy           1.08 s      -0.24%      N/A         171.25 J    +0.33%      
# gemver_omp             942.56 ms   -1.15%      N/A         126.05 J    -3.04%      
# gemver_cuda            2.55 s      +0.76%      N/A         280.79 J    +2.65%      
# gemver_seq_tuning      1.62 s      -4.20%      N/A         149.14 J    +2.16%      
# k2mm_numpy             1.20 s      -0.13%      N/A         201.42 J    +0.77%      
# k2mm_omp               3.60 s      +0.46%      N/A         678.65 J    -0.42%      
# k2mm_cuda              12.78 s     +0.27%      N/A         1279.84 J   +2.76%      
# k2mm_seq_tuning        3.04 s      +1.77%      N/A         416.56 J    +1.96%      
# k3mm_numpy             1.03 s      +0.59%      N/A         186.33 J    +0.65%      
# k3mm_omp               5.61 s      -0.11%      N/A         989.21 J    +2.27%      
# k3mm_cuda              18.50 s     +0.34%      N/A         1842.84 J   +2.88%      
# k3mm_seq_tuning        4.96 s      +0.27%      N/A         705.32 J    +0.02%      
# mvt_numpy              2.41 s      -1.11%      N/A         258.72 J    +0.98%      
# mvt_omp                2.77 s      -0.37%      N/A         302.82 J    +1.92%      
# mvt_cuda               3.44 s      -0.17%      N/A         369.30 J    +2.23%      
# mvt_seq_tuning         2.77 s      -0.54%      N/A         301.89 J    +1.71%      
# symm_numpy             784.91 ms   +0.92%      N/A         85.00 J     +3.34%      
# symm_omp               1.07 s      +6.80%      N/A         132.96 J    +8.67%      
# symm_seq_tuning        1.81 s      -4.28%      N/A         159.69 J    +1.47%      
# syr2k_numpy            885.80 ms   +1.83%      N/A         94.60 J     +3.90%      
# syr2k_omp              983.41 ms   +3.20%      N/A         113.84 J    +4.53%      
# syr2k_cuda             1.64 s      +3.57%      N/A         179.35 J    +5.20%      
# syr2k_seq_tuning       980.63 ms   +3.51%      N/A         113.41 J    +4.59%      
# syrk_numpy             778.02 ms   +1.76%      N/A         84.16 J     +3.80%      
# syrk_omp               810.44 ms   +2.75%      N/A         96.62 J     +3.70%      
# syrk_cuda              1.48 s      +2.40%      N/A         163.01 J    +4.21%      
# syrk_seq_tuning        812.60 ms   +4.41%      N/A         96.79 J     +5.13%      
# trmm_numpy             879.60 ms   +1.05%      N/A         94.44 J     +3.39%      
# trmm_omp               812.54 ms   +8.19%      N/A         105.65 J    +9.64%      
+ trmm_seq_tuning        1.65 s      -10.49%     N/A         141.76 J    -1.89%      

@lukastruemper changed the title from "Deprecates monolithic GPU transformations in favor of composable transformations" to "simplifies GPU transformations" on Jun 14, 2026

Labels: None yet

Projects: None yet

1 participant