`@noinline` `average_bulk_microphysics_tendencies` to reduce register pressure by petebachant · Pull Request #713 · CliMA/CloudMicrophysics.jl

petebachant · 2026-05-08T00:05:55Z

This kernel is now the hottest in prog EDMF 1M AMIP by a long shot, and this change produces a ~10% speedup (kernel analysis notebook). Disclaimer: the explanatory comments were written by Claude; I don't yet have a deep understanding of what's going on here!
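For readers unfamiliar with the macro, here is a minimal sketch of what `@noinline` does in Julia. The function names and the arithmetic are hypothetical stand-ins, not the actual `average_bulk_microphysics_tendencies` implementation, and whether keeping a callee out of line reduces register pressure in a given kernel depends on the compiler and the device.

```julia
# Hypothetical stand-in for an expensive per-point tendency calculation.
# @noinline asks the compiler not to inline this function into its callers.
@noinline function point_tendency(q, T)
    return q * exp(-T / 300.0)
end

# Hypothetical caller: averages the per-point result over a collection.
function average_tendency(qs, T)
    acc = 0.0
    for q in qs
        # Compiled as a call rather than being expanded into the loop body.
        acc += point_tendency(q, T)
    end
    return acc / length(qs)
end

result = average_tendency([1.0, 3.0], 0.0)
```

The macro only changes how the code is compiled; the computed value is the same with or without it.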

codecov · 2026-05-08T00:19:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.19%. Comparing base (5dd0a90) to head (2680d44).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #713   +/-   ##
=======================================
  Coverage   92.19%   92.19%           
=======================================
  Files          55       55           
  Lines        2420     2420           
=======================================
  Hits         2231     2231           
  Misses        189      189

| Components | Coverage Δ |
|---|---|
| src | `93.11% <100.00%> (ø)` |
| ext | `69.47% <ø> (ø)` |

dennisYatunin

Claude's explanation is a bit suspicious given that the quadrature loop isn't being unrolled, but a 10% speedup sounds great! I'll think about how we can turn this into a simpler example for ClimaCore's compiler stress tests.

petebachant · 2026-05-11T21:02:23Z

Hypothesis: If Claude is correct, the issue comes from the quadrature loop being unrolled, so if we de-unroll it, we may be able to get these benefits without the performance hit on CPU.

This might be possible by dropping the quadrature order from the type and moving it into a value: the type can be an integer, and the value is the number of quadrature loops.
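A minimal sketch of that idea, assuming a quadrature object whose order is currently a type parameter. All names here are hypothetical and are not taken from CloudMicrophysics.jl or ClimaAtmos.jl; a simple midpoint rule stands in for the real quadrature.

```julia
# Order carried in the type: N is a compile-time constant, so the compiler
# may choose to fully unroll a loop over the quadrature points.
struct StaticQuadrature{N} end

# Order carried as a value: the field's type is Int and the loop trip count
# is only known at run time.
struct DynamicQuadrature
    npoints::Int
end

npoints(::StaticQuadrature{N}) where {N} = N
npoints(q::DynamicQuadrature) = q.npoints

# Midpoint rule on [0, 1]; the same code serves both representations.
function integrate(f, q)
    n = npoints(q)
    acc = 0.0
    for i in 1:n
        acc += f((i - 0.5) / n)
    end
    return acc / n
end

static_result = integrate(x -> x^2, StaticQuadrature{4}())
dynamic_result = integrate(x -> x^2, DynamicQuadrature(4))
```

Both calls compute the same value. The difference is only in what the compiler knows ahead of time, which is what the hypothesis above would test.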

trontrytel · 2026-05-20T21:21:47Z

Is this PR something that should be merged or closed?

petebachant · 2026-05-20T21:26:47Z

I opened CliMA/ClimaAtmos.jl#4503 to retain the GPU performance gains, move the changes to Atmos, and avoid the 1.12 regression, but that one is a little uglier. Any preference from your end?

trontrytel · 2026-05-21T18:19:48Z

No preference. Whichever option you think is better.

trontrytel · 2026-05-21T23:12:19Z

Although I'm making some changes in #717, where some outputs are grouped into a tuple, and Claude thinks that performance will be unaffected only if I keep inlining...

petebachant · 2026-05-22T12:53:31Z

I'm actually curious if the compiler will do a better job on either device if there is no macro on the function. I will give it a try, and if not, close this PR and focus on Atmos. It's not great to have to dig this deep into the stack for the performance gains.

petebachant · 2026-05-22T19:31:50Z

Running with no macro produced no performance change. Closing this and pursuing in Atmos.

trontrytel · 2026-05-22T21:04:54Z

Thank you!

noinline functions to reduce register pressure

f631a4a

petebachant requested a review from dennisYatunin May 8, 2026 00:06

petebachant added this to Performance May 8, 2026

petebachant moved this to In review in Performance May 8, 2026

dennisYatunin approved these changes May 8, 2026

petebachant self-assigned this May 11, 2026

Move comments inside docstrings

96c5bcf

petebachant moved this from In review to In progress in Performance May 18, 2026

petebachant mentioned this pull request May 20, 2026

Switch inlining for quadrature point evaluation based on device CliMA/ClimaAtmos.jl#4503

Draft

1 task

trontrytel added the Approved 🍀 label May 21, 2026

petebachant added 3 commits May 22, 2026 05:53

Merge branch 'main' into pb/perf

39795fc

Remove inlining macros

41d7824

Update docstrings

2680d44

petebachant marked this pull request as draft May 22, 2026 18:48

petebachant closed this May 22, 2026

github-project-automation Bot moved this from In progress to Done in Performance May 22, 2026

petebachant wants to merge 5 commits into `main` from `pb/perf`

3 participants
