This PR adds changes that reduce GPU register usage and improve performance.
Tested with an NVHPC nightly build on a single Santis node with 4 Grace-Hoppers, using the 3D_IGR_TaylorGreenVortex_nvidia case.
PR Type
Enhancement
Description
Add GPU register limit optimization for NVIDIA Grace-Hopper
Include GPU memory management directive for Jacobian arrays
Diagram Walkthrough
flowchart LR
A["CMakeLists.txt"] -- "add maxregcount:165" --> B["GPU Register Limit"]
C["m_igr.fpp"] -- "add GPU_DECLARE" --> D["GPU Memory Management"]
B --> E["Performance Optimization"]
D --> E
File Walkthrough
Relevant files
Configuration changes
CMakeLists.txt: Set GPU register count limit
Add the -gpu=maxregcount:165 compiler flag to limit GPU register usage
ntselepidis changed the title from "Do some performance tuning for NVIDIA systems" to "Performance tuning for NVIDIA Grace-Hopper for the Gordon Bell runs" on Aug 17, 2025
The $:GPU_DECLARE(create='[jac, jac_rhs, jac_old]') is introduced for pointer variables jac, jac_rhs, and jac_old. If these pointers are later associated/allocated differently than expected, the directive may not correctly manage their device residency, potentially causing mismatched host/device pointers or lifetimes. Validate that allocation/association and deallocation flows are compatible with this directive.
Hard-coding -gpu=maxregcount:165 may improve occupancy on GH200 but can cause spills and regressions on other NVIDIA GPUs or with different kernels/configurations. Consider guarding by architecture, making it configurable, or validating via ptxinfo to ensure no excessive spilling occurs.
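One way to act on the ptxinfo suggestion is to scan the compiler log for spill reports after building with -gpu=ptxinfo. The sketch below is a minimal example, assuming the log contains the usual "ptxas info" lines; the kernel names and numbers in SAMPLE_LOG are hypothetical and are not taken from this PR.

```python
import re

# Patterns for the per-kernel lines that ptxas prints in verbose mode.
ENTRY_RE = re.compile(r"Compiling entry function '([^']+)'")
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
REGS_RE = re.compile(r"Used (\d+) registers")


def parse_ptxinfo(log_text):
    """Map each kernel name to its register count and spill byte counts."""
    kernels = {}
    current = None
    for line in log_text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            current = match.group(1)
            kernels[current] = {"registers": 0, "spill_stores": 0, "spill_loads": 0}
            continue
        if current is None:
            continue  # ignore anything before the first kernel header
        match = SPILL_RE.search(line)
        if match:
            kernels[current]["spill_stores"] = int(match.group(2))
            kernels[current]["spill_loads"] = int(match.group(3))
            continue
        match = REGS_RE.search(line)
        if match:
            kernels[current]["registers"] = int(match.group(1))
    return kernels


def spilling_kernels(kernels):
    """Return the sorted names of kernels that report any register spilling."""
    return sorted(
        name for name, info in kernels.items()
        if info["spill_stores"] or info["spill_loads"]
    )


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
ptxas info    : Compiling entry function 'kernel_a' for 'sm_90'
ptxas info    : Function properties for kernel_a
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 128 registers, 520 bytes cmem[0]
ptxas info    : Compiling entry function 'kernel_b' for 'sm_90'
ptxas info    : Function properties for kernel_b
    96 bytes stack frame, 144 bytes spill stores, 160 bytes spill loads
ptxas info    : Used 165 registers, 520 bytes cmem[0]
"""

report = parse_ptxinfo(SAMPLE_LOG)
for name in spilling_kernels(report):
    print(f"{name}: {report[name]}")
```

Running this on logs built with and without the register cap would show which kernels start spilling once the cap is applied. Whether a given amount of spilling is acceptable still has to be judged from timings.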
Combine GPU flags into a single -gpu option to avoid later flags overriding earlier ones in NVHPC. Using multiple -gpu options can lead to only the last one being honored, silently dropping requested features. Merge them with commas to ensure all settings apply.
Why: The suggestion correctly identifies that using multiple -gpu flags with the NVHPC compiler can lead to earlier flags being overridden, and proposes the correct fix of merging them.
Medium
Fix GPU declare list syntax
Ensure the GPU declaration uses the correct list syntax expected by your codegen/macros. If the directive expects a comma-separated list without brackets, the current bracketed form may be ignored at compile-time, leaving jac arrays undeclared on GPU.
Why: The suggestion correctly identifies a potential syntax issue in the custom $:GPU_DECLARE directive, but it is speculative as it depends on the macro's specific implementation which is not visible.
My understanding is that the change to m_igr is guaranteed to improve performance, but the addition of -gpu=maxregcount is specific to IGR and could hurt performance for WENO and other existing features.
As @wilfonba correctly noted, we found that setting -gpu=maxregcount:128 improves performance of m_igr on Grace-Hopper, as it reduces the registers and leads to higher occupancy.
Before merging this we need to:
guard this in CMakeLists.txt so that it takes effect only for the intended target architecture, i.e. Grace-Hopper
evaluate the performance impact of this flag on WENO on Grace-Hopper
evaluate the performance impact of this flag on other architectures such as Blackwell or Ampere
double check if the same value is fine for single and double precision arithmetic
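The occupancy argument behind the register cap can be sketched with a rough register-only estimate. This is a simplified model, not a replacement for profiling: it assumes 65536 32-bit registers and at most 64 resident warps per SM (the Hopper figures), and it assumes per-thread register counts are rounded up to a multiple of 8 when allocated. Block size, shared memory, and other limits are ignored, and the function names are illustrative.

```python
def max_resident_warps(regs_per_thread, regs_per_sm=65536, max_warps_per_sm=64,
                       warp_size=32, alloc_granularity=8):
    """Upper bound on resident warps per SM imposed by register usage alone."""
    # Round the per-thread count up to the allocation granularity (assumption).
    rounded = -(-regs_per_thread // alloc_granularity) * alloc_granularity
    warps_by_registers = regs_per_sm // (rounded * warp_size)
    return min(max_warps_per_sm, warps_by_registers)


def register_limited_occupancy(regs_per_thread, max_warps_per_sm=64):
    """Fraction of the SM's warp slots usable when registers are the only limit."""
    warps = max_resident_warps(regs_per_thread, max_warps_per_sm=max_warps_per_sm)
    return warps / max_warps_per_sm


# Compare the two caps mentioned in this thread with the uncapped maximum of 255.
for cap in (128, 165, 255):
    print(cap, max_resident_warps(cap), register_limited_occupancy(cap))
```

Under these assumptions, a cap of 128 allows 16 resident warps per SM, 165 allows 12, and 255 allows 8. Higher occupancy only helps if the cap does not introduce costly spills, which is why the per-architecture and per-precision checks listed above are still needed.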
File Walkthrough
CMakeLists.txt: Set GPU register count limit
Add the -gpu=maxregcount:165 compiler flag to limit GPU register usage
src/simulation/m_igr.fpp: Add GPU memory management directive
Add the $:GPU_DECLARE directive for Jacobian arrays (jac, jac_rhs, jac_old)