Full GPU support #1040

Draft
Thanduriel wants to merge 146 commits into develop from kokkos

@Thanduriel Thanduriel commented Feb 9, 2026

Full GPU support

Related issues: #671, #672


Remaining Issues

  • The plan is to merge "Add Halo Exchange to Dynamics" (#924) first, which will probably necessitate some more changes here.
  • The Mac test still fails, and I don't have a setup to reproduce this.

Reviewers: Since there is a lot of code to cover, it probably does not hurt to have a look already.

Change Description

Ported all major parts of neXtSIM-DG to Kokkos, enabling fast execution on a GPU. This involves major changes to the dynamics and physics (thermodynamics) but also the core infrastructure related to data exchange between ModelComponents.

Core

The most invasive change is the introduction of ModelArrayAccessor. This handle replaces ModelArrayRef as the mechanism for sharing fields between ModelComponents and also takes care of host/device data transfers transparently. To achieve the latter, the new handle has different ergonomics: to access a field, one has to explicitly request read or write access for either host or device each time. The resulting ModelArray (or ModelArrayDevice) reference should only be used within the current scope, so that host and device buffers can be synced between uses as needed.
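The access pattern described above can be illustrated with a small host-only mock. This is a sketch under assumptions, not the actual ModelArrayAccessor: the class `AccessorSketch`, its method names (`hostRO`, `hostRW`, `deviceRO`, `deviceRW`) and the stale-flag bookkeeping are hypothetical, and a second `std::vector` merely mimics device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a host/device field handle. All names here are
// assumptions for illustration; this is not the neXtSIM-DG class.
class AccessorSketch {
public:
    explicit AccessorSketch(std::size_t n) : host(n, 0.0), device(n, 0.0) {}

    // Read-only host access: refresh the host copy if the device wrote last.
    const std::vector<double>& hostRO() { syncToHost(); return host; }
    // Read-write host access: refresh, then mark the device copy as stale.
    std::vector<double>& hostRW() { syncToHost(); deviceStale = true; return host; }
    // The device variants mirror the host ones.
    const std::vector<double>& deviceRO() { syncToDevice(); return device; }
    std::vector<double>& deviceRW() { syncToDevice(); hostStale = true; return device; }

    // Number of (mock) host/device copies performed so far.
    int transfers() const { return nTransfers; }

private:
    void syncToHost() { if (hostStale) { host = device; hostStale = false; ++nTransfers; } }
    void syncToDevice() { if (deviceStale) { device = host; deviceStale = false; ++nTransfers; } }

    std::vector<double> host, device; // the second vector mimics device memory
    bool hostStale = false, deviceStale = false;
    int nTransfers = 0;
};

// Each access is requested in its own scope, so the handle can sync in between.
inline double accessorDemo()
{
    AccessorSketch hice(4);
    { auto& h = hice.hostRW(); h[0] = 1.5; }        // host write
    { auto& d = hice.deviceRW(); d[0] *= 2.0; }     // triggers a host-to-device copy
    { const auto& h = hice.hostRO(); return h[0]; } // triggers a device-to-host copy
}
```

In this mock, a reference kept beyond its scope would not notice a later write on the other side, which is the kind of bug the scoping rule is meant to prevent.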

Portable kernels

In order to write code just once that runs with and without Kokkos, a thin wrapper is provided in KernelAlternatives.hpp. These utilities are meant to be used in combination with the Auto variants that map to the chosen backend, e.g.

  • ModelArrayAccessor::getAutoRW()/ ModelArrayAccessor::getAutoRO() to get the buffer
  • overElementsAuto to run the computations,
  • OVER_ELEMENTS_LAMBDA to define an appropriate lambda.
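As a rough illustration of how such a thin wrapper can be built, the sketch below selects between a Kokkos launch and a plain loop at compile time. It is not the content of KernelAlternatives.hpp: the names `overElementsSketch`, `SKETCH_ELEMENT_LAMBDA` and the switch `SKETCH_USE_KOKKOS` are hypothetical. Only `Kokkos::parallel_for`, `Kokkos::RangePolicy` and `KOKKOS_LAMBDA` are real Kokkos names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef SKETCH_USE_KOKKOS
#include <Kokkos_Core.hpp>
// With Kokkos, the lambda has to be callable on the device.
#define SKETCH_ELEMENT_LAMBDA KOKKOS_LAMBDA
#else
// Without Kokkos, an ordinary by-copy lambda is enough.
#define SKETCH_ELEMENT_LAMBDA [=]
#endif

// Hypothetical analogue of overElementsAuto: run f(i) for every i in [0, n).
template <typename F>
void overElementsSketch(std::size_t n, const F& f)
{
#ifdef SKETCH_USE_KOKKOS
    Kokkos::parallel_for(Kokkos::RangePolicy<>(0, n), f);
#else
    for (std::size_t i = 0; i < n; ++i) {
        f(i);
    }
#endif
}

// A kernel written once against the wrapper. Raw pointers stand in for the
// buffers an accessor would hand out; with Kokkos enabled they would have to
// point to memory reachable from the chosen execution space.
inline void scaleField(const double* in, double* out, std::size_t n, double factor)
{
    overElementsSketch(n, SKETCH_ELEMENT_LAMBDA(const std::size_t i) {
        out[i] = factor * in[i];
    });
}
```

The kernel body itself contains no backend-specific code, so the same source compiles with and without Kokkos.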

Additionally, device code has a number of restrictions which have to be kept in mind in order to make a computation kernel portable. These are:

  • Every variable used needs to be captured so that it can be copied to the device. This is mostly done through the implicit capture mechanism of lambdas. For device execution, everything is captured by copy and then automatically transferred to the device. Most notably, this includes local ModelArrayDevice objects acquired through a ModelArrayAccessor, which are pointers to already existing buffers. Static class variables, on the other hand, need to be assigned to a local variable first in order to capture them.
  • Functions called within a kernel need to be marked for device execution and satisfy the same constraints as the kernel. They have to be defined inline and should probably be pure functions. Member functions are fine only if the whole object can be captured by the lambda and if they are not virtual.
  • Standard math and utility functions should be taken from the namespace Utils, which maps to either std or Kokkos.
  • Global constants can be used directly, unless they only occur as an argument to a template function (e.g. min, max). In the latter case, just assign the constant to a local variable first.
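The capture rules above can be sketched in plain C++. Everything in the `sketch` namespace is hypothetical (the constant `minThickness`, the struct `IceParams`, the helper `clampThickness`), an ordinary `[=]` lambda with a host loop stands in for the device launch, and `std::max` stands in for the corresponding function from Utils.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

constexpr double minThickness = 0.01; // hypothetical global constant

struct IceParams {
    static double albedoScale; // hypothetical static class variable
};
double IceParams::albedoScale = 0.5;

// Pure inline helper: the kind of function that could be marked for device
// execution. std::max takes its arguments by reference.
inline double clampThickness(double h, double hMin) { return std::max(h, hMin); }

inline void applyKernel(double* h, std::size_t n)
{
    // Static member: copy to a local so the lambda captures a plain value.
    const double scale = IceParams::albedoScale;
    // Constant that ends up as an argument of a template function: copy it to
    // a local first, as recommended in the list above.
    const double hMin = minThickness;

    // Everything the kernel needs (h, scale, hMin) is captured by copy.
    auto kernel = [=](std::size_t i) { h[i] = scale * clampThickness(h[i], hMin); };

    // Host stand-in for the actual kernel launch.
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

} // namespace sketch
```

The pointer `h` is copied into the lambda, not the data it points to, which mirrors how ModelArrayDevice objects are described above as pointers to existing buffers.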

Physics

The major thermodynamics modules were rewritten as portable kernels. To get around the kernel restrictions, two modules had to be converted into proper ModelComponents, since their implementation as a virtual function that operates on a single element does not work anymore. IIceAlbedo introduces new fields which are updated internally. The albedo parameter i0 (previously also I_0 in some cases) now also resides with the concrete IIceAlbedo implementation instead of the calling module. The other module with a changed interface is IFreezingPoint. Since the update was already done on existing fields, fewer changes were needed, and FreezingPointImpl provides a convenient mechanism to generate the needed implementation and the old interface while only requiring the definition of a single-element function, as before.
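One way to generate both a whole-field update and the old single-element interface from one element function is static (compile-time) dispatch. The sketch below uses the curiously recurring template pattern for this. Whether FreezingPointImpl works this way is an assumption, and the names `FreezingPointSketch` and `LinearFreezingSketch` as well as the coefficient are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the actual FreezingPointImpl.
template <typename Derived>
class FreezingPointSketch {
public:
    // Old-style interface: one element at a time.
    double operator()(double sss) const { return Derived::compute(sss); }

    // New-style interface: update a whole field. The call is resolved at
    // compile time, so no virtual dispatch happens inside the loop.
    void update(const std::vector<double>& sss, std::vector<double>& tf) const
    {
        for (std::size_t i = 0; i < sss.size(); ++i) {
            tf[i] = Derived::compute(sss[i]);
        }
    }
};

// A concrete implementation only defines the single-element function.
struct LinearFreezingSketch : FreezingPointSketch<LinearFreezingSketch> {
    // Illustrative linear relation; the coefficient is a placeholder value.
    static double compute(double sss) { return -0.055 * sss; }
};
```

Because `compute` is a static, non-virtual function, it would also satisfy the kernel restrictions listed earlier if the loop were replaced by a device launch.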

With disciplined use of accessors, host and device code can still be mixed easily, so adding new modules in the old style is not an issue. However, switching the execution space within a single module is discouraged: it incurs a significant cost from memory transfers and makes it easier to introduce synchronization bugs.

Dynamics

Kokkos variants of the dynamics are built on top of the old ones. The interface was kept mostly the same, with some additions to minimize the host/device data transfers needed when interacting with the other parts of neXtSIM-DG. To port the computations, additional tools were needed compared to the thermodynamics. Most important is the Kokkos / Eigen interface provided in KokkosUtils.hpp, which makes switching between Kokkos views and Eigen matrices effortless while preserving compile-time information.

In principle, it would be possible to roll the remaining parts that are used from the old variants into the Kokkos dynamics and keep only the latter to ease future maintenance. However, this is out of scope for the current PR as there are some remaining roadblocks:

  • either Kokkos becomes mandatory for the build or an additional abstraction layer similar to the thermodynamics is needed
  • the Kokkos code is more difficult to work with, presenting a barrier to entry for future developments
  • some of the Kokkos kernels are not deterministic; while it should be possible to change this at some performance cost, achieving good performance under this constraint would likely require specialized kernels for CPU and GPU
  • MPI support
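On the determinism point: a common source of run-to-run differences in parallel kernels is that floating-point addition is not associative, so an accumulation whose order is not fixed can yield different results. Whether this is the cause in the Kokkos dynamics is an assumption and is not stated in this PR. The minimal sketch below only demonstrates the underlying effect, with hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Sum the values in the given order. A parallel reduction or a series of
// atomic additions may effectively use any order.
inline double sumInOrder(const std::vector<double>& v)
{
    double s = 0.0;
    for (double x : v) {
        s += x;
    }
    return s;
}

// Same three values, two different orders.
inline bool orderChangesSum()
{
    const std::vector<double> a { 1e16, 1.0, -1e16 }; // 1.0 is absorbed by 1e16
    const std::vector<double> b { 1e16, -1e16, 1.0 }; // 1.0 survives
    return sumInOrder(a) != sumInOrder(b);
}
```

Fixing the accumulation order removes this effect but serializes part of the work, which is consistent with the performance cost mentioned above.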

Test Description

Many existing tests had to be updated to work with the new ModelArrayAccessor infrastructure. When Kokkos is enabled, tests automatically use the selected backend. For full code coverage, it is therefore best to run the whole suite with and without Kokkos enabled.


Documentation Impact

Build instructions need to be provided. Otherwise there should be little difference for users.

@Thanduriel Thanduriel linked an issue Feb 10, 2026 that may be closed by this pull request
@Thanduriel Thanduriel self-assigned this Feb 23, 2026
@Thanduriel Thanduriel added the enhancement New feature or request label Feb 23, 2026


Development

Successfully merging this pull request may close these issues.

GPU thermodynamics implementation
