
Extending the orbit average phase and pseudo orbit averaging to 2x, multispecies simulations, and tandem mirrors #934

Open
Maxwell-Rosen wants to merge 40 commits into main from gk-oap-2x-tandem-multispecies

Conversation


@Maxwell-Rosen Maxwell-Rosen commented Jan 26, 2026

Feature

Summary

Purpose: The orbit average phase (OAP) and pseudo orbit averaging (POA) schemes were first implemented for single-species simulations. They must be generalized to kinetic-electron simulations, 2x, and tandem mirrors. This PR adds these capabilities.

fixes #793

Implementation Details

Key changes:
gk_species_fdot_multiplier.c and gk_species_damping.c are modified to use an array_dg_find_peaks object. This is needed to identify the locations of the walls, the tandem mirror peak, and the inner mirror peak. Critically, the maximum bmag and phi were previously defined at a single point, the mirror throat. Now bmag and phi are arrays: two-dimensional in 2x, or single-component in 1x. These must be global arrays because peak finding is a global operation: each process must see the global bmag and phi to evaluate phi at the peaks of bmag.

array_dg_find_peaks.c is introduced as a new method in core. It takes any DG array and finds its peaks. Currently it is limited to 1D and 2D arrays; extending it to 3D would be considerably more complicated. Given a 2D array, it constructs N 1D arrays for the peaks, and a second 2D array can then be projected onto the peaks of the first. Comprehensive unit tests, including GPU unit tests, are added for this core method. The GPU implementation is somewhat redundant: peak finding is recursive (and done on the CPU), and the projection would be faster on the CPU since it would not have to launch a CUDA kernel. However, Gkeyll is constructed so that memory must live on the GPU when running GPU code, so it is difficult to keep this object on the host and copy over only the necessary information. I tried that, and it interfaced poorly with the loss cone mask module.

loss_cone_mask_gyrokinetic is adjusted for 2x mirrors, tandem mirrors, and kinetic electrons. This required a substantial overhaul of the updater so that it works with arrays, plus extra arguments for the tandem and kinetic-electron cases.

Dependencies: core/zero/array_dg_find_peaks.c

Automated testing: four regression tests are added

  1. gk_mirror_boltz_elc_poa_2x2v_p1
  2. gk_mirror_kinetic_elc_poa_1x2v_p1
  3. gk_mirror_tandem_boltz_elc_poa_1x2v
  4. gk_wham_kinetic_poa_1x2v_p1

LLM details: I relied heavily on LLMs for this PR. The find_peaks method is almost entirely written by LLMs, given a few examples. It's thoroughly tested to machine precision in the unit test and performs well.

Community Standards

  • Documentation has been updated.
  • My code follows the project's coding guidelines.
  • Changes to layer/zero should have a unit test, e.g., core/zero.

Testing:

  • I added a regression test to test this feature.
  • I added this feature to an existing regression test.
  • I added a unit test to test this feature.
  • Ran make check and unit tests all pass.
  • I ran the code with make valcheck, and it is clean.
  • I ran the code through computer-sanitizer on GPU, and it is clean.
  • I ran a few regression tests to ensure no apparent errors.
  • Tested and works on CPU.
  • Tested and works on multi-CPU.
  • Tested and works on GPU.
  • Tested and works on multi-GPU.

Additional Notes

Here is some evidence that the mask behaves correctly in the CFL rate and differs between the electrons and ions.

pgkyl gk_mirror_kinetic_elc_poa_1x2v_p1-ion_cflrate_10.gkyl sel --z0 0.0 pl --title "Ion Z=0" --xlabel "vpar" --ylabel "mu" &
pgkyl gk_mirror_kinetic_elc_poa_1x2v_p1-ion_cflrate_10.gkyl sel --z0 1.5 pl --title "Ion Z=1.5" --xlabel "vpar" --ylabel "mu" &
pgkyl gk_mirror_kinetic_elc_poa_1x2v_p1-elc_cflrate_10.gkyl sel --z0 0.0 pl --title "Elc Z=0.0" --xlabel "vpar" --ylabel "mu" &
pgkyl gk_mirror_kinetic_elc_poa_1x2v_p1-elc_cflrate_10.gkyl sel --z0 1.5 pl --title "Elc Z=1.5" --xlabel "vpar" --ylabel "mu" &
image

The 1x2v test gk_wham_poa_1x2v_p1 reproduces exactly the same mask as main, even in parallel on GPU. All tests here ran on 4 GPUs on Perlmutter.

pgkyl tests-main/gk_wham_poa_1x2v_p1-ion_fdot_multiplier_10.gkyl tests-branch/gk_wham_poa_1x2v_p1-ion_fdot_multiplier_10.gkyl ev "f[0] f[1] -" info
f[0] f[1] - (default#2)
├─ Time: 6.002000e-06
├─ Frame: 10
├─ Number of components: 1
├─ Number of dimensions: 3
├─ Grid: (uniform)
│  ├─ Dim 0: Num. cells: 288; Lower: -2.000000e+00; Upper: 2.000000e+00
│  ├─ Dim 1: Num. cells: 32; Lower: -1.000000e+00; Upper: 1.000000e+00
│  └─ Dim 2: Num. cells: 32; Lower: 0.000000e+00; Upper: 1.000000e+00
├─ Maximum: 0.000000e+00 at (0, 0, 0)
├─ Minimum: 0.000000e+00 at (0, 0, 0)
├─ geometry_type: 2
├─ geqdsk_sign_convention: 0


The tandem mirror loss mask is reproduced well.

$\mu_{\rm left} = \max_{i<k}\left(\frac{\frac{1}{2}m_s v_\parallel^2 + q_s(\phi(z) - \phi_{mi})}{B_{mi} - B(z)}\right)$
$\mu_{\rm right} = \max_{i>k}\left(\frac{\frac{1}{2}m_s v_\parallel^2 + q_s(\phi(z) - \phi_{mi})}{B_{mi} - B(z)}\right)$
$\mu = \min\left(\mu_{\rm left} ,\mu_{\rm right} \right)$

pgkyl gk_mirror_tandem_boltz_elc_poa_1x2v-bmag.gkyl interp pl --title "|B| for tandem mirror" --xlabel "Z"
image

pgkyl gk_mirror_tandem_boltz_elc_poa_1x2v-ion_fdot_multiplier_10.gkyl sel --z2 12 pl --xlabel "Z" --ylabel "computational vpar" --title "Loss cone mask of tandem mirror"
--c2p-vel does not work for this figure because it is c0.
image

compute-sanitizer clean on Perlmutter:
compute-sanitizer --tool memcheck --leak-check full ./cuda-build/gyrokinetic/creg/rt_gk_mirror_boltz_elc_poa_2x2v_p1 -g -s1
========= COMPUTE-SANITIZER

Running phase 0 @ t = 0.000000000e+00 ...
Taking time-step 1 at t = 0 ... dt = 0.000132681

Number of update calls 1
Number of forward-Euler calls 3
Number of RK stage-2 failures 0
Number of RK stage-3 failures 0
Number of write calls 9
Timing: [removed]
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 0 errors

rt_gk_mirror_boltz_elc_damped_1x2v_p1 runs as usual on 4 GPU.

This produces reasonable results for the 2x2v mirror:
pgkyl gk_mirror_boltz_elc_poa_2x2v_p1-ion_fdot_multiplier_2.gkyl sel --z1 0.0 --z3 14 pl --xlabel "psi" --ylabel "vpar"
image
pgkyl gk_mirror_boltz_elc_poa_2x2v_p1-ion_fdot_multiplier_2.gkyl sel --z0 0 --z3 14 pl --xlabel "Z" --ylabel "vpar"
image

…e maximum magnetic field is determined. Bmag max is stored as a gkyl_array. Right now we only do this for bmag, but we need to store phi as a 1D maximum array as well. I haven't decided on the final design for how 2x OAP simulations should be accommodated. Perhaps we need some general gkyl_dg_array_reduce methods that take a 2D array and turn it into a 1D array instead of a 0D number. This is a kind of reduction, but not a total reduction. I tested the included regression test for a 2x2v Boltzmann mirror, and the output of the magnetic field looks correct. The current implementation evaluates bmag at cell corners, but ideally we should use the corners in Z and the quadrature nodes in psi.
- Introduced a new header file `gkyl_array_dg_find_peaks.h` that defines a structure and functions for finding peaks (local maxima, minima, and boundary values) in DG fields.
- Implemented an internal structure in `gkyl_array_dg_find_peaks_priv.h` to manage peak finding operations, including storage for peak values and coordinates.
- Removed unused initialization and writing of `bmag_max` arrays in `gyrokinetic.c` to streamline the geometry setup process.
- Deleted the `gkyl_gk_geometry_bmag_max_init` and `gkyl_gk_geometry_bmag_max_release` functions from `gk_geometry.c` as they are no longer needed, simplifying the geometry management.
…es another DG array at the peaks of the initialized array. Add appropriate unit tests, which pass, to ctest_array_dg_find_peaks. Update gk_species_fdot_multiplier to use the project_on_peaks function with phi. Now everything passed to loss_cone_mask_gyrokinetic is a gkyl_array, and the loss_cone_mask is updated accordingly.
…ount of computation we need for evaluating phi at its peak in the app. Unit tests pass. Regression tests look fine as well; they're all valgrind clean. I think the right way to do the parallelism is to do the peak finding on a global bmag, just like it is done for the position_map. Then, when we evaluate phi, all processes evaluate it at this peak, but only one will return a true value. That process will broadcast the array to the rest of the processes.
… method, which is just like find_peaks, but it computes the global maximum or minimum.
…nd regression tests are brought over from another branch. The unit tests for the array mask and loss cone mask, and the regression tests for the kinetic-electron POA mirror, are valgrind clean.
…grind clean. Regression test is added and produces reasonable results.
…formance

- Introduced a helper function `mkarr` to streamline array allocation for GPU and CPU.
- Removed the `gkyl_loss_cone_mask_gyrokinetic_Dbmag_quad_wall` function and integrated its logic into the main processing flow.
- Updated the GPU kernel `gkyl_loss_cone_mask_gyrokinetic_Dbmag_quad_cu_ker` to compute `Dbmag_quad` directly from `bmag_peak` instead of `bmag_max`.
- Enhanced tandem mirror support by adding handling for `bmag_peak` and `phi_m` in the GPU kernels.
- Simplified the logic for determining trapped particles in the `gkyl_loss_cone_mask_gyrokinetic_ker` and `gkyl_loss_cone_mask_gyrokinetic_quad_ker` functions.
- Improved readability and maintainability by restructuring conditional checks and variable assignments.
…plier. Add the possibility of kinetic electrons and tandem mirrors. The damping regression test is failing both here and on main, for different reasons: main fails because the loss_cone updater has an issue, while here it fails because it uses scale_by_cell with a multi-component array. I'm not sure of the right way to fix this.
…ove the aspects of the cellwise evaluation and quadrature points, because they break the array_scale_by_cell method that is used.
… refactor this in the future, but it is just proof of concept for now to make sure it works correctly.
… arrays need to be passed to objects like the loss_cone_mask, which expects them to be GPU arrays. It's easier to have this module fit the architecture of the rest of the code than to do something different and copy between device and host; it wouldn't interface well. Claude generated most of the CUDA code, with strong guidance from Maxwell.
…h compute-sanitizer with the array_find_peaks, which was causing crashes in the loss cone mask. These issues are fixed. There was some funny business regarding the basis being on the host vs. the device.

Refactor the allocations in the GPU kernels to not be inside the kernels. Instead, it's allocated at init time. The GPU code pulses, which is odd, but it runs. The 2x2v POA regression test runs and is compute sanitizer clean. The other POA tests do not error either on GPU and are compute sanitizer clean.
…itting code to main and broke a unit test with the geometry enum changes
…hi_smooth_global array for improved performance and consistency across computations.
…ion of doing the kinetic electron and tandem mirror. The code is built again to make sure nothing is affected.
@Maxwell-Rosen Maxwell-Rosen added the enhancement New feature or request label Feb 24, 2026
@Maxwell-Rosen Maxwell-Rosen marked this pull request as ready for review February 24, 2026 18:57
@Maxwell-Rosen

@JunoRavin and I have expressed some concerns over the GPU implementation of the find_peaks methods. It's a bit silly to compute here on the GPU, since we're nowhere close to saturating it. However, from a memory management standpoint, having everything on the GPU is simpler. The memory management organization is why I decided on the current implementation.

To ensure that we are not wasting time, I have timed the gkyl_array_dg_find_peaks_project_on_peak_idx function in the test rt_gk_mirror_boltz_elc_poa_1x2v_p1 on 1 GPU on Perlmutter, summing all the time spent in each function call over the entire regression test. Of the find_peaks methods, only project_on_peak is called inside the time loop; the advance function runs during initialization of the fdot multiplier to find where the peaks are.

Even inside this gk_species_fdot_multiplier_advance_loss_cone_mult function, the cost of the peak projection is small compared to the gkyl_loss_cone_mask_gyrokinetic_advance, so we should not be worried about inefficient use of the GPU. Accelerating this module will not accelerate the code in a meaningful way.

advance_loss_cone_mult accumulated timings over 621 calls (s):
  allgather        : 0.00580832
  find_peaks (max) : 0.00985657
  loss_cone_multiplier_advance      : 0.0727482
  scale_by_cell    : 0.00640832
  total            : 0.104605

In total, this simulation took 18 seconds and 0.02 seconds were spent on the gkyl_array_dg_find_peaks_project_on_peak_idx function. I would say this is a win.

…egression test (and in my production simulations)
…heme. There is no POA scheme in the run methods.
…nto the loss_cone_mask_advance, but the iterators are local, so they were iterating over the local ranges, which was offsetting phi when it is evaluated. It should instead use the local phi. I also cleaned up the unit tests a little.

I also realize that we should not be calling up->c2p_pos(xmu, xmu, up->c2p_pos_ctx);. This is because the spatial coordinates are only used to find if we are beyond the mirror throat and in the expander. The mirror throat is evaluated in computational coordinates, so we should compare computational to computational coordinates. I'm not sure if this is a mistake on main because in this branch, I re-did how the mirror throat is found using the find_peaks operator.
…erything is done in computational coordinates
@Maxwell-Rosen

Maxwell-Rosen commented Mar 17, 2026

I have my first example of a 2x mirror of ions with a Boltzmann field, using the POA scheme, run on 4 GPUs on Perlmutter. The simulation used 8 cells in psi, the sqrt_psi grid, $R_m = 5$, $\tau_{\rm OAP} = 0.1\,{\rm s}$, $\tau_{\rm FDP} = 5 \times 10^{-6}\,{\rm s}$, 4 POA cycles, no ion-electron collisions, and no time dilation. I loaded the initial condition from a 1x simulation for this geometry, projected along all field lines. This simulation is brief, but it shows promise: it does not crash.

image

Here is the CFL rate on the interior field line at the midplane. The --c2p-vel option gives an error when making this plot.
image
Here is the distribution function at this point
image



Development

Successfully merging this pull request may close these issues.

[DR] Pseudo-orbit average (POA) multiscale algorithm for magnetic mirrors
