Extending the orbit average phase and pseudo orbit averaging to 2x, multispecies simulations, and tandem mirrors #934
Maxwell-Rosen wants to merge 40 commits into main
…e maximum magnetic field is determined. Bmag max is stored as a gkyl_array. Right now, we only do this for bmag, but we also need to store the maximum of phi as a 1D array. I haven't decided on the final design for how 2x OAP simulations should be accommodated. Perhaps we need some general gkyl_dg_array_reduce methods that take a 2D array and turn it into a 1D array instead of a 0D number. This is a kind of reduction, but not a total reduction. I tested the included regression test for a 2x2v Boltzmann mirror and the magnetic field output looks correct. The current implementation evaluates bmag at cell corners, but ideally we should use the corners in Z and the quadrature nodes in psi.
- Introduced a new header file `gkyl_array_dg_find_peaks.h` that defines a structure and functions for finding peaks (local maxima, minima, and boundary values) in DG fields.
- Implemented an internal structure in `gkyl_array_dg_find_peaks_priv.h` to manage peak finding operations, including storage for peak values and coordinates.
- Removed unused initialization and writing of `bmag_max` arrays in `gyrokinetic.c` to streamline the geometry setup process.
- Deleted the `gkyl_gk_geometry_bmag_max_init` and `gkyl_gk_geometry_bmag_max_release` functions from `gk_geometry.c` as they are no longer needed, simplifying the geometry management.
…es another DG array at the peaks of the initialized array. Add appropriate unit tests, which pass, to ctest_array_dg_find_peaks. Update gk_species_fdot_multiplier to use the project_on_peaks function with phi. Now, everything passed to loss_cone_mask_gyrokinetic is a gkyl_array. The loss_cone_mask is updated accordingly.
…ount of computation we need for evaluating phi at its peak in the app. Unit tests pass. Regression tests look fine as well. They're all valgrind clean. I think the right way to handle the parallelism is to do the peak finding on a global bmag, just as is done for the position_map. Then, when we evaluate phi, all processes evaluate it at this peak, but only one will return a true value. This process will broadcast the array to the rest of the processes.
…on at the peak locations
… method, which is just like find_peaks, but it computes the global maximum or minimum.
…nd regression tests are brought over from another branch. Unit tests for the array mask, loss cone mask, and the regression tests for the kinetic electron POA mirror are valgrind free
…grind clean. Regression test is added and produces reasonable results.
…formance
- Introduced a helper function `mkarr` to streamline array allocation for GPU and CPU.
- Removed the `gkyl_loss_cone_mask_gyrokinetic_Dbmag_quad_wall` function and integrated its logic into the main processing flow.
- Updated the GPU kernel `gkyl_loss_cone_mask_gyrokinetic_Dbmag_quad_cu_ker` to compute `Dbmag_quad` directly from `bmag_peak` instead of `bmag_max`.
- Enhanced tandem mirror support by adding handling for `bmag_peak` and `phi_m` in the GPU kernels.
- Simplified the logic for determining trapped particles in the `gkyl_loss_cone_mask_gyrokinetic_ker` and `gkyl_loss_cone_mask_gyrokinetic_quad_ker` functions.
- Improved readability and maintainability by restructuring conditional checks and variable assignments.
…plier. Add the possibility of kinetic electrons and tandem mirrors. The damping regression test is failing, both here and on main, but for different reasons. Main fails because the loss_cone updater has an issue. Here, it fails because it's using scale_by_cell with a multi-component array. I'm not sure of the right way to fix this.
…ove the aspects about the cellwise evaluation and quadrature points because that breaks the array_scale_by_cell method which is used
… refactor this in the future, but it is just proof of concept for now to make sure it works correctly.
… arrays need to be passed to objects like the loss_cone_mask, which expects them to be GPU arrays. It's just easier to have this module fit the architecture of the rest of the code, rather than doing something different and copying between device and host. It wouldn't interface well. Claude generated most of the CUDA code, with strong guidance from Maxwell.
…h compute sanitizer with the array_find_peaks which was causing crashes in the loss cone mask. These issues are fixed. There was some funny business regarding the basis being on the host vs device. Refactor the allocations in the GPU kernels to not be inside the kernels. Instead, it's allocated at init time. The GPU code pulses, which is odd, but it runs. The 2x2v POA regression test runs and is compute sanitizer clean. The other POA tests do not error either on GPU and are compute sanitizer clean.
…itting code to main and broke a unit test with the geometry enum changes
…hi_smooth_global array for improved performance and consistency across computations.
…ion of doing the kinetic electron and tandem mirror. The code is built again to make sure nothing is affected.
…=2 relevant code for the peak finders
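The partial reduction floated in the first commit note (a 2D array reduced to a 1D array rather than to a 0D number) can be sketched on plain arrays of point values. This is only an illustration under assumptions: `partial_max_over_z` and the row-major `[psi][z]` layout are hypothetical names and choices made here, not Gkeyll's DG array API, and a real DG array holds basis expansion coefficients per cell rather than bare point values.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not a Gkeyll function): reduce a row-major 2D array
// f[ipsi*nz + iz] of point values to a 1D array holding, for each psi index,
// the maximum over z. The result has npsi entries instead of a single number.
void partial_max_over_z(const double *f, size_t npsi, size_t nz, double *fmax)
{
  assert(f && fmax && nz > 0);
  for (size_t ip = 0; ip < npsi; ++ip) {
    double m = f[ip*nz]; // start from the first z point on this psi line
    for (size_t iz = 1; iz < nz; ++iz) {
      double v = f[ip*nz + iz];
      if (v > m) m = v;
    }
    fmax[ip] = m;
  }
}
```

For example, with `npsi = 2`, `nz = 3`, and `f = {1.0, 3.0, 2.0, 5.0, 4.0, 0.5}`, the routine fills `fmax` with `{3.0, 5.0}`: one maximum per field line.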
@JunoRavin and I have expressed some concerns over the GPU implementation of the find_peaks methods. It's a bit silly to compute here on the GPU, since we're nowhere close to saturating it. However, from a memory management standpoint, having everything on the GPU is simpler, which is why I decided on the current implementation. To ensure that we are not wasting time, I have timed the … Even inside this …

advance_loss_cone_mult accumulated timings over 621 calls (s):
allgather                    : 0.00580832
find_peaks (max)             : 0.00985657
loss_cone_multiplier_advance : 0.0727482
scale_by_cell                : 0.00640832
total                        : 0.104605

In total, this simulation took 18 seconds and 0.02 seconds were spent on the …
…egression test (and in my production simulations)
…heme. There is no POA scheme in the run methods.
…nto the loss_cone_mask_advance, but the iterators are local, so they were iterating over the local ranges, which was offsetting phi when it is evaluated. It should instead be using the local phi. Furthermore, I cleaned up the unit tests a little. I also realized that we should not be calling `up->c2p_pos(xmu, xmu, up->c2p_pos_ctx);`. This is because the spatial coordinates are only used to find if we are beyond the mirror throat and in the expander. The mirror throat is evaluated in computational coordinates, so we should compare computational to computational coordinates. I'm not sure if this is a mistake on main because in this branch, I redid how the mirror throat is found using the find_peaks operator.
…erything is done in computational coordinates



Feature
Summary
Purpose: The orbit average phase and pseudo orbit average schemes were first implemented for single-species simulations. They must be generalized to kinetic-electron simulations, 2x, and tandem mirrors. This PR adds these capabilities.
fixes #793
Implementation Details
Key changes:
- `gk_species_fdot_multiplier.c` and `gk_species_damping.c` are modified to use an `array_dg_find_peaks` object. This is necessary to identify the locations of the walls, the tandem mirror peak, and the inner mirror peak. Critically, the maximum bmag and phi were previously defined at a single point, the mirror throat. Now, bmag and phi are arrays in 2x, or a single-component array in 1x. These must be global arrays because peak finding is a global operation (each process must see the global bmag and phi to find phi at the peaks of bmag).
- `array_dg_find_peaks.c` is introduced as a new method inside core. This takes any DG array and finds the peaks. Currently, it is limited to 1D and 2D arrays; extending it to 3D would be very complicated. It takes a 2D array and constructs N 1D arrays for the peaks. We can then project another 2D array onto the peaks of the initial 2D array. Comprehensive unit tests are added for this core method, including GPU unit tests. The GPU implementation is a bit redundant since peak finding is recursive (and done on CPU) and the projection would be faster on CPU since it wouldn't have to launch a CUDA kernel. However, Gkeyll is constructed in such a way that memory must lie on the GPU when running GPU code, so it's difficult to have this object always live on host and copy the necessary information over. I tried this and it had difficulty interfacing with the loss cone mask module.
- `loss_cone_mask_gyrokinetic` is adjusted for the changes necessary for 2x mirrors, tandem mirrors, and kinetic electrons. This meant a big overhaul of this updater so that it works with arrays and takes extra arguments for tandem mirrors and kinetic electrons.

Dependencies:
- `core/zero/array_dg_find_peaks.c`

Automated testing: four regression tests are added
- `gk_mirror_boltz_elc_poa_2x2v_p1`
- `gk_mirror_kinetic_elc_poa_1x2v_p1`
- `gk_mirror_tandem_boltz_elc_poa_1x2v`
- `gk_wham_kinetic_poa_1x2v_p1`

LLM details: I relied heavily on LLMs for this PR. The find_peaks method is almost entirely written by LLMs, given a few examples. It's thoroughly tested to machine precision in the unit test and performs well.
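The find-then-project idea described above (locate the peaks of one array, then evaluate a second array at those locations) can be illustrated in 1D on plain point values. This is a minimal sketch under assumptions, not the `array_dg_find_peaks` implementation: `find_peaks_1d` and `eval_at_peaks` are hypothetical names, the arrays hold nodal values rather than DG coefficients, and only strict local extrema plus the two boundary points are recorded.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch (not the Gkeyll API): record the indices of the two
// boundary points and of every strict interior local maximum or minimum of
// b[0..n-1]. Returns the number of indices written into idx (at most max_idx).
size_t find_peaks_1d(const double *b, size_t n, size_t *idx, size_t max_idx)
{
  assert(b && idx && n >= 2);
  size_t count = 0;
  if (count < max_idx) idx[count++] = 0; // lower boundary value
  for (size_t i = 1; i + 1 < n; ++i) {
    int is_max = b[i] > b[i-1] && b[i] > b[i+1];
    int is_min = b[i] < b[i-1] && b[i] < b[i+1];
    if ((is_max || is_min) && count < max_idx) idx[count++] = i;
  }
  if (count < max_idx) idx[count++] = n - 1; // upper boundary value
  return count;
}

// Sample a second array (e.g. phi) at the peak locations found in the first
// array (e.g. bmag). Both arrays are assumed to share the same grid.
void eval_at_peaks(const double *phi, const size_t *idx, size_t npeaks, double *out)
{
  assert(phi && idx && out);
  for (size_t k = 0; k < npeaks; ++k)
    out[k] = phi[idx[k]];
}
```

With `b = {1.0, 4.0, 2.0, 1.5, 3.0, 5.0, 0.5}` the recorded indices are `0, 1, 3, 5, 6` (boundary, maximum, minimum, maximum, boundary), and `eval_at_peaks` then picks the entries of the second array at exactly those indices. In 2x the same search would be repeated along Z for each psi, which is one way to read the "N 1D arrays" above.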
Community Standards
`layer/zero` should have a unit test, e.g., `core/zero`.
Testing:
`make check` and unit tests all pass.
Additional Notes
Here is some evidence that the mask works correctly for the CFL condition and is different for the electrons and ions.
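One plausible reading of why the mask differs between species is the sign of the charge in the trapping condition. The sketch below uses the textbook energy-conservation criterion for a single mirror, in which a particle is trapped when its parallel kinetic energy does not exceed the magnetic plus electrostatic barrier at the peak. It is an assumption for illustration, not a copy of the `loss_cone_mask_gyrokinetic` kernels; `is_trapped` and its argument names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

// Assumed textbook criterion (not taken from the Gkeyll kernels):
//   trapped  <=>  0.5*m*vpar^2 <= mu*(B_peak - B) + q*(phi_peak - phi)
// where B, phi are evaluated at the particle location and B_peak, phi_peak
// at the mirror peak on the same field line.
bool is_trapped(double mass, double charge, double vpar, double mu,
  double bmag, double bmag_peak, double phi, double phi_peak)
{
  assert(mass > 0.0 && mu >= 0.0);
  double kin_par = 0.5*mass*vpar*vpar;
  double barrier = mu*(bmag_peak - bmag) + charge*(phi_peak - phi);
  return kin_par <= barrier;
}
```

In normalized units with `mass = 1`, `mu = 1`, `bmag = 1`, `bmag_peak = 3`, `phi = 1`, `phi_peak = 0`, and `vpar = 1.5`, a particle with `charge = +1` sees a barrier of 1 and is not trapped, while a particle with `charge = -1` sees a barrier of 3 and is trapped. Under this criterion the same phase-space point can therefore fall on opposite sides of the mask for ions and electrons.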
When comparing the 1x2v test of `gk_wham_poa_1x2v_p1`, it reproduces the exact same mask, even in parallel on GPU. All tests here are over 4 GPUs on Perlmutter.
The tandem mirror loss mask is reproduced well.
pgkyl gk_mirror_tandem_boltz_elc_poa_1x2v-bmag.gkyl interp pl --title "|B| for tandem mirror" --xlabel "Z"
pgkyl gk_mirror_tandem_boltz_elc_poa_1x2v-ion_fdot_multiplier_10.gkyl sel --z2 12 pl --xlabel "Z" --ylabel "computational vpar" --title "Loss cone mask of tandem mirror"
--c2p-vel does not work for this figure because it is c0.
Compute sanitizer clean on perlmutter
compute-sanitizer --tool memcheck --leak-check full ./cuda-build/gyrokinetic/creg/rt_gk_mirror_boltz_elc_poa_2x2v_p1 -g -s1
========= COMPUTE-SANITIZER
Running phase 0 @ t = 0.000000000e+00 ...
Taking time-step 1 at t = 0 ... dt = 0.000132681
Number of update calls 1
Number of forward-Euler calls 3
Number of RK stage-2 failures 0
Number of RK stage-3 failures 0
Number of write calls 9
Timing: [removed]
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 0 errors
`rt_gk_mirror_boltz_elc_damped_1x2v_p1` runs as usual on 4 GPUs.
This produces reasonable results for the 2x2v mirror.


pgkyl gk_mirror_boltz_elc_poa_2x2v_p1-ion_fdot_multiplier_2.gkyl sel --z1 0.0 --z3 14 pl --xlabel "psi" --ylabel "vpar"
pgkyl gk_mirror_boltz_elc_poa_2x2v_p1-ion_fdot_multiplier_2.gkyl sel --z0 0 --z3 14 pl --xlabel "Z" --ylabel "vpar"