Revive runregression system for automated regression testing #961
ammarhakim merged 20 commits into main from
Conversation
…ned to support the new layering of the system to create databases per layer (moments, vlasov, gyrokinetic, and pkpm), as well as the fact that we support both Lua and C regression tests. As part of my attempt to further improve the system, the run command now takes a <layer> input (so `run moments check` is a valid command, which would only check the moments layer). I also added new flags --lua-only and --c-only, and a --timeout flag, which determines whether a test should be ignored because it's not really a regression test (if a test exceeds the timeout, it gets added to the ignoretests.lua at that layer). Still a lot of work to do, but this seems to be working. Note that, while Lua input file checking behaves as it did in G2, the C input file regression tests copy the files in creg/ into their respective directories along with the Makefile from the install directory's share/, which allows us to build C input files linking against the installed shared libraries. To facilitate this testing **it is now required to pass --source-dir when running configure as part of the runregression setup**.
…e reviving the regression system somewhat non-trivial, chief among them how the C input files determine the names of their output files. To facilitate a more stable and easier-to-maintain regression system, all C input files now derive the names of their output files **from the name of the executable**. There were also various inconsistencies in a number of tests, such as the fact that not every gyrokinetic C file was using the gyrokinetic_run_simulation API. I did my best to standardize everything across different layers, though some tests are currently being skipped (such as various hyperdg kernel timers and a couple of moment multi-block tests).
… critical issue: gyrokinetic tests did not know where the data/ directory was, and thus many gyrokinetic tests could not create their accepted results. Since configure now includes the source directory anyway because of how we run C regression tests (by copying .c files into their respective gkeyll-results/layer/test_name folders and compiling the executable there with the Makefile in the gkylsoft/gkeyll/share directory, which allows these executables to link against the installed shared libraries), we just add a layer_src field that lets us see what is in the data/ directory of any given layer and read those files if they are needed by a test. Also some additional cleanup to only edit the ignoretests.lua file if we detect that a test should be ignored (for example, because it is taking too long). That particular bit of functionality may need further iteration; I just structured things this way for now because I wanted to automate adding tests to ignoretests.lua at each layer. There are a number of tests at all levels in ignoretests.lua now, across both C and Lua. Some tests immediately segfault (like the wham_3x2v input file that is supposed to read 2x2v initial conditions). Some tests just don't work for various development reasons (like rt_escreen_sr no longer having moments specified correctly after all this time). Long to-do list: going through all these tests and deciding what we want to keep. We also need to move a number of Vlasov regression tests to proper unit tests, like the vlasov moment test, and make sure all regression tests actually create apps.
…rt, and ZeroUtil uses in runregression.lua, and to instead move towards a system where the various array.c, array_rio.c, and rect_cart_grid.c structs and functions are directly wrapped with our Lua-C API, similar to what we do for rect_decomp, so that these methods can be called directly from Lua in runregression.lua. CompareFiles still seems to work (regression tests still pass). Fun off-by-one errors, as always, when converting between C and Lua. Also added a few more tests to the moments ignoretests.lua. Not sure what adjustments will be needed to always test the neural network solvers in production, but we can iterate on this front.
… dynvec_diff_lw function in zero_lw.c and registering that function so that it is callable in the same way that array_diff is used in the runregression tool. As a first pass, dynvec_diff_lw checks that the two files are comparable (do they have the same number of time outputs?) and then uses the dynvec_to_array function to convert the dynvec output and time grid to gkyl_arrays, which can then utilize the same gkyl_array_diff infrastructure used for the array comparison. This works out-of-the-box for a number of regression tests, though some have run-to-run inconsistencies in their integrated diagnostics. The more robust approach is likely to use this function as-is but go through every input file, Lua and C, and not use GKYL_MAX_INT for the number of integrated diagnostic computations, instead switching to calculating the integrated diagnostics at set times.
…cks the config.mak to see if the system is configured to build on GPUs (CC=nvcc) just as an initial pass at decoding how the system has been built (since on the Lua side we don't have GKYL_HAVE_CUDA). A bunch of additions to the database, such as whether the system is testing GPUs and various GPU-specific information (pass/fail, timing, whether or not the GPU test differs in output from the CPU test). By default, GPU testing runs both CPU and GPU tests and compares against a single CPU-created baseline (so create never utilizes GPUs), and the CPU and GPU runs are compared against each other to give more information about different failure modes. GPU comparisons force us to be a bit more careful about absolute vs. relative tolerances, and I now have the system failing *only if* both the absolute and relative differences are above set tolerances (allowing GPU runs to pass when a number is near 0 or very large, in which case one of the two comparisons will always fail). The GPU testing also takes a tolerance input, which may be necessary in general for a variety of tests (right now running with 1.0e-7 by default, which allows a large number of Vlasov GPU tests to pass). More work to do, especially going through each regression test and determining why certain GPU tests fail on my workstation, but this seems to be doing at least mostly what we want.
…o try to do a quick sanity check that someone is not messing with things like resolution when comparing created vs. current runs, but this will always fail when we include the git hash as metadata (because if the git hash differs, it can make the output files differ by a single byte).
…o utilize runregression system.
…"gkeyll" in more places
…iles. Since these files just contain integers denoting connectivity of different blocks, we can do a simple comparison without worrying about floating point precision for each connection. Add this to the runregression script and remove the Euler multiblock test from ignorelists.lua
I'm doing some testing on Perlmutter with a GPU build, and I have a few notes of feedback. I sent this to Jimmy on Slack and just wanted to make a formal GitHub comment.
I find this output confusing for several reasons. For one, this printout makes it seem like the regression tests are run on GPU; in reality, these tests are being run on CPU. Furthermore, why is the GPU tolerance printed when nothing is being compared? Comparison should only happen in the
I dislike that the runregression has a submethod run, which further has submethods, so if you were to
…ltiple tests simultaneously with their system's available CPU cores (so no parallelism within a test, but parallelism across tests). Instead of the previous runLuaTest and runCTest structure, we now separate the system into prepareLuaRun and prepareCRun, followed by an executeBatch function that uses shell background jobs to launch multiple simulations at once, before executing a blocking wait call for all jobs to finish; the single parent process then handles writing the data to the SQL database. This use of shell scripting generated by the Lua tool was already being utilized for the C regression tests to perform the sequence of tasks of copying the .c input files from their appropriate <layer>/creg directories, copying the Makefile from the installed gkylsoft/gkeyll/share/ directory, and compiling and running linked against the shared libraries (with various command line options, like utilizing GPUs with -g), so we've just extended what the Lua tool does in terms of shell scripting. We do this because LuaJIT has no OS threads. Note that in this structure each test running in parallel writes its own output text file in its run directory, and those files are then processed when all tests have run (so that we can write to the SQL database serially). A few other notes about issues caught so far in this work: 1. I am still discovering occasional issues where the C side crashes, such as a case where one of the regression tests was apparently producing a dynvec of size 0 (it only initializes data and does not take time steps). I have tried adding guards so that size-0 output trivially passes and the other outputs of that test are still checked, but more thought is needed on whether such a test *should* exist in general or whether any test constructed this way is better suited to a specialized unit test. 2.
Before, we were walking through the run directories in the comparison stage, but I have now elected to switch to walking through the accepted directories instead. My logic is that the accepted results are the ground truth; if the run directory has fewer or more files than the accepted results, that test should *automatically* fail, because it's obviously not running what we consider the accepted output (either because the test changed or, worse, the simulation itself crashes and doesn't even reach the final output frame).
… comparisons, likely because the integration of certain geometric quantities (since we have not yet separated the spacetime from the state variables) is producing garbage whose exact value is non-deterministic.
Responding to some of @Maxwell-Rosen points above.
Following up on some things discussed with @JunoRavin privately, but I think he's onto these already:
My most important request: that we look into getting this running nightly and reporting the results to GitHub. Our sundials collaborators have this system running from a machine at Livermore, I think, so we should just talk to them and see if we can copy what they did.
@manauref, I'm not sure the name
… improvements with respect to the prefix and run-only flags.
1. Try to add some more line breaks and formatting of comments. Right now I'm trying to make the comments relatively compact, with line breaks at sentences where needed. When my terminal is my whole screen there are no awkward line breaks now, but there are still line breaks on a compact small terminal.
2. Find the prefix from config.mak to reduce the headache of setting the prefix consistently.
3. Allow the use of just a test name with the --run-only command.
… prefix and just show <layer>/accepted (or runs)/test_name/file_name. Also add a failedFiles table that is written to a text file and printed to the screen via log() (not just with the verbose option, since I assume that if a test fails like this, users will want to know immediately which files are failing).
1. runregression.config.lua is now stored in gkeyll-results/, and we parse config.mak for the prefix so that we can find this configuration file whenever we run (instead of always loading it from its storage in the home directory).
2. Update comments to reflect that source-dir is an absolute path to the gkeyll repo (I cannot find any other place we set the source directory; every other environment variable path we set is some flavor of prefix or install location, not the location of the actual repository).
3. Update GPU comments to note that create occurs on CPUs, and only print the GPU tolerance screen message under certain conditions (check, and check without --no-gpu).
I have attempted to clarify that this is an absolute path to gkeyll
This is done now, utilizing the --jobs flag to the run command (so run check --jobs 4 would run tests 4 at a time).
Done. It's stored in gkylsoft/gkeyll-results/ now
Done.
I'm not changing names or command chains like this right now. I personally am not a fan of any of the suggested changes.
I can't figure out a way to automate the location of source-dir. We don't store the location of where the repository is anywhere as far as I can tell. We have environment variables for the install location so I was able to automate the prefix location, but if there is a way to automate where the gkeyll repo is located, I can't figure out what it is.
I don't want to change the names right now. I've been typing runregression run check enough that this is very comfortable for me and I feel like I can do this type of demonstration live for the OSSFE talk.
I think I have addressed these issues with the option to just run test names and I've tried to improve the comment formatting, though it may still need refinement.
On the agenda!
I think I have addressed this with the new writing of which files are failing to a Lua table (failedFiles) that is then written to a text file per run directory and also written to screen with log(). Let me know if there is more you want on this front. I also pruned the full path names, which I think helps with the formatting.
Based on a question from @Maxwell-Rosen, I want to clarify how the parallel regression system works when running on GPUs. All GPU testing is serial. This limitation exists because I am currently just leveraging system calls and letting the OS scheduler decide which tests go on which core, through shell scripts the Lua tool writes (mixing & with a final wait call that blocks until the batch of tests completes). Since I'm just using system calls, I don't (yet) know how to tell the system "this test runs on GPU 0 and this test runs on GPU 1" efficiently, whereas leveraging OS calls means I never have to say "core 0 runs test A, core 1 runs test B, etc." So if you run the system on GPUs to check results with --jobs 4 and run check, the following flow should occur: [Batch-Lua] launch 4 CPU tests simultaneously (&/wait); all collect calls are then serial and sequential, each paired with the runGpuVariant() function call that determines whether we also run the test on GPUs.
…as I iterated with Claude on a few of the specific details of the refactor
Some more feedback after more use:
followed by a long list of failing files.
…ssion system:
1. configure now requires --source-dir and tells the user they must provide --source-dir (and it is the first flag described when doing configure -h).
2. I have separated ignoretests.lua into ignore_lua_tests.lua (in <layer>/luareg) and ignore_c_tests.lua (in <layer>/creg), and process these two ignore lists separately. I have also done the same thing for moat.lua (separating it into moat_lua.lua and moat_c.lua) and put a few tests at each layer in the mother-of-all-tests Lua file so that this functionality is exercised.
3. I moved the description of the --all flag to be a run flag, not a runregression flag, and have tested that it overrides what is in ignore_lua_tests.lua and ignore_c_tests.lua. You can combine this flag with --c-only and --lua-only to override only what is in the corresponding ignore list.
4. I have added more information to the diff/missing and other flagged failure commands, and have now further added an option to queryrdb to get specific information about a specific test, similar to the --run-only flag: gkeyll queryrdb --layer gyrokinetic query --id 1 --test rt_gk_sheath_2x2v_p1, which gets the most recent run and the information on the gk_sheath_2x2v_p1 test. You can also still pass a test ID (a number) if you choose, but since you may not know which number corresponds to which test, hopefully this helps navigate the database.
};

// Set app output name from the executable name (argv[0]).
snprintf(app_inp.name, sizeof(app_inp.name), "%s", app_args.app_name);
why is changing the way sim name is specified necessary?
It automates the way the runregression system determines the name of a test, instead of causing potential downstream errors if .name is inconsistent with what the regression system expects (originally these C regression tests did not uniformly have the rt_* prefix, and this was making regression-system checking of C tests a pain). I documented in the PR that this was a significant boon to parsing test names and eliminated inconsistencies between Lua and C test names when the tests were actually identical (since Lua gets the name of the output files from the name of the script, whereas C input files were specifying it manually).
This only affects the C regression tests. .name is still a perfectly acceptable input for any C production test.
oh, there is something I don't like there, actually: I cannot simply copy-paste the regression test, compile it with gkylsoft/gkeyll/share/Makefile, and run it with a simple ./ command...
It segfaults now because there is no default app name anymore. I think it would be good to preserve that feature, so that we can manually compile and run any regression test with ./.
You can do this. That is literally what the runregression system is doing. I am not understanding what you are doing that this does not work?
```
ahoffman@ahoffman-lt gkeyll % mkdir ctest_tcv_2x2v.o
ahoffman@ahoffman-lt gkeyll % cd ctest_tcv_2x2v.o
ahoffman@ahoffman-lt ctest_tcv_2x2v.o % cp ../gyrokinetic/creg/rt_gk_tcv_iwl_adapt_source_2x2v_p1.c .
ahoffman@ahoffman-lt ctest_tcv_2x2v.o % cp ../../gkylsoft/gkeyll/share/Makefile .
ahoffman@ahoffman-lt ctest_tcv_2x2v.o % make
cc -O3 -g -ffast-math -I. -DGKYL_USING_FRAMEWORK_ACCELERATE -DGKYL_SHARE_DIR="/gkeyll/share" -DGKYL_HAVE_LUA -I/Users/ahoffman/gkylsoft/gkeyll/include -Icore -I/Users/ahoffman/gkylsoft/superlu/include/ -Izero -I/Users/ahoffman/gkylsoft/luajit/include/luajit-2.1/ -Izero -Izero rt_gk_tcv_iwl_adapt_source_2x2v_p1.c -o rt_gk_tcv_iwl_adapt_source_2x2v_p1 -L/Users/ahoffman/gkylsoft/gkeyll/lib -Wl,-rpath,/Users/ahoffman/gkylsoft/gkeyll/lib -lg0pkpm -lm -lpthread -L. -L/Users/ahoffman/gkylsoft/superlu/lib/ -L. -L/Users/ahoffman/gkylsoft/luajit/lib/ -L. -L. -framework Accelerate -lsuperlu -Wl,-rpath,/Users/ahoffman/gkylsoft/luajit/lib/ -lluajit-5.1 -lm -lpthread -ldl
ld: warning: ignoring duplicate libraries: '-lm', '-lpthread'
ahoffman@ahoffman-lt ctest_tcv_2x2v.o % ./rt_gk_tcv_iwl_adapt_source_2x2v_p1
zsh: segmentation fault ./rt_gk_tcv_iwl_adapt_source_2x2v_p1
```
ok, I tried with the 3x2v and it worked. Something else went wrong.
ammarhakim left a comment
This is ready to merge. We can take a look at other changes later.
Given Jimmy's tutorial on Tuesday in Germany, I am dismissing this review for now. We can discuss more in a new branch, if needed.
gkyl_gyrokinetic_app_cout(app, stdout, "Number of RK stage-3 failures %ld\n", stat.nstage_3_fail);
gkyl_gyrokinetic_app_cout(app, stdout, "Number of write calls %ld\n", stat.n_io);
gkyl_gyrokinetic_app_print_timings(app, stdout);
gkyl_gyrokinetic_run_simulation(&run_inp);
This is really, really wrong and these changes should not be made. There is no run method for the POA scheme.
good catch Max. Please submit a PR fixing.

This PR serves as a baseline for the initial revival of the runregression tool detailed in the design review here: #955
This tool is designed to permit a straightforward assessment of whether a given code change modified the numerical output of a suite of simulations we have designed to stress-test the whole simulation workflow. The basic idea:
`gkeyll` has a command `runregression`, so if you `make install` you automatically gain access to this system. As part of this work, a number of extensions over the previous regression system have been added, with some assistance from Claude, especially for the SQL and Perl programming:
Many tests are currently in ignoretests.lua, and I have not yet done anything in parallel (especially attempting to run multiple tests simultaneously to expedite testing). But I want to get this looked over and reviewed now in preparation for the Open Source Software conference next week.
There is a major, if very small, change to C input files as part of this work: while .name can still be passed to the input app struct, I now use the name of the executable to determine the name of a regression test by default, to assist with file parsing when running C regression tests. That is why there are a large number of file changes in this PR.