Update Delft3D-FLOW submit script to slurm (AL8)#1007
Conversation
willem4
left a comment
There was a problem hiding this comment.
Thanks for the update of the scripts. I have not tested the scripts, as I trust you have. I have some minor suggestions.
| #SBATCH --job-name=delft3d4 # Specify a name for the job allocation. | ||
| #SBATCH --time=00:15:00 # Set a limit on the total run time of the job allocation. | ||
| #SBATCH --partition=4vcpu # Request a specific partition for the resource allocation. | ||
| # See: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes. |
There was a problem hiding this comment.
Do we want to include this in the run script? Or is it accessible for externals?
There was a problem hiding this comment.
shouldnt this (and the other files) receive the _AL8 as described in the text? Then it is clear that it is tailored for Alma8/Slurm
There was a problem hiding this comment.
Due to the number of different running script, can you add a comment line about when to use this sh-file file?
One line such as: "This is the master calling script to run ....."
There was a problem hiding this comment.
Removed the reference to the publicwiki page. It's not accessible outside Deltares. The AL8 slurm cloud isn't accessible either. Both scripts should work fine on other slurm systems ... except when using C-SUMO within MATLAB as that still includes a hardcoded path for the MATLAB installation.
I don't want to keep the _AL8 postfix. There is nothing in the script doesn't contain anything Alma Linux 8 specific. The primary dependencies are (1) module intelmpi/2021.11.0, (2) MATLAB in case of C-SUMO, obviously (3) Delft3D installation, and (4) slurm resource management.
A purpose statement added to the start of both scripts.
| debuglevel=-1 | ||
| runscript_extraopts= | ||
| NNODES=1 | ||
| wavefile=runwithoutwaveonlinebydefault |
There was a problem hiding this comment.
| wavefile=runwithoutwaveonlinebydefault | |
| # wavefile is set to "runwithoutwaveonlinebydefault". When adding the option -w, --wavefile <wname> in the run script a different file can be specified and this activates wave online. | |
| wavefile=runwithoutwaveonlinebydefault |
| fi | ||
|
|
||
| if [ $NSLOTS -eq 1 ]; then |
There was a problem hiding this comment.
Is the condition correct here. A single node, multi core also uses mpi right?
| numnode=1 | ||
| queue=normal-e3-c7 | ||
| runscript_extraopts= | ||
| wavefile=runwithoutwaveonlinebydefault |
| echo "-p, --PARTITION <PARTITION>" | ||
| echo " PARTITION, default $PARTITION" | ||
| echo " see also: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes" | ||
| echo "-t, --TIME_LIMIT <TIME_LIMIT>" |
There was a problem hiding this comment.
Is it useful to add the formatting of the time string?
| echo " number of nodes, default $NODES" | ||
| echo "-p, --PARTITION <PARTITION>" | ||
| echo " PARTITION, default $PARTITION" | ||
| echo " see also: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes" |
There was a problem hiding this comment.
Should this be included in external scripts?
| # Usage: | ||
| # | ||
| # This script runs Delft3D-FLOW in parallel on Linux | ||
| # This script runs Delft3D-FLOW in parallel on Linux H7 |
marcio-albernaz
left a comment
There was a problem hiding this comment.
Hi Bert,
Nice that you worked on it! ;)
I havent tested the scripts myself. But look good. Mainly comments to improve clarity
| #SBATCH --job-name=delft3d4 # Specify a name for the job allocation. | ||
| #SBATCH --time=00:15:00 # Set a limit on the total run time of the job allocation. | ||
| #SBATCH --partition=4vcpu # Request a specific partition for the resource allocation. | ||
| # See: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes. |
There was a problem hiding this comment.
shouldnt this (and the other files) receive the _AL8 as described in the text? Then it is clear that it is tailored for Alma8/Slurm
| echo "Usage: ${0##*/} [OPTION]..." | ||
| echo "This script is called by submit_dflow2d3d.sh." | ||
| echo "Usage: sbatch [SLURM OPTIONS]... ${0##*/} [OPTION]..." | ||
| echo "Run delft3d4 on H7." |
| corespernodedefault=1 | ||
| corespernode=$corespernodedefault | ||
| D3D_HOME= | ||
| NSLOTS=$SLURM_TASKS_PER_NODE |
There was a problem hiding this comment.
where is SLURM_TASKS_PER_NODE defined? Like Willem, I havent test it myself. But I don't see this variable being defined here
There was a problem hiding this comment.
SLURM_TASKS_PER_NODE is set by the slurm process. It should be equal to the TASKS_PER_NODE value in the submit script and passed to slurm by means of the -c (--corespernode) option.
Your comment made me realize that NSLOTS was set equal to SLURM_TASKS_PER_NODE. I have adjusted this to SLURM_NTASKS (or 1 if not set).
| export I_MPI_FABRICS=ofi | ||
| export FI_PROVIDER=tcp | ||
| export I_MPI_OFI_PROVIDER=tcp | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK |
There was a problem hiding this comment.
I had a recent discussion with Adri about OMP_NUM_THREADS and OMP_NUM_THREADS_SWAN. Nowadays, for FM, Swan is controlled by the latter. Which one is correct for D3D-4? Good to double check it and perhaps make it consistent with OMP_NUM_THREADS_SWAN
There was a problem hiding this comment.
Indeed, OMP_NUM_THREADS for SWAN is set to OMP_NUM_THREADS_SWAN inside the swan.sh file (which is used to start the swan executable by both Delft3D-WAVE and D-Waves). I've now also set OMP_NUM_THREADS_SWAN ... the OMP_NUM_THREADS variable could be used by Delft3D-FLOW, but isn't used now (afaik).
| #SBATCH --job-name=delft3d4 # Specify a name for the job allocation. | ||
| #SBATCH --time=00:15:00 # Set a limit on the total run time of the job allocation. | ||
| #SBATCH --partition=4vcpu # Request a specific partition for the resource allocation. | ||
| # See: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes. |
There was a problem hiding this comment.
Due to the number of different running script, can you add a comment line about when to use this sh-file file?
One line such as: "This is the master calling script to run ....."
|
Thank you @marcio-albernaz and @willem4 for your comments. This pull request started as quick action for PK to get the |
|
Hi @hrajagers , I am fine to skip the phylosophical and cosmetic reviews. But I had at least one about the definition of a variable that may actually cause problems. I am fine if you restrict to check the technical points only. And maybe clarify if you actually tested it, which I am almost sure you did :) |
|
@marcio-albernaz Sorry to disappoint you. I didn't test it ... I just copied the files that have been unchanged for the past two years. So, either nobody is using them, or other users have tested them. Let's discuss next week in person (PK is running some tests now ... using a manual copy of the files). |
There was a problem hiding this comment.
If these files are renamed, then we also need to adjust
src\engines_gpl\flow2d3d\packages\flow2d3d\CMakeLists.txt
which refers to these files by full name for copying as part of the "install" step.
… the submit script doesn't have to know anything about the C-SUMO keywords ... updated the bash conditions updated the C-SUMO checking inside rd2d3d.sh ... the previous [ -n $csumoscript ] check only convers the compiled C-SUMO case
…//github.com/Deltares/Delft3D into d3d4/task/DELFT3D-37935_update_submit_script
… to copy the dedicated submit scripts anymore. Skip processing of RTC and WAVE flags in the submit script ... they are processed by the next rd2d3d script.
marcio-albernaz
left a comment
There was a problem hiding this comment.
Hi @bert, before approving it - can you say if these scripts have been tested?
The previous comments from Willem and I are also still open. You can close if they are not relevant anymore.
hrajagers
left a comment
There was a problem hiding this comment.
Comments processed. I did some very basic testing a while back, but I run a few more simulations using the latest scripts. Furthermore, I want to have a green test bench before approval. Since these scripts are not used in the testbench, all failures should be unrelated to these changes ... but a green test bench also doesn't give any guarantee that these scripts are correct.
| #SBATCH --job-name=delft3d4 # Specify a name for the job allocation. | ||
| #SBATCH --time=00:15:00 # Set a limit on the total run time of the job allocation. | ||
| #SBATCH --partition=4vcpu # Request a specific partition for the resource allocation. | ||
| # See: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes. |
There was a problem hiding this comment.
Removed the reference to the publicwiki page. It's not accessible outside Deltares. The AL8 slurm cloud isn't accessible either. Both scripts should work fine on other slurm systems ... except when using C-SUMO within MATLAB as that still includes a hardcoded path for the MATLAB installation.
I don't want to keep the _AL8 postfix. There is nothing in the script doesn't contain anything Alma Linux 8 specific. The primary dependencies are (1) module intelmpi/2021.11.0, (2) MATLAB in case of C-SUMO, obviously (3) Delft3D installation, and (4) slurm resource management.
A purpose statement added to the start of both scripts.
| corespernodedefault=1 | ||
| corespernode=$corespernodedefault | ||
| D3D_HOME= | ||
| NSLOTS=$SLURM_TASKS_PER_NODE |
There was a problem hiding this comment.
SLURM_TASKS_PER_NODE is set by the slurm process. It should be equal to the TASKS_PER_NODE value in the submit script and passed to slurm by means of the -c (--corespernode) option.
Your comment made me realize that NSLOTS was set equal to SLURM_TASKS_PER_NODE. I have adjusted this to SLURM_NTASKS (or 1 if not set).
| export I_MPI_FABRICS=ofi | ||
| export FI_PROVIDER=tcp | ||
| export I_MPI_OFI_PROVIDER=tcp | ||
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK |
There was a problem hiding this comment.
Indeed, OMP_NUM_THREADS for SWAN is set to OMP_NUM_THREADS_SWAN inside the swan.sh file (which is used to start the swan executable by both Delft3D-WAVE and D-Waves). I've now also set OMP_NUM_THREADS_SWAN ... the OMP_NUM_THREADS variable could be used by Delft3D-FLOW, but isn't used now (afaik).
| echo "-p, --PARTITION <PARTITION>" | ||
| echo " PARTITION, default $PARTITION" | ||
| echo " see also: https://publicwiki.deltares.nl/display/Deltareken/Compute+nodes" | ||
| echo "-t, --TIME_LIMIT <TIME_LIMIT>" |
marcio-albernaz
left a comment
There was a problem hiding this comment.
Indeed, here testing the scripts is important because it won't be tested in the tesbench (as far as I know).
Happy to approve it and see what happens :)
What was done
Update Delft3D-FLOW submit script to slurm consistent with our Alma 8 cluster.
Replaced the content of
submit_dflow2d3d.shandrd2d3d.shscripts by that of the_AL8versions that some internal DIMRsets included. Replaced instead of including additional files since the previous H6 system is no longer available anyway.Evidence of the work done
<add video/figures if applicable>
Tests
<add testcase numbers if applicable, Issue number>
Documentation
<add description of changes if applicable, Issue number>
Issue link
Closes DELFT3D-37935