# Tutorial 09
Usually, you need to keep your connection to the cluster open to run a script. But advanced codes often take hours, or even days, to complete. A dropped VPN or Wi-Fi connection can cause your code to stop, forcing you to start over.

Good news! You can keep your scripts running in the background using tmux.

tmux stands for "terminal multiplexer": it lets you open multiple terminal windows inside a single SSH session.
Start a new session with `tmux new -s myname`, then press `Ctrl+B`, then `c`, to open a new window.
Navigate between windows with:

- `Ctrl+B`, then `n`: next window
- `Ctrl+B`, then `p`: previous window
- `Ctrl+B`, then `0`-`9`: numbered windows
To detach and leave everything running in the background, press `Ctrl+B`, then `d`.
Reattach later with `tmux a -t myname`, or list your sessions with `tmux ls`.

More info: tmuxcheatsheet.com
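Putting it together, a minimal sketch of the workflow (the session name `myrun` and the script `long_job.py` are placeholders):

```bash
# On the login node, start a named tmux session
tmux new -s myrun

# Inside the session, launch the long-running work
python long_job.py > long_job.out

# Detach with Ctrl+B, then d. The session (and your job) keeps
# running even if your SSH or VPN connection drops.

# Later, SSH back in, list sessions, and reattach
tmux ls
tmux a -t myrun

# When the work is finished, close the session
tmux kill-session -t myrun
```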
For long or resource-intensive jobs, use SLURM to manage resources. It queues jobs, assigns compute nodes, and prevents collisions between users. A basic submission script looks like this:
```bash
#!/bin/bash
#
# filename: slurm_script
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=vasprun
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=40GB
ulimit -s 81920
# job
python filename.py > output_file.out
```

Replace `filename.py` with your script. Save the file as `job.script`, then run:

```bash
sbatch job.script
```

The output appears in `output_file.out`.
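Once submitted, the standard SLURM utilities let you check on the job (a quick sketch; substitute the job ID that `sbatch` prints):

```bash
# List your queued and running jobs
squeue -u $USER

# Show details for one job (12345 is a placeholder job ID)
scontrol show job 12345

# Cancel a job you no longer need
scancel 12345
```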
What the flags mean:

- `--partition` (`-p`): which queue partition to use
- `--account`: the shared user account
- `--job-name`: a readable job name
- `--time`: maximum run time (up to 10 days)
- `--nodes`: number of compute nodes
- `--ntasks`: number of tasks (e.g., cores)
- `--mem`: memory per node
Other useful flags (see the sketch after this list for how they fit into a script header):

- `--output`: save stdout to a file
- `--error`: save stderr to a file
- `--cpus-per-task`: for multi-threaded jobs
- `--mail-type`: get notified (BEGIN, END, FAIL, ALL)
- `--mail-user`: email address for notifications
- `--dependency`: chain jobs (e.g., `afterok:12345`)
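As a sketch only, here is how these optional flags slot into the `#SBATCH` header (the job name, output filenames, email address, and dependency ID below are placeholders):

```bash
#!/bin/bash
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=myjob
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4           # for a multi-threaded job
#SBATCH --mem=40GB
#SBATCH --output=myjob_%j.out       # stdout; %j expands to the job ID
#SBATCH --error=myjob_%j.err        # stderr
#SBATCH --mail-type=END,FAIL        # email when the job ends or fails
#SBATCH --mail-user=you@example.com
#SBATCH --dependency=afterok:12345  # start only after job 12345 finishes successfully

python filename.py > output_file.out
```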
To run VASP in parallel, load the required modules and launch `vasp_std` with `mpirun`:

```bash
#!/bin/bash
#
# filename: slurm_script
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=vasprun
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --mem=120GB
ulimit -s 81920
# job
module load intel/rocky8-oneAPI-2023
module load vasp/rocky8-intel-6.4.1
mpirun -n $SLURM_NTASKS vasp_std > output
```

For large systems (~1000 atoms):
```bash
#!/bin/bash
#
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH -J bigslab
#SBATCH --time=99:00:00
#SBATCH --nodes=2
#SBATCH --ntasks=96
#SBATCH --mem=920GB
ulimit -s 81920
# job
module load intel/rocky8-oneAPI-2023
module load vasp/rocky8-intel-6.4.1
mpirun -n $SLURM_NTASKS vasp_std > output
```

POWER has 4 GPUs. You can:
- Use 1 full GPU (max 4 parallel jobs)
- Or use MIGs (Multi-Instance GPUs) to run more jobs

More info: NVIDIA MIG Guide
```bash
#!/bin/bash
#
# filename: slurm_script
#SBATCH -p gpu-leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=vasprun
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
##SBATCH --gres=gpu:1            # Full GPU (2 available)
##SBATCH --gres=gpu:1g.10gb:1    # One MIG (14 available)
#SBATCH --cpus-per-task=2
#SBATCH --mem=16GB
ulimit -s 81920
# Job setup
module purge
module load vasp/vasp.6.5.1-hpc_sdk
echo $CUDA_VISIBLE_DEVICES
mpirun -n $SLURM_NTASKS vasp_std > output
```

Pick one and uncomment the right line:
- To use a MIG, uncomment `#SBATCH --gres=gpu:1g.10gb:1`
- To use a full GPU, uncomment `#SBATCH --gres=gpu:1`

Comment out the other with `#`.
Congratulations! You now know how to work with the POWER cluster.

Continue to Tutorial 10.