
Tutorial 09


๐Ÿ–ฅ๏ธ Working on the TAU Supercomputer

🧵 Running long scripts on the POWER cluster (with TMUX)

Normally you need to keep your connection to the cluster open while a script runs, but long calculations often take hours or even days to complete. 😩 A dropped VPN or Wi-Fi connection can kill your job, forcing you to start over.

Good news! 🎉 You can keep your scripts running in the background using tmux.

tmux stands for "terminal multiplexer": it lets you open multiple terminal windows inside a single SSH session, and a detached session keeps running after you disconnect.

🟢 Start a new session:

tmux

โœ๏ธ Start a named session:

tmux new -s myname

➕ Create a new window:

Press Ctrl+B, then c

๐Ÿ” Switch between windows:

  • Ctrl+B then n โ†’ next window
  • Ctrl+B then p โ†’ previous window
  • Ctrl+B then 0โ€“9 โ†’ numbered windows

📴 Detach from the session:

Press Ctrl+B, then d

🔙 Reattach to a session:

tmux a -t myname

📋 List running sessions:

tmux ls
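
Putting the commands together, a typical long-run workflow looks like this (long_job.py is just a placeholder name for your script):

tmux new -s vasp_job         # start a named session
python long_job.py           # launch the long-running script inside it
# press Ctrl+B, then d       # detach; the script keeps running on the cluster
tmux ls                      # later, from a new SSH login: list sessions
tmux a -t vasp_job           # reattach and check progress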

🔗 More info: tmuxcheatsheet.com


📅 The Queue System on the POWER Cluster (SLURM)

For long or resource-intensive jobs, use SLURM 🧠 to manage resources. It queues jobs, assigns compute nodes, and prevents collisions between users.

🚀 Submitting a Python job

#!/bin/bash
#
# filename: slurm_script
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=vasprun
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=40GB

ulimit -s 81920   # raise the shell's stack size limit (value in KB)

# job
python filename.py > output_file.out

📌 Replace filename.py with your script. Save as job.script, then run:

sbatch job.script

๐Ÿ“ The output appears in output_file.out.

โš™๏ธ SLURM Directives Explained

  • --partition 🧩: Which queue partition to use
  • --account 👥: The shared user account
  • --job-name 🏷️: A readable job name
  • --time ⏲️: Max run time (up to 10 days)
  • --nodes 🖥️: Number of compute nodes
  • --ntasks 🔢: Number of tasks (e.g., cores)
  • --mem 🧠: Memory per node

✨ Common Extras

  • --output: Save stdout
  • --error: Save stderr
  • --cpus-per-task: For multi-threaded jobs
  • --mail-type: Get notified (BEGIN, END, FAIL, ALL)
  • --mail-user: Email for notifications
  • --dependency: Chain jobs (e.g., afterok:12345)
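
As a sketch of how these extras fit together, the header below adds logging and e-mail notification to the earlier Python job (the e-mail address and file names are placeholders, not required values):

#!/bin/bash
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=myjob
#SBATCH --time=24:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=40GB
#SBATCH --output=myjob_%j.out     # stdout; %j expands to the job ID
#SBATCH --error=myjob_%j.err      # stderr
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.name@example.com   # placeholder address

python filename.py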

โš›๏ธ Running VASP on POWER with SLURM

๐Ÿ–ฅ๏ธ CPU Version

#!/bin/bash
#
# filename: slurm_script
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=vasprun
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --mem=120GB

ulimit -s 81920

# job
module load intel/rocky8-oneAPI-2023
module load vasp/rocky8-intel-6.4.1

mpirun -n $SLURM_NTASKS vasp_std > output

🧱 For large systems (~1000 atoms):

#!/bin/bash
#
#SBATCH -p leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH -J bigslab
#SBATCH --time=99:00:00
#SBATCH --nodes=2
#SBATCH --ntasks=96
#SBATCH --mem=920GB

ulimit -s 81920

# job
module load intel/rocky8-oneAPI-2023
module load vasp/rocky8-intel-6.4.1

mpirun -n $SLURM_NTASKS vasp_std > output
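
If you prefer to state the rank distribution explicitly rather than letting SLURM split --ntasks across the two nodes, the same request can be written with --ntasks-per-node (an equivalent sketch, not a required change):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48    # 2 x 48 = 96 MPI ranks, matching --ntasks=96 above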

🎮 GPU Version (and MIGs)

POWER has 4 GPUs. You can:

  • Use 1 full GPU (max 4 parallel jobs)
  • Or use MIGs (Multi-Instance GPUs) to run more jobs 🤯

โš ๏ธ Only 2 GPUs are configured for MIGs (each split into 7 MIGs, each with 10GB).

More info: NVIDIA MIG Guide
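
If you are unsure what the GPU partition currently offers, a generic SLURM query like the one below should list its GRES entries (the partition name is taken from the script that follows; the output format depends on the local SLURM configuration):

sinfo -p gpu-leeburton-pool -o "%P %G"    # %G prints the generic resources (GPUs / MIG slices) per partition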

โš™๏ธ SLURM script for GPU VASP

#!/bin/bash
#
# filename: slurm_script
#SBATCH -p gpu-leeburton-pool
#SBATCH --account=power-leeburton-users_v2
#SBATCH --job-name=vasprun
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
##SBATCH --gres=gpu:1                  # Full GPU (2 available)
##SBATCH --gres=gpu:1g.10gb:1          # One MIG (14 available)
#SBATCH --cpus-per-task=2
#SBATCH --mem=16GB

ulimit -s 81920

# Job setup
module purge
module load vasp/vasp.6.5.1-hpc_sdk
echo $CUDA_VISIBLE_DEVICES

mpirun -n $SLURM_NTASKS vasp_std > output

🔀 MIG or Full GPU?

Pick one and uncomment the right line:

✅ To use a MIG:

#SBATCH --gres=gpu:1g.10gb:1

✅ To use a full GPU:

#SBATCH --gres=gpu:1

To activate one of them, delete one of the two leading # characters so the line starts with a single #SBATCH; leave the other line prefixed with ## so SLURM ignores it.
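
After the job starts, the echo $CUDA_VISIBLE_DEVICES line in the script prints the device that was assigned. You can also ask SLURM what it granted (12345 is a placeholder job ID; the exact field name varies between SLURM versions):

scontrol show job 12345 | grep -iE 'gres|tres'    # shows the GPU or MIG resource attached to the job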


✅ Next Steps

🎉 Congratulations! You now know how to work with the POWER cluster.

Continue to Tutorial 10 🚀
