First, connect to the cluster and pull the newest changes:
ssh <user>@login.leonhard.ethz.ch
cd <project>
git pull
Then run this script locally:
./start_jupyter_nb.sh LeoOpen alelidis 8 01:20 4096 1
Then connect TensorBoard:
ssh <user>@login.leonhard.ethz.ch -L localhost:17605:localhost:17605
cd DeepExplain/experiments/logs
module load python_gpu/3.6.4
tensorboard --logdir ./adv --host "0.0.0.0" --port 17605
When done, hit Logout in the notebook.
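The port-forwarding step above can be sketched as a small helper that assembles the ssh tunnel command for a given user and port. This is a hypothetical convenience function, not part of the cluster's tooling; `tunnel_cmd` and its arguments are illustrative names.

```shell
# Hypothetical helper: build the ssh port-forwarding command used above.
# The same port is forwarded on both ends so TensorBoard is reachable
# at localhost:<port> on your machine.
tunnel_cmd() {
    TUNNEL_USER="$1"   # NETHZ username (placeholder)
    TUNNEL_PORT="$2"   # local and remote port to forward
    echo "ssh $TUNNEL_USER@login.leonhard.ethz.ch -L localhost:$TUNNEL_PORT:localhost:$TUNNEL_PORT"
}

# Print the command for the TensorBoard port used in this guide.
tunnel_cmd alelidis 17605
```

Running the printed command (instead of just echoing it) opens the tunnel; any free port works as long as the same number is used in the tensorboard `--port` flag.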
Connect to ETH through VPN; you can then log in to the cluster using your NETHZ username and password. First, upload the data and code folders to the cluster:
scp -r -v <data and code folder> <user>@login.leonhard.ethz.ch:data
Make sure to rename the data folder to "data" for the code to run correctly.
Then log in to the cluster using ssh and load the required modules:
ssh <user>@login.leonhard.ethz.ch
module load python_gpu/3.6.1 hdf5/1.10.1
Run preProcessor.py to generate the required NumPy files in "data/preprocessingOut/":
bsub -n 8 -R "[mem=8000]" python preProcessor.py
Then use the batch submission system to submit the job
bsub -W 4:00 -n 8 -R "rusage[mem=8000,ngpus_excl_p=1]" python <file name>.py
Options:
- W: maximum run time for the job, in HH:MM format (4 hours in the example above)
- n: number of CPU cores to use (8 in the example above)
- R: resources to request:
  - mem: memory (RAM) in megabytes per CPU core (8 cores * 8000 MB = 64 GB total in the example above)
  - ngpus_excl_p: number of GPUs to use
- I: interactive mode; this directs the job's output to the terminal instead of a file
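The options above can be put together into the full submission command with a small dry-run helper. This is a hypothetical sketch (`build_bsub` and its parameter names are made up for illustration); it only prints the command so you can check the resource request, including the cores-times-per-core-memory total, before submitting.

```shell
# Hypothetical helper: assemble a bsub command line from the options
# described above. It echoes the command instead of running it (dry run).
build_bsub() {
    WALL="$1"    # -W: maximum run time, HH:MM
    CORES="$2"   # -n: number of CPU cores
    MEM_MB="$3"  # mem: RAM in MB *per core*
    GPUS="$4"    # ngpus_excl_p: number of GPUs
    SCRIPT="$5"  # Python script to run
    echo "bsub -W $WALL -n $CORES -R \"rusage[mem=$MEM_MB,ngpus_excl_p=$GPUS]\" python $SCRIPT"
}

# Total RAM granted = cores * per-core memory: 8 * 8000 MB = 64 GB here.
build_bsub 4:00 8 8000 1 train.py
```

Piping the printed line to `sh` (or pasting it) would submit the job; keeping it as a dry run makes it easy to double-check the rusage string first.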
The regular GPU models used are NVIDIA GTX 1080 with approximately 10GB of dedicated memory.
To show the stats of all the jobs you have in queue or running after submission:
bbjobs
Or, for less information:
bjobs
To check a job's output (for a job not running in interactive mode):
bpeek <job ID>
After the job finishes, it generates a file named lsf.<job ID> in your home directory containing the job's output.
To terminate a job:
bkill <job ID>
Per-job resource limits:
- Memory: 160 GB
- Cores: 24
- GPUs: 1
- Run time: 120 hours
See the cluster documentation for more information about the batch system and clusters.