WSI directory
Batch process
Failed job: CUDA OOM
Failed job: CUDA OOM
Apptainer Image: sayatmimar_compreps_multic-hpg_1.sif
Model
This is a batch process to perform MC segmentation on the 40 virtual slides in the directory. After starting the batch process, I noticed 7 jobs running, even though each is a GPU job submitted with the arguments "--partition=gpu --gres=gres:gpu:a100:1 --cpus-per-task=8". Two jobs failed immediately with CUDA OOM errors; the rest appear to be running. However, on closer inspection, only 3 of the 7 "running" jobs are actually performing segmentation (i.e., using a GPU); the other 4 are waiting for GPU availability. This makes sense, since the pinaki.sarder-dsa group has 3 GPUs available. Also, single-GPU jobs seem faster (~5 min per slide) than what I used to observe in Pubcontainers, though this needs better quantification.
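The per-slide submission described above could be sketched as follows. This is a hypothetical reconstruction, not the actual batch script: the slide directory, the `mcseg_` job-name prefix, and the `apptainer run` invocation of the image are all assumptions, and the standard SLURM form `--gres=gpu:a100:1` is used rather than the `--gres=gres:gpu:a100:1` string observed in the job arguments.

```python
from pathlib import Path

# Resource requests matching the observed job arguments (standard SLURM syntax).
SLURM_ARGS = ["--partition=gpu", "--gres=gpu:a100:1", "--cpus-per-task=8"]
IMAGE = "sayatmimar_compreps_multic-hpg_1.sif"

def build_sbatch_command(slide: Path) -> str:
    """Compose an sbatch command that runs MC segmentation on one WSI
    inside the Apptainer image (--nv exposes the host GPU)."""
    wrap = f"apptainer run --nv {IMAGE} {slide}"
    return " ".join(
        ["sbatch", *SLURM_ARGS, f"--job-name=mcseg_{slide.stem}", f'--wrap="{wrap}"']
    )

# One job per virtual slide in the WSI directory (path is an assumption).
wsi_dir = Path("/data/wsi")
commands = [build_sbatch_command(s) for s in sorted(wsi_dir.glob("*.svs"))]
for cmd in commands:
    print(cmd)  # dry run: print instead of submitting
```

Printing the commands first (a dry run) makes it easy to verify the resource flags before submitting 40 jobs; SLURM then queues the jobs beyond the 3 available GPUs, which matches the 4 jobs observed waiting for GPU availability.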