Batch process Multi Compartment Segmentation #3

@anindya-paul

Description

WSI directory
Batch process
Failed job: CUDA OOM
Failed job: CUDA OOM
Apptainer Image: sayatmimar_compreps_multic-hpg_1.sif
Model

This is a batch process that performs multi-compartment (MC) segmentation on the 40 virtual slides in the directory. After starting the batch, I noticed that 7 jobs started running at once, even though each is a GPU job submitted with the arguments "--partition=gpu --gres=gres:gpu:a100:1 --cpus-per-task=8". Two jobs failed immediately with CUDA OOM, and the rest appear to be running. Although 7 jobs are in the running state, on closer inspection only 3 are actually performing segmentation (that is, using a GPU); the other 4 are waiting for a GPU to become available. This makes sense, since we have 3 GPUs available in the pinaki.sarder-dsa group. Single-GPU jobs also seem faster (~5 min per slide) than what I used to observe in Pubcontainers, though this needs better quantification.
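
One possible way to keep the number of running jobs in line with the 3 available GPUs is a Slurm job array with a concurrency cap (the `%N` suffix of `--array`). The sketch below only builds and prints such a command and does not submit anything. The job arguments are copied verbatim from above; the function name `build_sbatch_args`, the script name `segment_slide.sh`, and the one-array-task-per-slide layout are assumptions for illustration, not how the batch is currently launched.

```python
import shlex

# Arguments copied verbatim from the issue description.
ISSUE_ARGS = ["--partition=gpu", "--gres=gres:gpu:a100:1", "--cpus-per-task=8"]


def build_sbatch_args(n_slides, max_concurrent, script="segment_slide.sh"):
    """Build an sbatch command with one array task per slide.

    The script name is hypothetical; the "%max_concurrent" suffix asks Slurm
    to run at most that many array tasks at the same time.
    """
    if n_slides < 1 or max_concurrent < 1:
        raise ValueError("n_slides and max_concurrent must be positive")
    array_spec = f"--array=0-{n_slides - 1}%{max_concurrent}"
    return ["sbatch", *ISSUE_ARGS, array_spec, script]


if __name__ == "__main__":
    # 40 slides, at most 3 tasks at once (one per available GPU).
    print(shlex.join(build_sbatch_args(40, 3)))
```

With a cap like this, the remaining tasks would stay pending in the queue instead of showing as running while they wait for a GPU. Whether that also avoids the CUDA OOM failures is not something this sketch can confirm.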

Metadata

Labels

hpcc-allocation: Issue with HPCC allocation - Not related to app/plugin
