With expose_enroot_logs enabled, the process hangs after the squashfs image is extracted and never reaches a shell:
srun -G8 --container-image=ghcr.io#coreweave/nccl-tests:12.4.1-cudnn-devel-ubuntu22.04-nccl2.23.4-1-2ff05b2 --pty bash
cpu-bind=MASK - worker-0, task 0 0 [1880283]: mask 0xffffffffffffffffffffffffffffffff set
[INFO] Extracting squashfs filesystem...
Parallel unsquashfs: Using 128 processors
22590 inodes (65420 blocks) to write
[=================================================================================================================================================================================================|] 65420/65420 100%
created 20567 files
created 2836 directories
created 2017 symlinks
created 0 devices
created 0 fifos
created 0 sockets
If we disable expose_enroot_logs, the same command works fine and drops into a shell in the container:
srun -G8 --container-image=ghcr.io#coreweave/nccl-tests:12.4.1-cudnn-devel-ubuntu22.04-nccl2.23.4-1-2ff05b2 --pty bash
cpu-bind=MASK - worker-0, task 0 0 [2660]: mask 0xffffffffffffffffffffffffffffffff set
amialiusik@worker-0:/opt/nccl-tests$