[enroot] expose_enroot_logs breaks --pty argument in srun #413

@itechdima

With expose_enroot_logs enabled, the process gets stuck after the squashfs extraction and never reaches a shell prompt:

srun -G8 --container-image=ghcr.io#coreweave/nccl-tests:12.4.1-cudnn-devel-ubuntu22.04-nccl2.23.4-1-2ff05b2 --pty bash
cpu-bind=MASK - worker-0, task  0  0 [1880283]: mask 0xffffffffffffffffffffffffffffffff set
[INFO] Extracting squashfs filesystem...

Parallel unsquashfs: Using 128 processors
22590 inodes (65420 blocks) to write

[=================================================================================================================================================================================================|] 65420/65420 100%

created 20567 files
created 2836 directories
created 2017 symlinks
created 0 devices
created 0 fifos
created 0 sockets

If we disable expose_enroot_logs, the same command works fine:

srun -G8 --container-image=ghcr.io#coreweave/nccl-tests:12.4.1-cudnn-devel-ubuntu22.04-nccl2.23.4-1-2ff05b2 --pty bash
cpu-bind=MASK - worker-0, task  0  0 [2660]: mask 0xffffffffffffffffffffffffffffffff set
amialiusik@worker-0:/opt/nccl-tests$ 
