
backend_gpu.py crashes on AMD ROCm systems due to hardcoded nvidia-smi call #196

@adelejjeh

Description

BackendGPU.get_backend_info() in src/xpu_perf/micro_perf/backends/GPU/backend_gpu.py calls subprocess.run(['nvidia-smi', ...]) unconditionally. On AMD ROCm systems where nvidia-smi is not present, this raises an unhandled FileNotFoundError and crashes the benchmark before any workloads run.
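The failure can be reproduced without a GPU. subprocess.run raises FileNotFoundError when the executable is not on PATH, and check=False does not suppress it because the process never starts. A minimal sketch, using a deliberately nonexistent command name as a stand-in for nvidia-smi (the helper raises_file_not_found is hypothetical, not part of xpu-perf):

```python
import subprocess


def raises_file_not_found(cmd: list[str]) -> bool:
    """Return True if launching `cmd` fails because the executable is missing."""
    try:
        # check=False only governs non-zero exit codes; it cannot help when
        # the executable itself does not exist.
        subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return True
    return False


# Stand-in for `nvidia-smi` on a host where it is not installed.
print(raises_file_not_found(["definitely-not-a-real-smi-tool", "--version"]))  # True
```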

Steps to Reproduce

  1. Run xpu-perf on a system with AMD GPUs using ROCm 7.2.2 (PyTorch with torch.version.hip set)
  2. Launch any GPU benchmark: python projects/micro_perf/launch.py --backend GPU --device 0
  3. Crash occurs in get_backend_info():
    FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi'
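To confirm that a host is affected before launching, you can check which vendor tools are on PATH and whether the installed PyTorch is a ROCm build. A minimal sketch (detect_gpu_tooling is a hypothetical helper, not part of xpu-perf):

```python
import shutil


def detect_gpu_tooling() -> dict:
    """Report which vendor SMI tools are on PATH and whether PyTorch is a ROCm build."""
    try:
        import torch
        # torch.version.hip is a version string on ROCm builds and None otherwise.
        hip_version = getattr(torch.version, "hip", None)
    except ImportError:
        hip_version = None  # PyTorch is not installed in this environment
    return {
        "nvidia-smi": shutil.which("nvidia-smi"),
        "rocm-smi": shutil.which("rocm-smi"),
        "torch.version.hip": hip_version,
    }


for name, value in detect_gpu_tooling().items():
    print(f"{name}: {value}")
```

On an affected ROCm host, you would expect `nvidia-smi` to report None while `rocm-smi` resolves to a path and `torch.version.hip` is set.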
    

Suggested Fix

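Detect the platform (for example, by checking whether torch.version.hip is set) and call rocm-smi on AMD ROCm systems to retrieve driver version information, rather than unconditionally calling nvidia-smi. A missing tool should also degrade gracefully instead of aborting the run.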
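One possible shape for the fix is sketched below. It assumes that `torch.version.hip` is a reliable ROCm signal and uses `rocm-smi --showdriverversion` and `nvidia-smi --query-gpu=driver_version --format=csv,noheader` as the queries. The issue elides the exact arguments that backend_gpu.py passes, so these flags are assumptions, and the helper names are hypothetical. rocm-smi prints decorated table output, so the raw text returned here would still need parsing before use.

```python
import subprocess
from typing import Optional


def detect_hip_version() -> Optional[str]:
    """Return torch.version.hip if PyTorch is a ROCm build, else None."""
    try:
        import torch
    except ImportError:
        return None
    return getattr(torch.version, "hip", None)


def driver_version_command(hip_version: Optional[str]) -> list[str]:
    """Pick the vendor tool: rocm-smi on ROCm builds, nvidia-smi otherwise."""
    if hip_version:
        return ["rocm-smi", "--showdriverversion"]
    return ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]


def get_driver_version(cmd: list[str]) -> str:
    """Run `cmd` and return its stripped stdout, or 'unknown' on any failure."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=30
        )
    except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # Missing tool, non-zero exit, or hang: report instead of crashing the benchmark.
        return "unknown"
    return result.stdout.strip()


if __name__ == "__main__":
    print(get_driver_version(driver_version_command(detect_hip_version())))
```

Returning a placeholder instead of raising keeps get_backend_info() usable on hosts where neither tool is installed, which matches the goal of letting workloads run.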

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
