[ROCm] End-to-end inference verified on AMD MI300X #18

@ZJLi2013

GeometryForcing runs successfully on AMD Instinct MI300X GPUs with ROCm 6.4; no CUDA-specific kernel changes were needed.

Environment

| Component | Version |
| --- | --- |
| GPU | 8x AMD Instinct MI300X (gfx942) |
| Docker | rocm/pytorch:rocm6.4.3_ubuntu22.04_py3.10_pytorch_release_2.6.0 |
| PyTorch | 2.6.0 (ROCm) |
| Flash Attention | AOTriton (via F.scaled_dot_product_attention) |
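
For anyone reproducing this setup, a quick sanity check along the following lines may help confirm that the container exposes a ROCm build of PyTorch and that `F.scaled_dot_product_attention` runs. This is a minimal sketch, not part of the repository: the function name `describe_backend` and the tensor sizes are illustrative choices, and the script only confirms that the attention call executes, not which attention backend was selected.

```python
import importlib.util


def describe_backend():
    """Return a small dict describing the PyTorch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch": None}

    import torch
    import torch.nn.functional as F

    info = {
        "torch": torch.__version__,
        # torch.version.hip is a version string on ROCm builds and None otherwise.
        "hip": torch.version.hip,
        # ROCm builds expose AMD GPUs through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }

    # Fall back to CPU/float32 so the check also runs on a machine without a GPU.
    device = "cuda" if info["gpu_available"] else "cpu"
    dtype = torch.float16 if info["gpu_available"] else torch.float32

    # Tiny attention call with layout (batch, heads, sequence, head_dim).
    q = torch.randn(1, 2, 16, 64, device=device, dtype=dtype)
    out = F.scaled_dot_product_attention(q, q, q)
    info["sdpa_output_shape"] = tuple(out.shape)
    return info


if __name__ == "__main__":
    for key, value in describe_backend().items():
        print(f"{key}: {value}")
```

On the environment above, `hip` would be expected to be a non-empty version string and `gpu_count` to be 8.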

What Works

  • All module imports (DFoTVideo, DFoTVideoPose, VGGT, U-ViT3D)
  • Checkpoint loading (geometry_forcing_state_dict.ckpt, 458.8M params)
  • Multi-GPU DDP inference (8 GPUs)
  • End-to-end video generation with realestate10k_mini (16 frames, 50 denoising steps)
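
As a rough illustration of why the multi-GPU path needs no ROCm-specific changes: ROCm builds of PyTorch keep the `torch.cuda` namespace and the `"nccl"` backend name (backed by RCCL), so a CUDA-style launch is typically left unchanged. The sketch below shows only the torch-free part of such a launch, assuming the processes are started with `torchrun`; the helper names, the strided split, and the count of 20 clips are hypothetical and are not taken from the GeometryForcing code.

```python
import os


def rank_info(env=os.environ):
    """Read the rank variables that torchrun sets; default to a single process."""
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def shard_indices(num_items, rank, world_size):
    """Strided split: rank r takes items r, r + world_size, r + 2 * world_size, ..."""
    return list(range(rank, num_items, world_size))


if __name__ == "__main__":
    info = rank_info()
    # In a real script, the process group would be set up here, for example:
    #   torch.distributed.init_process_group(backend="nccl")  # RCCL on ROCm
    #   torch.cuda.set_device(info["local_rank"])
    mine = shard_indices(20, info["rank"], info["world_size"])  # 20 clips is illustrative
    print(f"rank {info['rank']} of {info['world_size']} handles clips {mine}")
```

Saved under a hypothetical name such as `shard_demo.py`, this could be launched on one node with `torchrun --nproc_per_node=8 shard_demo.py`, which starts one process per GPU.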

Fixes

A few PyTorch 2.6 / torchmetrics v1.x compatibility fixes were needed (not ROCm-specific). Submitted as #17.

Results

| Metric | Value |
| --- | --- |
| Inference time | ~69 s per batch (16 frames, 50 steps) |
| Per-step speed | ~1.0–1.4 it/s |
| End-to-end (incl. model loading) | ~3.5 min |
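
To relate the per-step speed to the per-batch time, a back-of-the-envelope calculation is sketched below. It assumes the it/s figure counts denoising steps for one batch; the function name is illustrative. Under that assumption the 50 steps account for roughly 36 to 50 s of the ~69 s, and the issue does not say what the remainder consists of.

```python
def denoising_seconds(steps, its_per_second):
    """Time spent in the denoising loop, assuming a constant step rate."""
    return steps / its_per_second


if __name__ == "__main__":
    steps = 50             # denoising steps per batch, from the results above
    batch_seconds = 69.0   # reported inference time per batch (approximate)

    for rate in (1.0, 1.4):  # reported range of per-step speed in it/s
        loop = denoising_seconds(steps, rate)
        rest = batch_seconds - loop
        print(f"{rate:.1f} it/s: loop ~{loop:.0f} s, remainder ~{rest:.0f} s")
```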
