[ROCm] End-to-end inference verified on AMD MI300X

GeometryForcing runs successfully on AMD MI300X GPUs with ROCm 6.4. No changes to CUDA-specific kernels were needed.

## Environment

| Component | Version |
|-----------|---------|
| GPU | 8x AMD Instinct MI300X (gfx942) |
| Docker | `rocm/pytorch:rocm6.4.3_ubuntu22.04_py3.10_pytorch_release_2.6.0` |
| PyTorch | 2.6.0 (ROCm) |
| Flash Attention | AOTriton (via `F.scaled_dot_product_attention`) |
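
To confirm a container matches the table above, a quick backend check can help. This is a minimal sketch, not part of the repository: `describe_backend` is a hypothetical helper name, and it relies only on `torch.version.hip`, `torch.version.cuda`, and the `torch.cuda` device queries, which ROCm builds of PyTorch expose under the same `torch.cuda` namespace. It returns an empty report if PyTorch is not installed.

```python
import importlib.util


def describe_backend() -> dict:
    """Report which GPU backend the installed PyTorch build targets (hypothetical helper)."""
    info = {"torch_installed": False, "backend": None, "gpu_count": 0, "devices": []}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed; return the empty report.
        return info

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    info["backend"] = "rocm" if hip else ("cuda" if cuda else "cpu")
    # ROCm devices are queried through the torch.cuda namespace as well.
    if torch.cuda.is_available():
        info["gpu_count"] = torch.cuda.device_count()
        info["devices"] = [
            torch.cuda.get_device_name(i) for i in range(info["gpu_count"])
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_backend().items():
        print(f"{key}: {value}")
```

On the setup described here, one would expect `backend` to read `rocm` and `gpu_count` to be 8, though the exact device name strings depend on the driver.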

## What Works

- All module imports (DFoTVideo, DFoTVideoPose, VGGT, U-ViT3D)
- Checkpoint loading (`geometry_forcing_state_dict.ckpt`, 458.8M params)
- Multi-GPU DDP inference (8 GPUs)
- End-to-end video generation with `realestate10k_mini` (16 frames, 50 denoising steps)

## Fixes

A few PyTorch 2.6 / torchmetrics v1.x compatibility fixes were needed (not ROCm-specific). Submitted as #17.

## Results

| Metric | Value |
|--------|-------|
| Inference time | ~69s per batch (16 frames, 50 steps) |
| Per-step speed | ~1.0–1.4 it/s |
| End-to-end (incl. model loading) | ~3.5 min |
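
As a rough sanity check on how these numbers relate, the sketch below converts the reported per-step rate into time spent in the 50-step denoising loop and compares it with the ~69s per-batch figure. The constants are taken from the table; attributing the remainder to work outside the denoising loop is an assumption, not something measured here.

```python
STEPS = 50                 # denoising steps per batch (from the table)
BATCH_SECONDS = 69.0       # reported inference time per batch
RATE_RANGE = (1.0, 1.4)    # reported per-step speed in it/s (slowest, fastest)


def denoise_seconds(steps: int, its_per_sec: float) -> float:
    """Time spent in the denoising loop at a constant iteration rate."""
    return steps / its_per_sec


slowest = denoise_seconds(STEPS, RATE_RANGE[0])
fastest = denoise_seconds(STEPS, RATE_RANGE[1])

print(f"denoising loop: {fastest:.1f}s to {slowest:.1f}s")
# Assumption: whatever is left of the per-batch time is spent outside the loop.
print(f"remainder: {BATCH_SECONDS - slowest:.1f}s to {BATCH_SECONDS - fastest:.1f}s")
```

At 1.0 to 1.4 it/s, the loop accounts for roughly 36 to 50 seconds, which leaves about 19 to 33 seconds of the per-batch time unaccounted for by the step rate alone.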

<img width="512" height="256" alt="Image" src="https://github.com/user-attachments/assets/ee0d7556-e7d3-4f30-98ff-f355431aeea3" />
