GeometryForcing runs successfully on AMD MI300X GPUs with ROCm 6.4; no CUDA-specific kernel changes were needed.
Environment
| Component | Version |
|---|---|
| GPU | 8x AMD Instinct MI300X (gfx942) |
| Docker | `rocm/pytorch:rocm6.4.3_ubuntu22.04_py3.10_pytorch_release_2.6.0` |
| PyTorch | 2.6.0 (ROCm) |
| Flash Attention | AOTriton (via `F.scaled_dot_product_attention`) |
What Works
- All module imports (DFoTVideo, DFoTVideoPose, VGGT, U-ViT3D)
- Checkpoint loading (`geometry_forcing_state_dict.ckpt`, 458.8M params)
- Multi-GPU DDP inference (8 GPUs)
- End-to-end video generation with `realestate10k_mini` (16 frames, 50 denoising steps)
Fixes
A few PyTorch 2.6 / torchmetrics v1.x compatibility fixes were needed. They are not ROCm-specific and were submitted as #17.
Results
| Metric | Value |
|---|---|
| Inference time | ~69 s per batch (16 frames, 50 steps) |
| Per-step speed | ~1.0–1.4 it/s |
| End-to-end (incl. model loading) | ~3.5 min |
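Here is a rough cross-check of the timing numbers. Assume each iteration in the it/s figure is one denoising step (this is an assumption, since the report does not say how it/s was measured). Under that assumption, 50 steps at 1.0–1.4 it/s take roughly 36–50 s, so the step loop would account for part of the ~69 s per batch. The helper name `denoising_seconds` is hypothetical.

```python
def denoising_seconds(steps: int, its_per_sec: float) -> float:
    """Time spent in the denoising loop, assuming one iteration per step."""
    return steps / its_per_sec


STEPS = 50  # denoising steps per batch, from the table above
for rate in (1.0, 1.4):  # measured per-step speed range, in it/s
    print(f"{rate} it/s -> {denoising_seconds(STEPS, rate):.1f} s")
# prints:
# 1.0 it/s -> 50.0 s
# 1.4 it/s -> 35.7 s
```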
