Add AMD GPU support via the HIP backend by jeffdaily · Pull Request #400 · mumax/3

jeffdaily · 2026-06-11T00:16:10Z

This adds an opt-in HIP backend so mumax3 runs on AMD GPUs alongside the existing CUDA path. The change is purely additive: the default CUDA build is behaviorally unchanged.

How it works

Each generated CUDA *_wrapper.go gains a //go:build !hip constraint (true in the default build); under -tags hip the matching generated *_wrapper_hip.go compiles instead. The HIP wrappers are cuda2go-generated and committed, exactly like the existing CUDA wrappers, so users build without the device toolchain.
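Go's standard `go/build/constraint` package can check how the two constraints select files. This sketch assumes each `*_wrapper_hip.go` carries `//go:build hip`. The `_hip` suffix is not one Go treats specially, so some explicit constraint must select these files, but the exact line in the PR may differ.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// buildsUnder reports whether a file carrying the given //go:build line
// would be compiled when exactly the listed build tags are set.
func buildsUnder(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	const cudaLine = "//go:build !hip" // on each generated *_wrapper.go
	const hipLine = "//go:build hip"   // assumed on each *_wrapper_hip.go

	// Default build (no tags) vs. building with -tags hip.
	for _, tags := range [][]string{nil, {"hip"}} {
		cuda, _ := buildsUnder(cudaLine, tags...)
		hip, _ := buildsUnder(hipLine, tags...)
		fmt.Printf("tags=%v  *_wrapper.go=%v  *_wrapper_hip.go=%v\n", tags, cuda, hip)
	}
}
```

Running it prints `tags=[]  *_wrapper.go=true  *_wrapper_hip.go=false` and then `tags=[hip]  *_wrapper.go=false  *_wrapper_hip.go=true`. Exactly one wrapper variant is compiled in each build.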

The CUDA path embeds PTX, a forward-compatible virtual ISA that the driver finalizes at load time. The faithful AMD analog is amdgcnspirv: hipcc --genco --offload-arch=amdgcnspirv emits one generic SPIR-V image, which the ROCm runtime finalizes for the present GPU at hipModuleLoadData time. The backend embeds a single generic image per kernel, so there is no per-arch code-object matrix and no arch-detection loader. One build runs on any supported GPU, new hardware needs no rebuild, and CUDA_CC plays no role in the HIP build.
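The sketch below makes the contrast concrete. It sets a hypothetical model of what a per-architecture loader must do (pick the best embedded image for the detected device) beside the generic-image case, where there is nothing to pick. The function names, capability keys, and placeholder image strings are illustrative assumptions, not code from mumax3 or this PR.

```go
package main

import "fmt"

// selectPerArch models (hypothetically) a per-architecture loader. Among
// images keyed by compute capability, it returns the one with the highest
// capability not above the device's, relying on forward compatibility.
// ok is false when the device is older than every embedded image.
func selectPerArch(images map[int]string, deviceCC int) (img string, ok bool) {
	best := -1
	for cc := range images {
		if cc <= deviceCC && cc > best {
			best = cc
		}
	}
	if best < 0 {
		return "", false
	}
	return images[best], true
}

// selectGeneric models the amdgcnspirv case. There is one image per kernel,
// so the detected architecture is irrelevant: the runtime finalizes the
// same bytes for whichever GPU is present.
func selectGeneric(image string, _ string) string {
	return image
}

func main() {
	perArch := map[int]string{50: "image_cc50", 70: "image_cc70", 80: "image_cc80"}
	img, ok := selectPerArch(perArch, 75)
	fmt.Println(img, ok) // image_cc70 true

	for _, gfx := range []string{"gfx90a", "gfx1100", "gfx1201"} {
		fmt.Println(gfx, selectGeneric("generic_spirv_image", gfx))
	}
}
```

The generic path has no table to keep up to date, which is why a new GPU needs no rebuild.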

Device math is reached through the HIP driver API (the analog of the CUDA driver API already used here); cuRAND/cuFFT are served by hipRAND/hipFFT. Two AMD wave-semantics fixes were needed: a reduction that drops the unrolled 32-lane tail in favor of an all-__syncthreads tree (correct on both wave64 CDNA and wave32 RDNA), and an atomicCAS-based fmaxabs (HIP drops int atomicMax on coarse-grained memory). The device kernels are otherwise unchanged.

Building

go install -tags hip github.com/mumax/3/...

Requires a ROCm installation (HIP runtime and headers; cgo defaults to /opt/rocm). Because the device images are committed, a plain build does not need hipcc. The README documents this alongside the CUDA build.

Validation

Tested on real GPUs across three architectures:

| GPU | Arch | Wave size | OS | ROCm |
|---|---|---|---|---|
| Instinct MI250X | gfx90a | 64 | Linux | 7.2.1 |
| Radeon Pro W7800 | gfx1100 | 32 | Linux | 7.2.1 |
| Radeon RX 9070 XT | gfx1201 | 32 | Windows | 7.14 |

go test -tags hip -count=1 ./cuda/ ./cuda/cu/      # 8/8, 12/12 PASS
go test -tags hip -count=1 ./data/... ./httpfs/...  # regression PASS
mumax3 ... test/standardproblem4.mx3   # M.Average within 1e-5 (gate 1e-3)
mumax3 ... test/standardproblem5.mx3   # mx/my/mz within 1e-4

standardproblem4 M.Average() agrees to ~5e-7 between gfx90a and gfx1100. standardproblem5 results match across architectures to 5 significant figures.
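The results quoted above use two kinds of agreement: an absolute tolerance gate and agreement to a number of significant figures. Both can be expressed as small helpers. This is a sketch for illustration only. The helper names are hypothetical, the values in `main` are synthetic rather than the PR's results, and the mumax3 test scripts may implement their checks differently.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// withinAbs reports whether every component of got lies within tol of want,
// the shape of a gate such as "M.Average() within 1e-3".
func withinAbs(got, want []float64, tol float64) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		if math.Abs(got[i]-want[i]) > tol {
			return false
		}
	}
	return true
}

// sameSigFigs reports whether a and b print identically when rounded to n
// significant figures. Values that straddle a rounding boundary can differ
// here even when they are very close, so this is a strict reading.
func sameSigFigs(a, b float64, n int) bool {
	return strconv.FormatFloat(a, 'e', n-1, 64) == strconv.FormatFloat(b, 'e', n-1, 64)
}

func main() {
	// Synthetic values for illustration only; not results from the PR.
	got := []float64{0.1234567, -0.5000004, 0.0000012}
	want := []float64{0.1234, -0.5004, 0.0002}
	fmt.Println(withinAbs(got, want, 1e-3))             // true
	fmt.Println(sameSigFigs(0.12345671, 0.12345674, 5)) // true
}
```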

This work was authored with assistance from an AI coding assistant (Claude).

The backend can also be built with `make BACKEND=hip`. The CUDA `*_wrapper.go` files gain only the `//go:build !hip` constraint, so they are excluded solely under `-tags hip`. Apart from the two wave-semantics fixes, the device .cu/.cuh kernels are unchanged for two reasons:

- amdgcnspirv defines __HIP_PLATFORM_AMD__, so the existing AMD guards resolve.
- No kernel assumes a compile-time wave width, so none needed a dynamic-warpSize fix.

To review

cuda/Makefile (BACKEND=hip compiles one amdgcnspirv image per kernel) and cuda/cuda2go.go (the hip template embeds one base64 blob and a <name>_image) drive the codegen. The regenerated cuda/*_wrapper_hip.go files follow mechanically and make up the bulk of the diff, like their committed CUDA counterparts. cuda/fatbin_hip.go hands the blob straight to ModuleLoadData. The cuda/cu and cuda/cufft *_hip.go files are the HIP driver-API plumbing, and engine/*_hip.go handles device-name reporting. README.md documents the AMD build alongside the CUDA one.

Hardware validated:

- Instinct MI250X (gfx90a, CDNA2, wave64), Linux, ROCm 7.2.1
- Radeon Pro W7800 (gfx1100, RDNA3, wave32), Linux, ROCm 7.2.1
- Radeon RX 9070 XT (gfx1201, RDNA4, wave32), Windows, ROCm 7.14 (TheRock)

Test plan

cd cuda && make wrappers BACKEND=hip && cd ..
go install -tags hip github.com/mumax/3/...
go test -tags hip -count=1 ./cuda/ ./cuda/cu/
go test -tags hip -count=1 ./data/... ./httpfs/...
mumax3 -paranoid=false -cache /tmp -http "" test/standardproblem4.mx3
mumax3 -paranoid=false -cache /tmp -http "" test/standardproblem5.mx3

Results on all three GPUs:

- ./cuda/ 8/8 PASS and ./cuda/cu/ 12/12 PASS, including TestModule on the generic image and the cufft FFT1D test. The non-GPU regression tests pass.
- standardproblem4 M.Average() is within 1e-5 (gate 1e-3), and gfx90a and gfx1100 agree to ~5e-7.
- standardproblem5 mx/my/mz are within 1e-4 and match across architectures to 5 significant figures.

The boot log reports "using generic amdgcnspirv image": the single generic image loads, and no gfx-specific code-object selection remains.
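The PR describes a hip cuda2go template that embeds one base64 blob per kernel, with cuda/fatbin_hip.go handing that blob to ModuleLoadData. Below is a minimal sketch of that pattern. It assumes a lazily loaded, per-kernel module: the embedded base64 image is decoded and handed to the loader once, on first use. `moduleLoadData` is a hypothetical stand-in for the HIP driver-API call hipModuleLoadData. `kernelImage`, `exampleImage`, and the placeholder bytes are also illustrative, not the generated code.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sync"
)

// module is a hypothetical stand-in for a loaded HIP module handle.
type module struct{ size int }

// moduleLoadData is a hypothetical stand-in for hipModuleLoadData. The real
// call would hand the image to the ROCm runtime, which finalizes it for the
// present GPU. Here it only records the image size.
func moduleLoadData(image []byte) (*module, error) {
	return &module{size: len(image)}, nil
}

// kernelImage pairs a kernel with its single embedded base64 image.
type kernelImage struct {
	name string
	b64  string

	once sync.Once
	mod  *module
	err  error
}

// load decodes the image and hands it to the loader exactly once, on first
// use; later calls return the cached module (or the cached error).
func (k *kernelImage) load() (*module, error) {
	k.once.Do(func() {
		raw, err := base64.StdEncoding.DecodeString(k.b64)
		if err != nil {
			k.err = fmt.Errorf("%s: bad embedded image: %w", k.name, err)
			return
		}
		k.mod, k.err = moduleLoadData(raw)
	})
	return k.mod, k.err
}

// exampleImage uses placeholder bytes, not a real amdgcnspirv image.
var exampleImage = &kernelImage{
	name: "example_kernel",
	b64:  base64.StdEncoding.EncodeToString([]byte("placeholder image bytes")),
}

func main() {
	m1, err := exampleImage.load()
	if err != nil {
		panic(err)
	}
	m2, _ := exampleImage.load()
	fmt.Println(m1 == m2, m1.size) // same module both times
}
```

In this sketch, `sync.Once` keeps concurrent first launches from loading the same module twice. The committed wrappers may guard this differently.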

jeffdaily added a commit to jeffdaily/moat that referenced this pull request Jun 11, 2026

[3] upstream PR opened: mumax/3#400 (linux-gfx90a -> pr-open)

51730b2

JLeliaert requested a review from JonathanMaes June 11, 2026 06:59

jeffdaily wants to merge 1 commit into mumax:master from jeffdaily:moat-port
