I tried running Flux.jl with the AMDGPU backend under Julia v1.12.0-beta2, but quickly ran into LLVM errors when compiling kernels. I already filed JuliaLang/julia#58310, but they suggested I come here instead. Copying the rest of that issue here for simplicity.
Here's a minimal example to reproduce, which notably works on Julia v1.11:
nicu@blackstash:~$ julia +1.12 --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-beta2 (2025-04-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org release
|__/                   |
julia> using AMDGPU; using Flux: softmax
julia> data = ROCArray([0.5])
1-element ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
0.5
julia> softmax(data)
ERROR: LLVM error: Cannot select: 0x232d3d20: f64 = fmaximum nnan ninf nsz arcp contract afn reassoc # D:1 0x23398dd0, 0x226fed50, fastmath.jl:171 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ]
0x23398dd0: f64,ch = CopyFromReg 0x2325e260, Register:f64 %23, fastmath.jl:171 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ]
0x226fece0: f64 = Register %23
0x226fed50: f64 = bitcast # D:1 0x241c46c0, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
0x241c46c0: v2i32,ch = load<(load (s64) from %ir.gep.peel, !tbaa !199, addrspace 1)> # D:1 0x2325e260, 0x232d4180, undef:i64, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
0x232d4180: i64 = add # D:1 0x23398f90, 0x22a655e0, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
0x23398f90: i64 = add 0x241c4a40, Constant:i64<-8>, int.jl:87 @[ int.jl:863 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:65 ] ]
0x241c4a40: i64 = bitcast 0x22a65340
0x22a65340: v2i32,ch = load<(dereferenceable invariant load (s64) from %ir."As[1]::ROCDeviceArray.kernarg.offset" + 8, basealign 16, addrspace 4)> 0x2325e260, 0x22a65260, undef:i64
0x22a65260: i64 = add 0x23398eb0, Constant:i64<152>
0x23398eb0: i64,ch = CopyFromReg 0x2325e260, Register:i64 %0
0x22a65810: i64 = Register %0
0x23e944c0: i64 = Constant<152>
0x241c42d0: i64 = undef
0x23399000: i64 = Constant<-8>
0x22a655e0: i64 = shl # D:1 0x226fd870, Constant:i32<3>, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
0x226fd870: i64 = bitcast # D:1 0x226fdcd0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x226fdcd0: v2i32 = BUILD_VECTOR # D:1 0x226fdf00, 0x226fdd40, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x226fdf00: i32 = select # D:1 0x232d3fc0, 0x226fe5e0, 0x226fdfc0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x232d3fc0: i1 = setcc # D:1 0x23c5c8c0, 0x23398c10, setgt:ch, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x23c5c8c0: i64,ch = CopyFromReg # D:1 0x2325e260, Register:i64 %28, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x23398c10: i64,ch = CopyFromReg 0x2325e260, Register:i64 %15, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x226fe5e0: i32 = extract_vector_elt # D:1 0x23e95090, Constant:i32<0>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x23e95090: v2i32 = bitcast # D:1 0x23c5c8c0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x232d4c70: i32 = Constant<0>
0x226fdfc0: i32 = extract_vector_elt 0x241c4490, Constant:i32<0>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x241c4490: v2i32 = bitcast 0x23398c10, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x232d4c70: i32 = Constant<0>
0x226fdd40: i32 = select # D:1 0x232d3fc0, 0x226fe880, 0x226fedc0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x232d3fc0: i1 = setcc # D:1 0x23c5c8c0, 0x23398c10, setgt:ch, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x23c5c8c0: i64,ch = CopyFromReg # D:1 0x2325e260, Register:i64 %28, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x23398c10: i64,ch = CopyFromReg 0x2325e260, Register:i64 %15, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x226fe880: i32 = extract_vector_elt # D:1 0x23e95090, Constant:i32<1>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x23e95090: v2i32 = bitcast # D:1 0x23c5c8c0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x22a653b0: i32 = Constant<1>
0x226fedc0: i32 = extract_vector_elt 0x241c4490, Constant:i32<1>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x241c4490: v2i32 = bitcast 0x23398c10, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
0x22a653b0: i32 = Constant<1>
0x226fd090: i32 = Constant<3>
0x241c42d0: i64 = undef
In function: _Z24partial_mapreduce_device8identity8max_fast7Float6416CartesianIndicesILi1E5TupleI5OneToI5Int64EEES8_14ROCDeviceArrayIS1_Li2ELi1EES9_IS1_Li1ELi1EE
Stacktrace:
[1] handle_error(reason::Cstring)
@ LLVM ~/.julia/packages/LLVM/xTJfF/src/core/context.jl:194
[2] LLVMTargetMachineEmitToMemoryBuffer
@ ~/.julia/packages/LLVM/xTJfF/lib/18/libLLVM.jl:11531 [inlined]
[3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
@ LLVM ~/.julia/packages/LLVM/xTJfF/src/targetmachine.jl:118
[4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/mcgen.jl:75
[5] macro expansion
@ ~/.julia/packages/Tracy/GcShf/src/tracepoint.jl:158 [inlined]
[6] macro expansion
@ ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:404 [inlined]
[7] macro expansion
@ ~/.julia/packages/Tracy/GcShf/src/tracepoint.jl:158 [inlined]
[8] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:401
[9] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:115
[10] compile_unhooked
@ ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:80 [inlined]
[11] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:67
[12] compile
@ ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:55 [inlined]
[13] #hipcompile##0
@ ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:194 [inlined]
[14] JuliaContext(f::AMDGPU.Compiler.var"#hipcompile##0#hipcompile##1"{GPUCompiler.CompilerJob{…}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:34
[15] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:25
[16] hipcompile(job::GPUCompiler.CompilerJob)
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:193
[17] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/execution.jl:245
[18] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/execution.jl:159
[19] macro expansion
@ ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:161 [inlined]
[20] macro expansion
@ ./lock.jl:376 [inlined]
[21] hipfunction(f::typeof(AMDGPU.partial_mapreduce_device), tt::Type{Tuple{…}}; kwargs::@Kwargs{})
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:155
[22] hipfunction
@ ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:154 [inlined]
[23] macro expansion
@ ~/.julia/packages/AMDGPU/6s4nD/src/highlevel.jl:153 [inlined]
[24] mapreducedim!(f::typeof(identity), op::typeof(Base.FastMath.max_fast), R::ROCArray{…}, A::ROCArray{…}; init::Float64)
@ AMDGPU ~/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:134
[25] mapreducedim!
@ ~/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:86 [inlined]
[26] _mapreduce(f::typeof(identity), op::typeof(Base.FastMath.max_fast), As::ROCArray{…}; dims::Int64, init::Float64)
@ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/mapreduce.jl:76
[27] _mapreduce
@ ~/.julia/packages/GPUArrays/uiVyU/src/host/mapreduce.jl:33 [inlined]
[28] mapreduce
@ ~/.julia/packages/GPUArrays/uiVyU/src/host/mapreduce.jl:28 [inlined]
[29] reduce
@ ./reducedim.jl:375 [inlined]
[30] fast_maximum
@ ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:92 [inlined]
[31] softmax!(out::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}, x::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}; dims::Int64)
@ NNlib ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:61
[32] softmax!
@ ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:60 [inlined]
[33] softmax(x::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}; dims::Int64)
@ NNlib ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:56
[34] softmax(x::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer})
@ NNlib ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:56
[35] top-level scope
@ REPL[3]:1
Some type information was truncated. Use `show(err)` to see complete types.
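The stack trace above points at the max-reduction step inside `softmax` (`fast_maximum`, which reduces with `Base.FastMath.max_fast`). As a rough illustration of what is being computed, here is a CPU-only sketch of the same calculation that uses the plain `maximum` instead of the fast-math reduction. `softmax_reference` is a hypothetical name, not part of NNlib, and whether this formulation avoids the `fmaximum` selection error on a `ROCArray` has not been verified here.

```julia
# Reference softmax along `dims`, using the ordinary (non-fast-math) maximum.
# `softmax_reference` is a made-up helper for illustration only.
function softmax_reference(x::AbstractArray{<:AbstractFloat}; dims=1)
    m = maximum(x; dims=dims)      # the reduction that the trace shows going through max_fast
    e = exp.(x .- m)               # subtract the max for numerical stability
    return e ./ sum(e; dims=dims)  # normalize so each slice sums to 1
end

# Same input as the reproduction above, but as a plain CPU Array.
softmax_reference([0.5])  # single element, so the result is [1.0]
```

On a plain `Array` this matches the result that Julia v1.11 returns for the `ROCArray` input.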
Expected behavior:
nicu@blackstash:~$ julia +1.11 --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.5 (2025-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using AMDGPU; using Flux: softmax
julia> data = ROCArray([0.5])
1-element ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
0.5
julia> softmax(data)
1-element ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
1.0
For reference,
julia> versioninfo()
Julia Version 1.12.0-beta2
Commit dd74040f22b (2025-04-25 12:03 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
GC: Built with stock GC
Threads: 16 default, 1 interactive, 16 GC (on 16 virtual cores)
Environment:
JULIA_PROJECT = @.
JULIA_NUM_THREADS = auto
JULIA_PKG_DEVDIR = /home/nicu/src
julia> AMDGPU.device()
┌────┬─────────────────────┬──────────┬───────────┬────────────┬────────────────
│ Id │                Name │ GCN arch │ Wavefront │     Memory │ Shared Memory ⋯
├────┼─────────────────────┼──────────┼───────────┼────────────┼────────────────
│  1 │ AMD Radeon Graphics │  gfx1100 │        32 │ 27.364 GiB │    64.000 KiB ⋯
└────┴─────────────────────┴──────────┴───────────┴────────────┴────────────────
It's also worth noting that my iGPU is not officially supported in ROCm v6.4, so I used a well-known hack to even get it working on Julia 1.11. The 780M in my APU is actually a gfx1103, but I have to convince ROCm to treat it as a gfx1100 with
So there it is. I understand I'm using an unsupported hardware/software combination, but this did work in Julia v1.11, so I'm hoping it can continue to work in v1.12 as well. FWIW, the entire Flux.jl quickstart works well for me if I just replace the `using CUDA; device = gpu_device()` bit with `using AMDGPU; device = roc`.