
Compilation issues with Flux.softmax() and Julia v1.12.0-beta2 #756

@nstiurca

Description


I tried running Flux.jl with the AMDGPU backend under Julia v1.12.0-beta2, but I quickly ran into LLVM errors when compiling kernels. I already filed JuliaLang/julia#58310, but they suggested I come here instead, so I'm copying the rest of that issue here for simplicity.

Here's a minimal example to reproduce, which notably works on Julia v1.11:

nicu@blackstash:~$ julia +1.12 --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-beta2 (2025-04-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org release
|__/                   |

julia> using AMDGPU; using Flux: softmax

julia> data = ROCArray([0.5])
1-element ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.5

julia> softmax(data)
ERROR: LLVM error: Cannot select: 0x232d3d20: f64 = fmaximum nnan ninf nsz arcp contract afn reassoc # D:1 0x23398dd0, 0x226fed50, fastmath.jl:171 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ]
  0x23398dd0: f64,ch = CopyFromReg 0x2325e260, Register:f64 %23, fastmath.jl:171 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ]
    0x226fece0: f64 = Register %23
  0x226fed50: f64 = bitcast # D:1 0x241c46c0, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
    0x241c46c0: v2i32,ch = load<(load (s64) from %ir.gep.peel, !tbaa !199, addrspace 1)> # D:1 0x2325e260, 0x232d4180, undef:i64, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
      0x232d4180: i64 = add # D:1 0x23398f90, 0x22a655e0, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
        0x23398f90: i64 = add 0x241c4a40, Constant:i64<-8>, int.jl:87 @[ int.jl:863 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:65 ] ]
          0x241c4a40: i64 = bitcast 0x22a65340
            0x22a65340: v2i32,ch = load<(dereferenceable invariant load (s64) from %ir."As[1]::ROCDeviceArray.kernarg.offset" + 8, basealign 16, addrspace 4)> 0x2325e260, 0x22a65260, undef:i64
              0x22a65260: i64 = add 0x23398eb0, Constant:i64<152>
                0x23398eb0: i64,ch = CopyFromReg 0x2325e260, Register:i64 %0
                  0x22a65810: i64 = Register %0
                0x23e944c0: i64 = Constant<152>
              0x241c42d0: i64 = undef
          0x23399000: i64 = Constant<-8>
        0x22a655e0: i64 = shl # D:1 0x226fd870, Constant:i32<3>, /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/nicu/.julia/packages/LLVM/xTJfF/src/interop/pointer.jl:85 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/device/gcn/array.jl:81 @[ abstractarray.jl:1366 @[ abstractarray.jl:1342 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:37 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:69 ] ] ] ] ] ] ] ]
          0x226fd870: i64 = bitcast # D:1 0x226fdcd0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
            0x226fdcd0: v2i32 = BUILD_VECTOR # D:1 0x226fdf00, 0x226fdd40, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
              0x226fdf00: i32 = select # D:1 0x232d3fc0, 0x226fe5e0, 0x226fdfc0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                0x232d3fc0: i1 = setcc # D:1 0x23c5c8c0, 0x23398c10, setgt:ch, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                  0x23c5c8c0: i64,ch = CopyFromReg # D:1 0x2325e260, Register:i64 %28, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                  0x23398c10: i64,ch = CopyFromReg 0x2325e260, Register:i64 %15, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                0x226fe5e0: i32 = extract_vector_elt # D:1 0x23e95090, Constant:i32<0>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                  0x23e95090: v2i32 = bitcast # D:1 0x23c5c8c0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                  0x232d4c70: i32 = Constant<0>
                0x226fdfc0: i32 = extract_vector_elt 0x241c4490, Constant:i32<0>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                  0x241c4490: v2i32 = bitcast 0x23398c10, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                  0x232d4c70: i32 = Constant<0>
              0x226fdd40: i32 = select # D:1 0x232d3fc0, 0x226fe880, 0x226fedc0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                0x232d3fc0: i1 = setcc # D:1 0x23c5c8c0, 0x23398c10, setgt:ch, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                  0x23c5c8c0: i64,ch = CopyFromReg # D:1 0x2325e260, Register:i64 %28, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                  0x23398c10: i64,ch = CopyFromReg 0x2325e260, Register:i64 %15, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                0x226fe880: i32 = extract_vector_elt # D:1 0x23e95090, Constant:i32<1>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                  0x23e95090: v2i32 = bitcast # D:1 0x23c5c8c0, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                  0x22a653b0: i32 = Constant<1>
                0x226fedc0: i32 = extract_vector_elt 0x241c4490, Constant:i32<1>, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]
                  0x241c4490: v2i32 = bitcast 0x23398c10, essentials.jl:799 @[ promotion.jl:643 @[ tuple.jl:385 @[ multidimensional.jl:129 @[ /home/nicu/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:68 ] ] ] ]

                  0x22a653b0: i32 = Constant<1>
          0x226fd090: i32 = Constant<3>
      0x241c42d0: i64 = undef
In function: _Z24partial_mapreduce_device8identity8max_fast7Float6416CartesianIndicesILi1E5TupleI5OneToI5Int64EEES8_14ROCDeviceArrayIS1_Li2ELi1EES9_IS1_Li1ELi1EE
Stacktrace:
  [1] handle_error(reason::Cstring)
    @ LLVM ~/.julia/packages/LLVM/xTJfF/src/core/context.jl:194
  [2] LLVMTargetMachineEmitToMemoryBuffer
    @ ~/.julia/packages/LLVM/xTJfF/lib/18/libLLVM.jl:11531 [inlined]
  [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
    @ LLVM ~/.julia/packages/LLVM/xTJfF/src/targetmachine.jl:118
  [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/mcgen.jl:75
  [5] macro expansion
    @ ~/.julia/packages/Tracy/GcShf/src/tracepoint.jl:158 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:404 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/Tracy/GcShf/src/tracepoint.jl:158 [inlined]
  [8] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:401
  [9] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:115
 [10] compile_unhooked
    @ ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:80 [inlined]
 [11] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:67
 [12] compile
    @ ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:55 [inlined]
 [13] #hipcompile##0
    @ ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:194 [inlined]
 [14] JuliaContext(f::AMDGPU.Compiler.var"#hipcompile##0#hipcompile##1"{GPUCompiler.CompilerJob{…}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:34
 [15] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/driver.jl:25
 [16] hipcompile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:193
 [17] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/execution.jl:245
 [18] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1cGqD/src/execution.jl:159
 [19] macro expansion
    @ ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:161 [inlined]
 [20] macro expansion
    @ ./lock.jl:376 [inlined]
 [21] hipfunction(f::typeof(AMDGPU.partial_mapreduce_device), tt::Type{Tuple{…}}; kwargs::@Kwargs{})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:155
 [22] hipfunction
    @ ~/.julia/packages/AMDGPU/6s4nD/src/compiler/codegen.jl:154 [inlined]
 [23] macro expansion
    @ ~/.julia/packages/AMDGPU/6s4nD/src/highlevel.jl:153 [inlined]
 [24] mapreducedim!(f::typeof(identity), op::typeof(Base.FastMath.max_fast), R::ROCArray{…}, A::ROCArray{…}; init::Float64)
    @ AMDGPU ~/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:134
 [25] mapreducedim!
    @ ~/.julia/packages/AMDGPU/6s4nD/src/kernels/mapreduce.jl:86 [inlined]
 [26] _mapreduce(f::typeof(identity), op::typeof(Base.FastMath.max_fast), As::ROCArray{…}; dims::Int64, init::Float64)
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/mapreduce.jl:76
 [27] _mapreduce
    @ ~/.julia/packages/GPUArrays/uiVyU/src/host/mapreduce.jl:33 [inlined]
 [28] mapreduce
    @ ~/.julia/packages/GPUArrays/uiVyU/src/host/mapreduce.jl:28 [inlined]
 [29] reduce
    @ ./reducedim.jl:375 [inlined]
 [30] fast_maximum
    @ ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:92 [inlined]
 [31] softmax!(out::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}, x::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}; dims::Int64)
    @ NNlib ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:61
 [32] softmax!
    @ ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:60 [inlined]
 [33] softmax(x::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}; dims::Int64)
    @ NNlib ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:56
 [34] softmax(x::ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer})
    @ NNlib ~/.julia/packages/NNlib/CGMj3/src/softmax.jl:56
 [35] top-level scope
    @ REPL[3]:1
Some type information was truncated. Use `show(err)` to see complete types.
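A narrower check may help isolate the failure. It assumes the problem is confined to the fast-math max reduction that NNlib's `fast_maximum` performs (frames [24] to [30] above, where `mapreducedim!` is called with `Base.FastMath.max_fast` and `init::Float64`). The helper below is hypothetical and not part of Flux, NNlib, or AMDGPU. It runs the same reduction once with plain `max` and once with `max_fast`. The `init = -Inf` value is an assumption about what NNlib passes. The `to_device` keyword is also introduced here for illustration. It defaults to `identity`, so the function runs on the CPU as written.

```julia
# Hypothetical diagnostic (not part of Flux/NNlib/AMDGPU): compare a plain max
# reduction with the fast-math one seen in the stack trace.
# Assumption: NNlib's fast_maximum reduces with max_fast and an init of -Inf.
function compare_max_reductions(data::AbstractVector; to_device = identity)
    x = to_device(data)
    # Plain `max`: the non-fast-math baseline.
    plain = Array(reduce(max, x; dims = 1, init = -Inf))
    # Fast-math `max`, the operator passed to mapreducedim! in frame [24].
    fast = Array(reduce(Base.FastMath.max_fast, x; dims = 1, init = -Inf))
    return plain, fast
end

# CPU run (to_device = identity):
plain, fast = compare_max_reductions([0.5])
```

On the affected machine, the equivalent call would be `compare_max_reductions([0.5]; to_device = ROCArray)` after `using AMDGPU`. Suppose only the `max_fast` line throws the `Cannot select` LLVM error. That would point at the fast-math max path rather than at Flux or `softmax` itself. This is a hypothesis to check, not a confirmed diagnosis.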

Expected behavior:

nicu@blackstash:~$ julia +1.11 --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.5 (2025-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using AMDGPU; using Flux: softmax

julia> data = ROCArray([0.5])
1-element ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.5

julia> softmax(data)
1-element ROCArray{Float64, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 1.0
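For a single-element input, the expected `1.0` follows directly from the definition. Softmax subtracts the maximum before exponentiating, so one element maps to exp(0) / exp(0) = 1.

The CPU sketch below is a plain reference written for illustration. `softmax_reference` is a hypothetical name, not an NNlib function. It follows the same max-first outline as NNlib's `softmax!`, which is why `fast_maximum` shows up in the trace, but it is not NNlib's exact code.

```julia
# Hypothetical CPU reference for softmax along `dims` (not NNlib's implementation).
# Subtracting the maximum keeps exp() from overflowing and does not change the result.
function softmax_reference(x::AbstractArray; dims = 1)
    m = maximum(x; dims)       # largest entry along dims
    e = exp.(x .- m)           # shifted exponentials, each at most 1
    return e ./ sum(e; dims)   # normalize so each slice sums to 1
end

softmax_reference([0.5])   # → [1.0]
```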

For reference,

julia> versioninfo()
Julia Version 1.12.0-beta2
Commit dd74040f22b (2025-04-25 12:03 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
  GC: Built with stock GC
Threads: 16 default, 1 interactive, 16 GC (on 16 virtual cores)
Environment:
  JULIA_PROJECT = @.
  JULIA_NUM_THREADS = auto
  JULIA_PKG_DEVDIR = /home/nicu/src

julia> AMDGPU.device()
┌────┬─────────────────────┬──────────┬───────────┬────────────┬────────────────
│ Id │                Name │ GCN arch │ Wavefront │     Memory │ Shared Memory ⋯
├────┼─────────────────────┼──────────┼───────────┼────────────┼────────────────
│  1 │ AMD Radeon Graphics │  gfx1100 │        32 │ 27.364 GiB │    64.000 KiB ⋯
└────┴─────────────────────┴──────────┴───────────┴────────────┴────────────────

It's also worth noting that my iGPU is not officially supported in ROCm v6.4, so I used a well-known hack to get it working at all on Julia 1.11. The 780M in my APU is actually a gfx1103, but I have to convince ROCm to treat it as a gfx1100 with

export HSA_OVERRIDE_GFX_VERSION=11.0.0

So there it is. I understand I'm using an unsupported hardware/software combination, but this did work in Julia v1.11, so I'm hoping it can keep working in v1.12 as well. FWIW, the entire Flux.jl quickstart works well for me if I just replace the `using CUDA; device = gpu_device()` bit with `using AMDGPU; device = roc`.
