Metal kernels silently read 0.0 from LLVM constant globals that contain aggregate data (tuples, SMatrix, arrays) when accessed with dynamic indices. Scalar constants and statically-indexed aggregates work correctly.
Reproducer
using Metal
function kernel_broken(out)
    i = Metal.thread_position_in_grid_1d()
    t = (1.0f0, 2.0f0, 3.0f0, 4.0f0)
    @inbounds out[i] = t[i]
    return nothing
end
out = MtlArray(ones(Float32, 4))
@metal threads=4 kernel_broken(out)
Array(out) # [0.0, 0.0, 0.0, 0.0] <-- should be [1.0, 2.0, 3.0, 4.0]
Replacing the dynamic index t[i] with a static index t[2] produces correct results. Passing the tuple as a kernel argument also works.
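For anyone hitting this in the meantime, here is a minimal sketch of the two variants described above. The kernel names `kernel_static` and `kernel_arg` are made up for illustration. I'm assuming the same environment as listed below and have not checked other Metal.jl or GPUCompiler versions.

```julia
using Metal

# Variant 1: same tuple literal inside the kernel, but indexed statically.
function kernel_static(out)
    i = Metal.thread_position_in_grid_1d()
    t = (1.0f0, 2.0f0, 3.0f0, 4.0f0)
    @inbounds out[i] = t[2]   # constant index; every thread writes the second element
    return nothing
end

# Variant 2: the tuple is passed in as a kernel argument,
# so the kernel body does not index a tuple literal dynamically.
function kernel_arg(out, t)
    i = Metal.thread_position_in_grid_1d()
    @inbounds out[i] = t[i]   # dynamic index, but into an argument
    return nothing
end

out_static = MtlArray(zeros(Float32, 4))
@metal threads=4 kernel_static(out_static)

t = (1.0f0, 2.0f0, 3.0f0, 4.0f0)
out_arg = MtlArray(zeros(Float32, 4))
@metal threads=4 kernel_arg(out_arg, t)

# Copy results back to the host for inspection.
result_static = Array(out_static)
result_arg = Array(out_arg)
```

The only difference between `kernel_arg` and `kernel_broken` is where the tuple comes from, which is why I suspect the handling of the constant global rather than the indexing code itself.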
Environment
- Metal.jl: v1.9.3
- GPUCompiler: v1.9.1
- Julia 1.12.5 (LLVM 18.1.7)
- macOS (Apple M2)
I have a detailed IR-level analysis of the root cause and a working local fix in GPUCompiler. Happy to share if useful. However, if a fix is more appropriate elsewhere, please let me know.