Metal kernel reads zeros from constant globals with dynamic indexing #781

@haakon-e

Description

Metal kernels silently read 0.0 from LLVM constant globals that contain aggregate data (tuples, SMatrix, arrays) when accessed with dynamic indices. Scalar constants and statically-indexed aggregates work correctly.

Reproducer

using Metal

function kernel_broken(out)
    i = Metal.thread_position_in_grid_1d()
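    # Tuple literal: compiled into an LLVM constant global (aggregate data).
    t = (1.0f0, 2.0f0, 3.0f0, 4.0f0)
    # Dynamic index into the constant: this read returns 0.0 on the GPU.
    @inbounds out[i] = t[i]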
    return nothing
end

out = MtlArray(ones(Float32, 4))
@metal threads=4 kernel_broken(out)
Array(out)  # [0.0, 0.0, 0.0, 0.0]  <-- should be [1.0, 2.0, 3.0, 4.0]

Replacing the dynamic index t[i] with a static index t[2] produces correct results. Passing the tuple as a kernel argument also works.
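
For comparison, here is a minimal sketch of the kernel-argument workaround mentioned above. The names `kernel_workaround`, `out_ok`, and `t` are mine, introduced for illustration only; the kernel body is otherwise the same as the reproducer. It assumes the same environment as listed below and an Apple GPU.

```julia
using Metal

# Same body as kernel_broken, but the tuple arrives as a kernel argument
# instead of being a literal inside the kernel (hypothetical name).
function kernel_workaround(out, t)
    i = Metal.thread_position_in_grid_1d()
    @inbounds out[i] = t[i]   # still a dynamic index, but not into a constant global
    return nothing
end

t = (1.0f0, 2.0f0, 3.0f0, 4.0f0)
out_ok = MtlArray(ones(Float32, 4))
@metal threads=4 kernel_workaround(out_ok, t)
Array(out_ok)  # per the observation above, this path returns the tuple's values
```

Since the only difference from the reproducer is where the tuple comes from, this suggests the problem is specific to how constant globals are emitted, not to dynamic indexing in general.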

Environment

  • Metal.jl: v1.9.3
  • GPUCompiler: v1.9.1
  • Julia 1.12.5 (LLVM 18.1.7)
  • macOS (Apple M2)

I have a detailed IR-level analysis of the root cause and a working local fix in GPUCompiler, and I'm happy to share both if useful. If the fix belongs elsewhere, please let me know.
