
Julia 1.12: Mismatched codeinfo hits assertion error. #754

@wsmoses


Specifically, we hit the following error (it occurs only on 1.12, not on 1.10 or 1.11):

```
ERROR: LoadError: AssertionError: Static compilation failed
Stacktrace:
  [1] compile_method_instance(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/j4HFa/src/jlgen.jl:848
  [2] irgen(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/j4HFa/src/irgen.jl:4
  [3] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/j4HFa/src/driver.jl:200
  [4] emit_llvm
    @ ~/.julia/packages/GPUCompiler/j4HFa/src/driver.jl:182 [inlined]
```

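After further investigation, here is what is happening. Our interpreter can dispatch certain methods to the native interpreter. As a consequence, a MethodInstance alone is not sufficient to identify the compiled code: the same MethodInstance can have more than one CodeInstance (for example, one owned by our interpreter and one by the native interpreter), and we need the specific CodeInstance that was actually compiled.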

GPUCompiler uses the runtime call `jl_get_function_id` here to look up the LLVM function indices. Its implementation is:

```cpp
extern "C" JL_DLLEXPORT_CODEGEN
void jl_get_function_id_impl(void *native_code, jl_code_instance_t *codeinst,
        int32_t *func_idx, int32_t *specfunc_idx)
{
    jl_native_code_desc_t *data = (jl_native_code_desc_t*)native_code;
    if (data) {
        // get the function index in the fvar lookup table
        auto it = data->jl_fvar_map.find(codeinst);
        if (it != data->jl_fvar_map.end()) {
            std::tie(*func_idx, *specfunc_idx) = it->second;
        }
    }
}
```

The CodeInstance we pass in is not found in `jl_fvar_map`, so neither index is written and both stay at -1.

The CodeInstance we were looking up was:

```
(rr) p codeinst
$15 = (jl_code_instance_t *) 0x7672abb5ce20
(rr) p jl_(codeinst->def)
(::Type{ArgumentError})(String) from (::Type{ArgumentError})(AbstractString)
```

However, the entries of `jl_fvar_map` were:

```
(rr) p jl_((data->jl_fvar_map.begin())->first.def)
similar(Reactant.TracedRArray{Float64, 2}, Type{Reactant.TracedRNumber{Float64}}, Tuple{Int64, Int64}) from similar(Reactant.TracedRArray{T, N} where N where T, Type{T}, Tuple{Vararg{Int64, N}}) where {T, N}
(rr) p jl_((++data->jl_fvar_map.begin())->first.def)
kwcall(NamedTuple{(:location,), Tuple{Reactant.MLIR.IR.Location}}, typeof(Reactant.Ops.fill), Float64, Array{Int64, 1}) from kwcall(NamedTuple{names, T} where T<:Tuple where names, typeof(Reactant.Ops.fill), Float64, Array{Int64, 1})
$11 = void
(rr) p jl_((++++data->jl_fvar_map.begin())->first.def)
throw_boundserror(Array{Int64, 1}, Tuple{Int64}) from throw_boundserror(Any, Any)
$12 = void
(rr) p jl_((++++++data->jl_fvar_map.begin())->first.def)
(::Type{ArgumentError})(String) from (::Type{ArgumentError})(AbstractString)
$13 = void
(rr) p (++++++data->jl_fvar_map.begin())->first
$14 = (_jl_code_instance_t * const) 0x767402428b90 <jl_system_image_data+59594448>
```

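So the map does contain an entry for the right MethodInstance, but under a different CodeInstance: `0x767402428b90` (from the system image) rather than the `0x7672abb5ce20` we passed in.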
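To make the failure mode concrete, here is a minimal C++ model. These are not Julia's real types: `MethodInstance`, `CodeInstance`, `FvarMap`, `get_function_id` and `has_entry_for_mi` are hypothetical, and the owner strings only stand in for the real cache owners. `get_function_id` mirrors the logic of `jl_get_function_id_impl` above: the map is keyed by CodeInstance identity, so the outputs are written only on an exact pointer match.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified stand-ins for Julia's runtime objects.
struct MethodInstance {
    std::string sig;
};

struct CodeInstance {
    const MethodInstance* def;  // back-pointer to its MethodInstance
    std::string owner;          // hypothetical label for the cache owner
};

// Analogue of jl_fvar_map: keyed by CodeInstance identity (the pointer).
using FvarMap =
    std::unordered_map<const CodeInstance*, std::pair<int32_t, int32_t>>;

// Mirrors jl_get_function_id_impl: outputs are written only on a hit,
// so callers pre-initialise them to -1 and treat -1 as "not found".
void get_function_id(const FvarMap& map, const CodeInstance* ci,
                     int32_t* func_idx, int32_t* specfunc_idx) {
    assert(func_idx != nullptr && specfunc_idx != nullptr);
    auto it = map.find(ci);
    if (it != map.end()) {
        std::tie(*func_idx, *specfunc_idx) = it->second;
    }
}

// Does any key in the map belong to this MethodInstance?
bool has_entry_for_mi(const FvarMap& map, const MethodInstance* mi) {
    for (const auto& kv : map) {
        if (kv.first->def == mi) return true;
    }
    return false;
}
```

For example, if the map holds only the natively compiled CodeInstance, passing our interpreter's CodeInstance for the same MethodInstance leaves both indices at their `-1` sentinel, even though `has_entry_for_mi` reports an entry for that MethodInstance. That is the situation the rr session shows.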

The CodeInstances are looked up as follows, in GPUCompiler's `compile_method_instance` (`src/jlgen.jl`):

```julia
if VERSION >= v"1.13.0-DEV.1120"
    # on sufficiently recent versions of Julia, we can query the CIs compiled.
    # this is required after the move to `invoke(::CodeInstance)`, because our
    # lookup function (used to populate method_instances) isn't always called then.

    num_cis = Ref{Csize_t}(0)
    @ccall jl_get_llvm_cis(native_code::Ptr{Cvoid}, num_cis::Ptr{Csize_t},
                           C_NULL::Ptr{Cvoid})::Nothing
    resize!(method_instances, num_cis[])
    @ccall jl_get_llvm_cis(native_code::Ptr{Cvoid}, num_cis::Ptr{Csize_t},
                           method_instances::Ptr{Cvoid})::Nothing

    for (i, ci) in enumerate(method_instances)
        method_instances[i] = ci.def::MethodInstance
    end

elseif VERSION >= v"1.12.0-DEV.1703"
    # slightly older versions of Julia used MIs directly
    # (this is the path taken on 1.12: only MIs come back, so the identity
    # of the compiled CodeInstance is lost)

    num_mis = Ref{Csize_t}(0)
    @ccall jl_get_llvm_mis(native_code::Ptr{Cvoid}, num_mis::Ptr{Csize_t},
                           C_NULL::Ptr{Cvoid})::Nothing
    resize!(method_instances, num_mis[])
    @ccall jl_get_llvm_mis(native_code::Ptr{Cvoid}, num_mis::Ptr{Csize_t},
                           method_instances::Ptr{Cvoid})::Nothing
end

# process all compiled method instances
compiled = Dict()
for mi in method_instances
    # re-derive a CI from the MI via our cache; on 1.12 this can be a different
    # CI from the one that was actually compiled into `native_code`
    ci = ci_cache_lookup(cache, mi, job.world, job.world)
    ci === nothing && continue
    @show ci, ci.owner, mi, job.world
    # get the function index
    llvm_func_idx = Ref{Int32}(-1)
    llvm_specfunc_idx = Ref{Int32}(-1)
    ccall(:jl_get_function_id, Nothing,
          (Ptr{Cvoid}, Any, Ptr{Int32}, Ptr{Int32}),
          native_code, ci, llvm_func_idx, llvm_specfunc_idx)
    @assert llvm_func_idx[] != -1 || llvm_specfunc_idx[] != -1 "Static compilation failed"
    # ... (remainder of the loop body omitted)
end
```

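Specifically, on 1.12 we call `jl_get_llvm_mis`, which internally maps each compiled CodeInstance to its MethodInstance. This loses track of which CodeInstance we actually care about, so when `ci_cache_lookup` looks the MethodInstance up again in our cache, it returns a different CodeInstance from the one stored in `jl_fvar_map`. `jl_get_llvm_cis` avoids this issue because it returns the compiled CodeInstances themselves, preserving the right one.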
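A sketch of the two lookup strategies, again with hypothetical simplified types (`CiCache` stands in for the cache that `ci_cache_lookup` searches; none of these names are real Julia or GPUCompiler APIs). Reducing the compiled CodeInstances to MethodInstances and looking them up again can hand back a different CodeInstance, whereas keeping the CodeInstances as `jl_get_llvm_cis` reports them preserves the one that was compiled.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for Julia's runtime objects.
struct MethodInstance {
    std::string sig;
};

struct CodeInstance {
    const MethodInstance* def;  // back-pointer to its MethodInstance
    std::string owner;          // hypothetical label for the cache owner
};

// Hypothetical model of the cache ci_cache_lookup searches:
// MethodInstance -> the CodeInstance this cache holds for it.
using CiCache = std::unordered_map<const MethodInstance*, const CodeInstance*>;

// 1.12-style path: reduce each compiled CI to its MI, then look the MI up
// again in our own cache. The identity of the compiled CI is dropped.
std::vector<const CodeInstance*> via_method_instances(
    const std::vector<const CodeInstance*>& compiled, const CiCache& cache) {
    std::vector<const CodeInstance*> out;
    for (const CodeInstance* ci : compiled) {
        assert(ci != nullptr);
        auto it = cache.find(ci->def);
        if (it != cache.end()) out.push_back(it->second);  // may differ from ci
    }
    return out;
}

// Path the issue points to: keep the compiled CIs (as jl_get_llvm_cis
// reports them) and use them directly.
std::vector<const CodeInstance*> via_code_instances(
    const std::vector<const CodeInstance*>& compiled) {
    return compiled;
}
```

With a cache that maps the MethodInstance to our interpreter's CodeInstance while the native one was compiled, `via_method_instances` returns the interpreter's CodeInstance (which `jl_fvar_map` does not contain), and `via_code_instances` returns the compiled one.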

I've started a backport PR, JuliaLang/julia#60725, so that 1.12 can use the right function (`jl_get_llvm_cis`) as well.

In the meantime, on 1.12 I'm going to loosen the `jl_get_function_id` check so that a CodeInstance that isn't found is skipped (`continue`) instead of failing the assertion, just as we already skip when `ci === nothing`.
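Roughly, the loosened 1.12 check behaves like the following C++ sketch. The names and types are hypothetical; `strict = true` models the current `@assert` (here as a thrown exception), and a `nullptr` models `ci === nothing`. A CodeInstance with no entry in the map is skipped exactly like a missing cache entry.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for Julia's CodeInstance; only identity matters here.
struct CodeInstance {
    int id;
};

// Hypothetical analogue of jl_fvar_map: CI pointer -> (func_idx, specfunc_idx).
using FvarMap =
    std::unordered_map<const CodeInstance*, std::pair<int32_t, int32_t>>;

struct Compiled {
    const CodeInstance* ci;
    int32_t func_idx;
    int32_t specfunc_idx;
};

// Collect function indices for each CI. A nullptr models `ci === nothing`.
// strict == true models the current @assert (thrown as an exception here);
// strict == false models the loosened 1.12 behaviour: skip and continue.
std::vector<Compiled> collect_compiled(const std::vector<const CodeInstance*>& cis,
                                       const FvarMap& map, bool strict) {
    std::vector<Compiled> out;
    for (const CodeInstance* ci : cis) {
        if (ci == nullptr) continue;               // no cache entry: skip, as before
        int32_t func_idx = -1, specfunc_idx = -1;  // -1 means "not found"
        auto it = map.find(ci);
        if (it != map.end()) {
            func_idx = it->second.first;
            specfunc_idx = it->second.second;
        }
        if (func_idx == -1 && specfunc_idx == -1) {
            if (strict) throw std::runtime_error("Static compilation failed");
            continue;  // loosened: treat "not found" like a missing cache entry
        }
        out.push_back({ci, func_idx, specfunc_idx});
    }
    return out;
}
```

The trade-off is that with `strict = false` a function that is genuinely missing is silently dropped rather than reported, which is why this is only a stopgap until 1.12 can use `jl_get_llvm_cis`.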

cc @gbaraldi @vchuravy @glou-nes

Labels: bug (Something isn't working), upstream
