Not working on RDNA4 / gfx1201 #776

@aaruni96

Questionnaire

  1. Does ROCm work for you outside of Julia, e.g. C/C++/Python?

Yes

  2. Post output of rocminfo.
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.7
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7900 12-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7900 12-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5485                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    31940948(0x1e76154) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    31940948(0x1e76154) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    31940948(0x1e76154) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    31940948(0x1e76154) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1201                            
  Uuid:                    GPU-27a7a7d52ee60cbb               
  Marketing Name:          AMD Radeon RX 9070 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      8192(0x2000) KB                    
    L3:                      65536(0x10000) KB                  
  Chip ID:                 30032(0x7550)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          256(0x100)                         
  Max Clock Freq. (MHz):   2400                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            64                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 1012                               
  SDMA engine uCode::      838                                
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16695296(0xfec000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1201         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
    ISA 2                    
      Name:                    amdgcn-amd-amdhsa--gfx12-generic   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx1036                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      256(0x100) KB                      
  Chip ID:                 5710(0x164e)                       
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2200                               
  BDFID:                   6656                               
  Internal Node ID:        2                                  
  Compute Unit:            2                                  
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 22                                 
  SDMA engine uCode::      9                                  
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    15970472(0xf3b0a8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    15970472(0xf3b0a8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1036         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
    ISA 2                    
      Name:                    amdgcn-amd-amdhsa--gfx10-3-generic 
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

  3. Post output of AMDGPU.versioninfo() if possible.
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
:0:/usr/src/debug/hip-runtime/hip-runtime-clr/hipamd/src/hip_global.cpp:158 : 5486189594 us:  Module not initialized

[85700] signal 6 (-6): Aborted
in expression starting at REPL[4]:1
unknown function (ip: 0x7f5e8d5ec74c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f5de667921e)
unknown function (ip: 0x7f5de6870db3)
unknown function (ip: 0x7f5de68345c1)
unknown function (ip: 0x7f5de683a9da)
unknown function (ip: 0x7f5dade183a2)
rocsparse_create_handle at /opt/rocm/lib/librocsparse.so (unknown line)
macro expansion at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/sparse/error.jl:80 [inlined]
rocsparse_create_handle at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/sparse/librocsparse.jl:7
create_handle at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/sparse/rocSPARSE.jl:31 [inlined]
#5 at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/cache.jl:115 [inlined]
pop! at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/cache.jl:49
new_state at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/cache.jl:114
#9 at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/cache.jl:127 [inlined]
get! at ./dict.jl:458
library_state at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/cache.jl:127
lib_state at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/sparse/rocSPARSE.jl:37 [inlined]
handle at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/sparse/rocSPARSE.jl:41 [inlined]
version at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/sparse/rocSPARSE.jl:46
_ver at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/utils.jl:5 [inlined]
versioninfo at /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/utils.jl:6
unknown function (ip: 0x7f5e0df41a3f)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:666
jl_interpret_toplevel_thunk at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:824
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:261
repl_backend_loop at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:368
#start_repl_backend#59 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:343
start_repl_backend at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:340
#run_repl#76 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:500
run_repl at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:486
jfptr_run_repl_10123.1 at /home/aaruni/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14797.1 at /home/aaruni/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73430.1 at /home/aaruni/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7f5e8d57c6b4)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 40378658 (Pool: 40374356; Big: 4302); GC: 48
[1]    85700 IOT instruction (core dumped)  julia +release --project=./

Reproducing the bug

  1. Describe what's not working.

AMDGPU.jl does not work at all with the RX 9070 XT (gfx1201). It possibly needs newer ROCm libraries; support for this GPU was added in ROCm 6.4.

  2. Provide MWE to reproduce it (if possible).

On an RX 9070 XT:

julia> using AMDGPU

julia> AMDGPU.zeros(1)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
'gfx1201' is not a recognized processor for this target (ignoring processor)
ERROR: LLVM error: Cannot select: 0x23a4ac70: ch = store<(store (s32) into %ir.17, !tbaa !137, addrspace 1)> # D:1 0x23caf7a0, 0x2360ec50, 0x237f2b40, undef:i64, /home/aaruni/.julia/packages/LLVM/2JPxT/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/aaruni/.julia/packages/LLVM/2JPxT/src/interop/pointer.jl:88 @[ /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/device/gcn/array.jl:86 @[ /home/aaruni/.julia/packages/KernelAbstractions/C3nYQ/src/macros.jl:322 @[ none:0 ] ] ] ] ] ]
  0x2360ec50: i32,ch = load<(dereferenceable invariant load (s32) from %ir..kernarg.offset3.cast, align 8, addrspace 4)> 0x23caf7a0, 0x237f2de0, undef:i64
    0x237f2de0: i64 = add nuw 0x2360ee80, Constant:i64<136>
      0x2360ee80: i64,ch = CopyFromReg 0x23caf7a0, Register:i64 %0
        0x23664d20: i64 = Register %0
      0x2360e710: i64 = Constant<136>
    0x237f2ec0: i64 = undef
  0x237f2b40: i64 = add # D:1 0x237f32b0, Constant:i64<-4>, /home/aaruni/.julia/packages/LLVM/2JPxT/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/aaruni/.julia/packages/LLVM/2JPxT/src/interop/pointer.jl:88 @[ /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/device/gcn/array.jl:86 @[ /home/aaruni/.julia/packages/KernelAbstractions/C3nYQ/src/macros.jl:322 @[ none:0 ] ] ] ] ] ]
    0x237f32b0: i64 = add # D:1 0x23664c40, 0x237f2ad0, /home/aaruni/.julia/packages/LLVM/2JPxT/src/interop/base.jl:39 @[ none:0 @[ none:0 @[ /home/aaruni/.julia/packages/LLVM/2JPxT/src/interop/pointer.jl:88 @[ /home/aaruni/.julia/packages/AMDGPU/wH6SV/src/device/gcn/array.jl:86 @[ /home/aaruni/.julia/packages/KernelAbstractions/C3nYQ/src/macros.jl:322 @[ none:0 ] ] ] ] ] ]
      0x23664c40: i64 = shl # D:1 0x237f3320, Constant:i32<2>
        0x237f3320: i64,ch = CopyFromReg # D:1 0x23caf7a0, Register:i64 %1
          0x2360f270: i64 = Register %1
        0x237f3390: i32 = Constant<2>
      0x237f2ad0: i64 = bitcast 0x237f2bb0
        0x237f2bb0: v2i32,ch = load<(dereferenceable invariant load (s64) from %ir..kernarg.offset1.cast + 8, basealign 16, addrspace 4)> 0x23caf7a0, 0x237f3010, undef:i64
          0x237f3010: i64 = add 0x2360ee80, Constant:i64<120>
            0x2360ee80: i64,ch = CopyFromReg 0x23caf7a0, Register:i64 %0
              0x23664d20: i64 = Register %0
            0x2360eef0: i64 = Constant<120>
          0x237f2ec0: i64 = undef
    0x237f3080: i64 = Constant<-4>
  0x237f2ec0: i64 = undef
In function: _Z16gpu_fill_kernel_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S8_S8_EE14ROCDeviceArrayI7Float32Li1ELi1EESD_
Stacktrace:
  [1] handle_error(reason::Cstring)
    @ LLVM ~/.julia/packages/LLVM/2JPxT/src/core/context.jl:194
  [2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
    @ LLVM.API ~/.julia/packages/LLVM/2JPxT/lib/16/libLLVM.jl:11138
  [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
    @ LLVM ~/.julia/packages/LLVM/2JPxT/src/targetmachine.jl:118
  [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/mcgen.jl:75
  [5] macro expansion
    @ ~/.julia/packages/Tracy/GcShf/src/tracepoint.jl:158 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:404 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/Tracy/GcShf/src/tracepoint.jl:158 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:401 [inlined]
  [9] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/utils.jl:116
 [10] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:115
 [11] compile_unhooked
    @ ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:80 [inlined]
 [12] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:67
 [13] compile
    @ ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:55 [inlined]
 [14] #40
    @ ~/.julia/packages/AMDGPU/wH6SV/src/compiler/codegen.jl:194 [inlined]
 [15] JuliaContext(f::AMDGPU.Compiler.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:34
 [16] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/driver.jl:25
 [17] hipcompile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/wH6SV/src/compiler/codegen.jl:193
 [18] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/execution.jl:245
 [19] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Emuht/src/execution.jl:159
 [20] macro expansion
    @ ~/.julia/packages/AMDGPU/wH6SV/src/compiler/codegen.jl:161 [inlined]
 [21] macro expansion
    @ ./lock.jl:273 [inlined]
 [22] hipfunction(f::GPUArrays.var"#gpu_fill_kernel!#3", tt::Type{Tuple{…}}; kwargs::@Kwargs{})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/wH6SV/src/compiler/codegen.jl:155
 [23] hipfunction(f::GPUArrays.var"#gpu_fill_kernel!#3", tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, AMDGPU.Device.ROCDeviceVector{…}, Float32}})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/wH6SV/src/compiler/codegen.jl:154
 [24] macro expansion
    @ ~/.julia/packages/AMDGPU/wH6SV/src/highlevel.jl:155 [inlined]
 [25] (::KernelAbstractions.Kernel{…})(::ROCArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ AMDGPU.ROCKernels ~/.julia/packages/AMDGPU/wH6SV/src/ROCKernels.jl:91
 [26] fill!(A::ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}, x::Float32)
    @ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/construction.jl:22
 [27] zeros
    @ ~/.julia/packages/AMDGPU/wH6SV/src/array.jl:245 [inlined]
 [28] zeros(dims::Int64)
    @ AMDGPU ~/.julia/packages/AMDGPU/wH6SV/src/array.jl:244
 [29] top-level scope
    @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.
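
A possible reading of the trace above: the failing call goes through `lib/16/libLLVM.jl` in LLVM.jl, so kernel code generation appears to use the LLVM 16 bundled with Julia 1.11 rather than the system ROCm toolchain, and that LLVM is what prints `'gfx1201' is not a recognized processor`. Below is a minimal sketch for checking which LLVM a given Julia build carries. It uses only `VERSION` and `Base.libllvm_version` from Base; `toolchain_summary` is a hypothetical helper name, and the idea that a newer bundled LLVM is what is missing is an assumption, not something confirmed in this report.

```julia
# Report the Julia version and the LLVM version Julia was built against.
# Assumption: GPUCompiler.jl emits AMDGPU machine code through this bundled
# LLVM, as the `lib/16/libLLVM.jl` frame in the stack trace suggests.
function toolchain_summary()
    return (julia = VERSION, llvm = Base.libllvm_version)
end

info = toolchain_summary()
println("Julia version: ", info.julia)
println("Bundled LLVM:  ", info.llvm)
```

On the system in this report the second line would presumably show a 16.x version; that expectation is inferred from the trace, not measured.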

Labels: bug (Something isn't working)
