jitlayers: Use GlobalISel on AArch64 at -O0/-O1#60339
Force-pushed from d89fedf to dc0539b.
I'm more comfortable with this than FastISel.
Does this need a pkgeval?
Yeah, a pkgeval on aarch64 would be good. @maleadt is that still possible?
I believe this was enabled at one point for ahead-of-time compilation but disabled because of miscompiles? I should probably report/fix the easy ones upstream... The atomic Float16 miscompile still happens on this version:
Compare with
IIRC you can get this to trigger during bootstrapping with -O1.
Yeah, we should definitely be reporting any GlobalISel bugs we find. IIUC GlobalISel is intended to be the future of instruction selection (completely replacing SelectionDAG), so we want it to be as solid as possible.
Not easily, no; it's a very manual process.
Should we also use it at -O2/-O3? IIUC, for AArch64 it's generally expected to produce equally good code.
@xal-0 I added a test based on your example. It fails locally.
Force-pushed from 36855b7 to e49df8d.
You can probably ask Claude to turn this into an llc test case for an LLVM issue.
llvm/llvm-project#171499 has been merged. Are there any other known bugs?
Is this ready to go?
This needs to wait for that fix to land in LLVM. If we want it backported, there is some guidance here: llvm/llvm-project#171499 (comment)
Can we just add it to our LLVM branch?
Force-pushed from e49df8d to 9cb28b3.
I don't think your patch is included in the build currently present in #59950.
LLVM 21 replaced the `Stmt*` parameter in `SValBuilder::conjureSymbolVal` with `ConstCFGElementRef` (llvm/llvm-project#128251). Update all four call sites in GCChecker.cpp to use `C.getCFGElementRef()` instead of passing `Expr*` pointers directly.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
LLVM 21 (llvm/llvm-project#138092) aligns i128 to 16 bytes when passing it on the x86-32 stack. However, GCC does not support __int128 on 32-bit x86 at all, so C libraries use struct { int64_t, int64_t }, which only has 4-byte stack alignment. This mismatch caused segfaults on Linux and incorrect results on Windows for ccalls involving Int128.
Fix by using [4 x i32] instead of i128 as the byval type for large integer arguments, which naturally has 4-byte alignment matching the C ABI. Also fix abi_win32.cpp to pass large primitive types like Int128 by reference (byval) instead of directly, avoiding the same alignment issue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
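The size and alignment relationship described in this commit message can be inspected with a small C++ sketch. The names `Int128Struct`, `Int128Words`, and `report_layout` are hypothetical stand-ins introduced here for illustration (they are not from the patch): the first models the two-halves struct a C library might use, the second models the `[4 x i32]` byval type. The printed alignments depend on the target the file is compiled for, so a 32-bit x86 build (for example with `-m32`, if the toolchain supports it) is needed to observe the 4-byte case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for how a C library without __int128 might
// represent a 128-bit integer: two 64-bit halves.
struct Int128Struct {
    int64_t lo;
    int64_t hi;
};

// Hypothetical stand-in for the [4 x i32] byval type: four 32-bit words.
struct Int128Words {
    uint32_t w[4];
};

// Both representations carry the same 16 bytes of payload.
static_assert(sizeof(Int128Struct) == 16, "two 64-bit halves occupy 16 bytes");
static_assert(sizeof(Int128Words) == 16, "four 32-bit words occupy 16 bytes");

// Print size and alignment for the target this file is compiled for.
inline void report_layout() {
    // The word array is never more strictly aligned than the 64-bit struct.
    assert(alignof(Int128Words) <= alignof(Int128Struct));
    std::printf("Int128Struct: size=%zu align=%zu\n",
                sizeof(Int128Struct), alignof(Int128Struct));
    std::printf("Int128Words:  size=%zu align=%zu\n",
                sizeof(Int128Words), alignof(Int128Words));
}
```

Calling `report_layout()` from `main` on different targets shows why the array-of-words type is the safer match: its alignment follows `uint32_t` everywhere, whereas a native 128-bit integer may demand 16 bytes.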
FastISel was disabled on AArch64 in 2015 (PR JuliaLang#13393) to fix issue JuliaLang#13321, but that issue was specifically about 32-bit ARM (ARMv7) segfaults during bootstrap. The AArch64 exclusion was added conservatively alongside the ARM fix.
AArch64 FastISel has been actively maintained upstream with recent bug fixes:
- llvm/llvm-project#75993 (Jan 2024)
- llvm/llvm-project#133987 (May 2025)
This enables faster instruction selection for JIT compilation on AArch64 at lower optimization levels, reducing compilation latency.
GlobalISel is LLVM's modern instruction selector that is designed to replace both FastISel and SelectionDAG. On AArch64, it is mature and enabled by default at -O0 in upstream LLVM.
This enables GlobalISel with fallback mode on AArch64, which provides faster instruction selection than SelectionDAG while maintaining correctness by falling back to SelectionDAG for unsupported patterns.
Note: This requires RemoveJuliaAddrspacesPass to run before codegen, which is already the case in the current pipeline (see pipeline.cpp comment about GlobalISel not liking Julia's address spaces).
Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from 360eafb to 4b9356c.
Add JL_GC_PROMISE_ROOTED annotations in egal_set::insert() for val, list, and keyset. These class members are rooted by the caller via JL_GC_PUSH3, but the GC checker cannot track rooting through class member indirection. The post-put_key annotation on list is needed because the returned pointer is a new value assigned to the already-pushed member address.
Suppress optin.core.FixedAddressDereference false positives in llvm-alloc-opt.cpp (LLVM ilist_node_base.h internals) and gc-stock.c (gc_read_stack bounds-checked address); this checker is new in LLVM 21.
Co-authored-by: Claude <noreply@anthropic.com>
Force-pushed from 4b9356c to 50697f8.
$ julia +nightly -e 'using InteractiveUtils; code_native(setindex!, (Threads.Atomic{Float16}, Float16); debuginfo=:none)'
.file "setindex!"
.text
.globl "julia_setindex!_0" // -- Begin function julia_setindex!_0
.p2align 4
.type "julia_setindex!_0",@function
"julia_setindex!_0": // @"julia_setindex!_0"
; Function Signature: setindex!(Base.Threads.Atomic{Float16}, Float16)
// %bb.0: // %top
//DEBUG_VALUE: setindex!:x <- [$x0+0]
//DEBUG_VALUE: setindex!:v <- $h0
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
// kill: def $h0 killed $h0 def $s0
//DEBUG_VALUE: setindex!:x <- [$x0+0]
//DEBUG_VALUE: setindex!:v <- $h0
fmov w8, s0
mov x29, sp
stlrh w8, [x0]
ldp x29, x30, [sp], #16 // 16-byte Folded Reload
ret
.Lfunc_end0:
.size "julia_setindex!_0", .Lfunc_end0-"julia_setindex!_0"
// -- End function
.section ".note.GNU-stack","",@progbits
$ julia +pr60339 -O1 -e 'using InteractiveUtils; code_native(setindex!, (Threads.Atomic{Float16}, Float16); debuginfo=:none)'
.file "setindex!"
.text
.globl "julia_setindex!_0" // -- Begin function julia_setindex!_0
.p2align 4
.type "julia_setindex!_0",@function
"julia_setindex!_0": // @"julia_setindex!_0"
; Function Signature: setindex!(Base.Threads.Atomic{Float16}, Float16)
// %bb.0: // %top
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
// kill: def $h0 killed $h0 def $s0
//DEBUG_VALUE: setindex!:x <- [$x0+0]
//DEBUG_VALUE: setindex!:v <- $h0
fmov w8, s0
mov x29, sp
stlrh w8, [x0]
ldp x29, x30, [sp], #16 // 16-byte Folded Reload
ret
.Lfunc_end0:
.size "julia_setindex!_0", .Lfunc_end0-"julia_setindex!_0"
// -- End function
.section ".note.GNU-stack","",@progbits
$ julia +pr60339 -O0 -e 'using InteractiveUtils; code_native(setindex!, (Threads.Atomic{Float16}, Float16); debuginfo=:none)'
.file "setindex!"
.text
.globl "julia_setindex!_0" // -- Begin function julia_setindex!_0
.p2align 4
.type "julia_setindex!_0",@function
"julia_setindex!_0": // @"julia_setindex!_0"
; Function Signature: setindex!(Base.Threads.Atomic{Float16}, Float16)
// %bb.0: // %top
//DEBUG_VALUE: setindex!:x <- [$x0+0]
//DEBUG_VALUE: setindex!:v <- $h0
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
// kill: def $s0 killed $h0
fmov w8, s0
stlrh w8, [x0]
ldp x29, x30, [sp], #16 // 16-byte Folded Reload
ret
.Lfunc_end0:
.size "julia_setindex!_0", .Lfunc_end0-"julia_setindex!_0"
// -- End function
.section ".note.GNU-stack","",@progbits
@xal-0 looks good?
JuliaLang#60339 was auto-closed when JuliaLang#59950 merged and can't be reopened.
Co-authored-by: Claude <noreply@anthropic.com>
On top of #59950
Requires llvm/llvm-project#171499, which is backported in #59950
Alternative to #60338
GlobalISel is LLVM's modern instruction selector that is designed to replace
both FastISel and SelectionDAG. On AArch64, it is mature and enabled by default
at -O0 in upstream LLVM.
This enables GlobalISel with fallback mode on AArch64, which provides faster
instruction selection than SelectionDAG while maintaining correctness by
falling back to SelectionDAG for unsupported patterns.
Note: This requires RemoveJuliaAddrspacesPass to run before codegen, which is
already the case in the current pipeline (see pipeline.cpp comment about
GlobalISel not liking Julia's address spaces).
Co-authored-by: Claude
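The policy in the PR title (GlobalISel on AArch64 at -O0/-O1, with SelectionDAG as the fallback and as the choice everywhere else) can be sketched as a small self-contained C++ model. Everything here is hypothetical and for illustration only: `Selector`, `is_aarch64`, and `choose_selector` are not LLVM or Julia APIs, the triple check is a simplified prefix match, and the actual change in jitlayers.cpp may be structured differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the instruction selector choice; not an LLVM type.
enum class Selector {
    SelectionDAG,           // default selector
    GlobalISelWithFallback  // GlobalISel, falling back to SelectionDAG on unsupported patterns
};

// Simplified assumption: treat triples starting with "aarch64" or "arm64"
// as AArch64. Real triple parsing is more involved.
inline bool is_aarch64(const std::string &triple) {
    return triple.rfind("aarch64", 0) == 0 || triple.rfind("arm64", 0) == 0;
}

// Pick GlobalISel only on AArch64 at -O0 or -O1; otherwise keep SelectionDAG.
inline Selector choose_selector(const std::string &triple, int opt_level) {
    if (is_aarch64(triple) && opt_level <= 1)
        return Selector::GlobalISelWithFallback;
    return Selector::SelectionDAG;
}
```

For example, `choose_selector("aarch64-unknown-linux-gnu", 1)` yields `Selector::GlobalISelWithFallback`, while the same triple at optimization level 2 or any x86-64 triple yields `Selector::SelectionDAG`.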