Add PTXFDivFastPass to lower fdiv fast to NVPTX approximate division by vchuravy · Pull Request #800 · JuliaGPU/GPUCompiler.jl

vchuravy · 2026-05-19T13:20:36Z

The overarching goal is to move fast-math handling from CUDA.jl into the GPUCompiler backend.

The LLVM NVPTX backend handles fdiv fast for Float32 (→ div.approx.ftz.f32)
but has no fast path for Float64. This IR-level pass covers both:

- Float32: replaces fdiv with __nv_fast_fdividef (libdevice)
- Float64: replaces fdiv with rcp.approx.ftz.d + Newton refinement,
  matching CUDA.jl's inv_fast(::Float64) algorithm
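
To illustrate why a low-precision reciprocal followed by a refinement step can recover close to full Float64 accuracy, here is a minimal numerical sketch in Python. It is not the pass itself. `approx_rcp` is a hypothetical stand-in for the hardware instruction that simply zeroes the low 32 bits of `1/x`, and the refinement uses plain arithmetic where a GPU implementation would presumably use fused multiply-adds. The function names and the exact form of the correction are assumptions made for illustration.

```python
import struct


def approx_rcp(x: float) -> float:
    """Hypothetical stand-in for a coarse hardware reciprocal.

    Computes 1/x and zeroes the low 32 bits of the IEEE-754 double,
    leaving roughly 20 mantissa bits of precision.
    """
    bits = struct.unpack("<Q", struct.pack("<d", 1.0 / x))[0]
    bits &= 0xFFFFFFFF00000000
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def refine_rcp(x: float, r: float) -> float:
    """One Newton-style refinement of r ~ 1/x with a second-order term.

    If r = (1/x) * (1 - d), then e = d and the result is
    (1/x) * (1 - d**3), so a ~2**-20 error shrinks to ~2**-60.
    """
    e = 1.0 - x * r   # residual error of the approximation
    e = e + e * e     # add the second-order correction
    return r + e * r  # apply the correction to the estimate


def fast_div(a: float, b: float) -> float:
    """Approximate a / b as a * (refined reciprocal of b)."""
    return a * refine_rcp(b, approx_rcp(b))
```

For example, `fast_div(1.0, 3.0)` agrees with `1.0 / 3.0` to within a few units in the last place, while `approx_rcp(3.0)` on its own is only accurate to about seven decimal digits.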

The pass fires when the instruction carries the afn fast-math flag (set by
@fastmath) or when target.fastmath=true. It follows the NVVMReflectPass
pattern already in ptx.jl.
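
The firing condition can be sketched as a small predicate. The following Python is a toy model that inspects textual LLVM IR lines; the actual pass operates on LLVM instructions rather than text, and the names `FDIV_RE` and `should_lower` are hypothetical. It treats the `fast` flag as implying `afn`, since `fast` in LLVM IR stands for all fast-math flags.

```python
import re

# Matches an fdiv on float or double, capturing any fast-math flags.
FDIV_RE = re.compile(
    r"=\s*fdiv\s+"
    r"(?P<flags>(?:(?:fast|afn|nnan|ninf|nsz|arcp|contract|reassoc)\s+)*)"
    r"(?P<ty>float|double)\b"
)


def should_lower(ir_line: str, target_fastmath: bool = False) -> bool:
    """Return True if this IR line is an fdiv the pass would rewrite.

    Assumption: the rewrite applies when the instruction carries `afn`
    (or `fast`, which implies it) or when the target enables fast math.
    """
    m = FDIV_RE.search(ir_line)
    if m is None:
        return False  # not a float/double fdiv
    flags = m.group("flags").split()
    return target_fastmath or "afn" in flags or "fast" in flags
```

Under this model, `%q = fdiv afn float %a, %b` qualifies, while a plain `%q = fdiv double %a, %b` qualifies only when `target_fastmath` is true.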

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vchuravy wants to merge 1 commit into main from vc/ptx_fast_div
