Describe the bug
In getrs_strided_batched!, the call to getrs_batched! fails with a "no method matching" error. The failing line (CUDA.jl/lib/cublas/wrappers.jl, line 2266 at commit 7a27d77) is:
return getrs_batched!(trans, n, nrhs, Aptrs, lda, pivotptr, Bptrs, ldb), B
To reproduce
The Minimal Working Example (MWE) for this bug:
using CUDA
A = CuArray(reshape(collect(1.0:8.0), (2,2,2)))
b = CUDA.rand(2,1,2)
pivot = CUDA.zeros(Int32, 2, 2)
info = CUDA.zeros(Int32, 2)
CUBLAS.getrf_strided_batched!(A, pivot, info) # This is fine
CUBLAS.getrs_strided_batched!('N', A, b, pivot) # Error
Gives the error
MethodError: no method matching getrs_batched!(::Char, ::Int64, ::Int64, ::CuArray{CuPtr{Float64}, 1, CUDA.DeviceMemory}, ::Int64, ::CuPtr{Int32}, ::CuArray{CuPtr{Float32}, 1, CUDA.DeviceMemory}, ::Int64)
Closest candidates are:
getrs_batched!(::Char, ::Any, ::Any, ::CuArray{CuPtr{Float32}, 1}, ::Any, ::CuPtr, ::CuArray{CuPtr{Float32}, 1}, ::Any)
@ CUDA ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2199
getrs_batched!(::Char, ::Any, ::Any, ::CuArray{CuPtr{Float64}, 1}, ::Any, ::CuPtr, ::CuArray{CuPtr{Float64}, 1}, ::Any)
@ CUDA ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2199
getrs_batched!(::Char, ::Any, ::Any, ::CuArray{CuPtr{ComplexF32}, 1}, ::Any, ::CuPtr, ::CuArray{CuPtr{ComplexF32}, 1}, ::Any)
@ CUDA ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2199
...
Stacktrace:
[1] getrs_strided_batched!(trans::Char, A::CuArray{Float64, 3, CUDA.DeviceMemory}, B::CuArray{Float32, 3, CUDA.DeviceMemory}, pivotArray::CuArray{Int32, 2, CUDA.DeviceMemory})
@ CUDA.CUBLAS ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2266
[2] top-level scope
@ In[66]:7
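One possible reading of the signature in the error: A is a CuArray{Float64}, while b comes from CUDA.rand(2,1,2), which defaults to Float32, and every getrs_batched! candidate listed takes the same element type for both pointer arrays. The sketch below is an untested variant of the MWE under that assumption. It only changes b to Float64 and otherwise reuses the same calls.

```julia
using CUDA

A = CuArray(reshape(collect(1.0:8.0), (2, 2, 2)))  # Float64, as in the MWE
b = CUDA.rand(Float64, 2, 1, 2)                    # explicit Float64 instead of the Float32 default
pivot = CUDA.zeros(Int32, 2, 2)
info = CUDA.zeros(Int32, 2)

# Assumption: the batched wrappers dispatch on one shared element type for A and b.
@assert eltype(A) == eltype(b)

CUBLAS.getrf_strided_batched!(A, pivot, info)      # LU factorization, as in the MWE
CUBLAS.getrs_strided_batched!('N', A, b, pivot)    # solve; expected to dispatch if the assumption holds
```

If this variant runs, the underlying problem may be a missing element-type check in getrs_strided_batched! rather than the getrs_batched! call itself.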
Manifest.toml
See attached
Expected behavior
getrs_strided_batched! should call getrs_batched! successfully and return the solution instead of throwing a MethodError.
Version info
Details on Julia:
Julia Version 1.10.9
Commit 5595d20a287 (2025-03-10 12:51 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 7763 64-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)
Environment:
LD_LIBRARY_PATH = /global/common/software/nersc9/darshan/default/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/math_libs/12.9/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/extras/CUPTI/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/extras/Debugger/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/nvvm/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/lib64:/opt/cray/pe/papi/7.2.0.2/lib64:/opt/cray/libfabric/1.22.0/lib64:/opt/cray/libfabric/default/lib64
Details on CUDA:
CUDA toolchain:
- runtime 13.0, artifact installation
- driver 580.105.8 for 13.1
- compiler 13.1
CUDA libraries:
- CUBLAS: 13.1.0
- CURAND: 10.4.0
- CUFFT: 12.0.0
- CUSOLVER: 12.0.4
- CUSPARSE: 12.6.3
- CUPTI: 2025.3.1 (API 13.0.1)
- NVML: 13.0.0+580.105.8
Julia packages:
- CUDA: 5.9.6
- GPUArrays: 11.3.4
- GPUCompiler: 1.8.2
- KernelAbstractions: 0.9.39
- CUDA_Driver_jll: 13.1.0+2
- CUDA_Compiler_jll: 0.4.1+1
- CUDA_Runtime_jll: 0.19.2+0
Toolchain:
- Julia: 1.10.9
- LLVM: 15.0.7
4 devices:
0: NVIDIA A100-SXM4-40GB (sm_80, 38.976 GiB / 40.000 GiB available)
1: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)
2: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)
3: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)
Additional context
none