
CUBLAS.getrs_strided_batched! throws MethodError #3033

@mattsignorelli


Describe the bug

In getrs_strided_batched!, the wrapper forwards to getrs_batched! with the following call, which fails with a "no method matching" MethodError:

return getrs_batched!(trans, n, nrhs, Aptrs, lda, pivotptr, Bptrs, ldb), B

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA

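A = CuArray(reshape(collect(1.0:8.0), (2,2,2)))  # CuArray{Float64, 3}
b = CUDA.rand(2,1,2)                               # CuArray{Float32, 3}: CUDA.rand defaults to Float32
pivot = CUDA.zeros(Int32, 2, 2)                    # pivot indices, size (n, batch)
info = CUDA.zeros(Int32, 2)                        # one status flag per batch entry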

CUBLAS.getrf_strided_batched!(A, pivot, info) # This is fine
CUBLAS.getrs_strided_batched!('N', A, b, pivot) # Error

This gives the following error:

MethodError: no method matching getrs_batched!(::Char, ::Int64, ::Int64, ::CuArray{CuPtr{Float64}, 1, CUDA.DeviceMemory}, ::Int64, ::CuPtr{Int32}, ::CuArray{CuPtr{Float32}, 1, CUDA.DeviceMemory}, ::Int64)

Closest candidates are:
  getrs_batched!(::Char, ::Any, ::Any, ::CuArray{CuPtr{Float32}, 1}, ::Any, ::CuPtr, ::CuArray{CuPtr{Float32}, 1}, ::Any)
   @ CUDA ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2199
  getrs_batched!(::Char, ::Any, ::Any, ::CuArray{CuPtr{Float64}, 1}, ::Any, ::CuPtr, ::CuArray{CuPtr{Float64}, 1}, ::Any)
   @ CUDA ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2199
  getrs_batched!(::Char, ::Any, ::Any, ::CuArray{CuPtr{ComplexF32}, 1}, ::Any, ::CuPtr, ::CuArray{CuPtr{ComplexF32}, 1}, ::Any)
   @ CUDA ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2199
  ...


Stacktrace:
 [1] getrs_strided_batched!(trans::Char, A::CuArray{Float64, 3, CUDA.DeviceMemory}, B::CuArray{Float32, 3, CUDA.DeviceMemory}, pivotArray::CuArray{Int32, 2, CUDA.DeviceMemory})
   @ CUDA.CUBLAS ~/.julia/packages/CUDA/FJf6p/lib/cublas/wrappers.jl:2266
 [2] top-level scope
   @ In[66]:7
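The stack trace shows A arriving as CuArray{Float64, 3} while B arrives as CuArray{Float32, 3} (CUDA.rand returns Float32 by default), and every getrs_batched! candidate listed requires the A and B pointer arrays to share one element type. The pure-Julia sketch below needs no GPU and reproduces that dispatch pattern with hypothetical stand-in functions (inner_solve!, outer_loose!, outer_strict! are illustrative names, not CUDA.jl API). It assumes the outer wrapper lets A and B have independent element types. Under that assumption, a mismatch only surfaces when the inner per-type call fails to dispatch, whereas a shared type parameter T rejects it at the public entry point. This illustrates the mechanism and is not the actual CUDA.jl wrapper code.

```julia
# Hypothetical stand-ins for the CUBLAS wrappers (no CUDA.jl needed).
# One method per element type, mirroring the per-type "Closest candidates";
# each method requires A and B to share the same element type.
for T in (Float32, Float64)
    @eval inner_solve!(A::Vector{$T}, B::Vector{$T}) = (B .= B ./ A; B)
end

# Loose entry point: A and B may have different element types,
# so a mismatch is only caught when inner_solve! fails to dispatch.
outer_loose!(A::Vector{<:AbstractFloat}, B::Vector{<:AbstractFloat}) = inner_solve!(A, B)

# Strict entry point (hypothetical fix): a shared type parameter T
# rejects mismatched element types at the public function itself.
outer_strict!(A::Vector{T}, B::Vector{T}) where {T<:Union{Float32,Float64}} = inner_solve!(A, B)

A   = [2.0, 4.0]        # Vector{Float64}
B32 = Float32[1, 1]     # Vector{Float32}, like the output of CUDA.rand
B64 = [1.0, 1.0]        # Vector{Float64}

println(outer_strict!(A, B64))   # matched types dispatch normally: [0.5, 0.25]

for f in (outer_loose!, outer_strict!)
    try
        f(A, copy(B32))
    catch err
        err isa MethodError || rethrow()
        # err.f names the function that failed to dispatch:
        # inner_solve! for the loose entry point, outer_strict! for the strict one.
        println(f, " -> MethodError raised by ", err.f)
    end
end
```

With the loose signature the error points at the internal call, as in the report; with the shared T it points at the function the user actually called.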
Manifest.toml

See attached

Expected behavior

The internal call to getrs_batched! should be corrected so that getrs_strided_batched! does not throw a MethodError.

Version info

Details on Julia:

Julia Version 1.10.9
Commit 5595d20a287 (2025-03-10 12:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)
Environment:
  LD_LIBRARY_PATH = /global/common/software/nersc9/darshan/default/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/math_libs/12.9/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/extras/CUPTI/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/extras/Debugger/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/nvvm/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/25.5/cuda/12.9/lib64:/opt/cray/pe/papi/7.2.0.2/lib64:/opt/cray/libfabric/1.22.0/lib64:/opt/cray/libfabric/default/lib64

Details on CUDA:

CUDA toolchain: 
- runtime 13.0, artifact installation
- driver 580.105.8 for 13.1
- compiler 13.1

CUDA libraries: 
- CUBLAS: 13.1.0
- CURAND: 10.4.0
- CUFFT: 12.0.0
- CUSOLVER: 12.0.4
- CUSPARSE: 12.6.3
- CUPTI: 2025.3.1 (API 13.0.1)
- NVML: 13.0.0+580.105.8

Julia packages: 
- CUDA: 5.9.6
- GPUArrays: 11.3.4
- GPUCompiler: 1.8.2
- KernelAbstractions: 0.9.39
- CUDA_Driver_jll: 13.1.0+2
- CUDA_Compiler_jll: 0.4.1+1
- CUDA_Runtime_jll: 0.19.2+0

Toolchain:
- Julia: 1.10.9
- LLVM: 15.0.7

4 devices:
  0: NVIDIA A100-SXM4-40GB (sm_80, 38.976 GiB / 40.000 GiB available)
  1: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)
  2: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)
  3: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)

Additional context

none

Metadata

Labels

- bug: Something isn't working
- cuda libraries: Stuff about CUDA library wrappers.
