
[BUG] cupynumeric.linalg.solve fails (raises errors) when the matrix size is >= 2048 on multiple GPUs #1253

@fixJ

Description

Software versions

System info:
Python : 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:16:04) [GCC 11.2.0]
Platform : Linux-5.14.0-570.39.1.el9_6.x86_64-x86_64-with-glibc2.34
GPU driver : 580.82.09
GPU devices :
GPU 0 : NVIDIA GeForce RTX 2080 Ti
GPU 1 : NVIDIA GeForce RTX 2080 Ti
GPU 2 : NVIDIA GeForce RTX 2080 Ti
GPU 3 : NVIDIA GeForce RTX 2080 Ti
GPU 4 : NVIDIA GeForce RTX 2080 Ti
GPU 5 : NVIDIA GeForce RTX 2080 Ti

Package versions:
legion : legion-25.12.0-22-g4679528b4 (commit: 4679528b4cd3f71a9cebc642e39a1c0f074c717a)
legate : 26.01.00
cupynumeric : 26.01.00
numpy : 1.26.4
scipy : 1.16.3
numba : (failed to detect)

Legate build configuration:
build_type : Release
use_openmp : True
use_cuda : True
networks : ucx
conduit :
configure_options : --LEGATE_ARCH=arch-conda;--with-python;--with-cc=/tmp/conda-croot/legate/_build_env/bin/x86_64-conda-linux-gnu-cc;--with-cxx=/tmp/conda-croot/legate/_build_env/bin/x86_64-conda-linux-gnu-c++;--build-march=haswell;--cmake-generator=Ninja;--with-openmp;--with-cuda;--build-type=release;--with-ucx

Package details:
cuda-version : cuda-version-13.1-h2ff5cdb_3 (conda-forge)
legate : legate-26.01.00-cuda13_py312_ucx_gpu_g3ccb63960_0 (legate)
cupynumeric : cupynumeric-26.01.00-cuda13_py312_gpu_gae1c7878_0 (legate)

Jupyter notebook / Jupyter Lab version

No response

Expected behavior

The minimal reproducer code:
import cupynumeric as cp

if __name__ == '__main__':
    n = 2048
    A = cp.random.rand(n, n)
    b = cp.random.rand(n, n)
    cp.linalg.solve(A, b)

The operation is expected to complete successfully on any number of GPUs, as it already does on a single GPU.
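For reference, a successful run can be sanity-checked through the relative residual of the solve. The sketch below uses plain NumPy as a stand-in (cupynumeric mirrors the NumPy API, so `np` could be swapped for `cupynumeric`); the RNG seeding is an addition for reproducibility, not part of the original reproducer:

```python
import numpy as np  # stand-in for cupynumeric, which mirrors this API

n = 2048
rng = np.random.default_rng(0)
A = rng.random((n, n))
b = rng.random((n, n))

x = np.linalg.solve(A, b)

# Relative residual should be tiny for a well-conditioned random system.
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.2e}")
```

A near-zero residual on 1 GPU versus a crash on 2+ GPUs for the same inputs would confirm the failure is in the multi-GPU path rather than the data.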

Observed behavior

  1. Errors when running:

     legate --gpus 2 test.py

     [error1.txt](https://github.com/user-attachments/files/25037341/error1.txt) (also posted in a comment)
  2. Analysis with the compute-sanitizer tool:

     [error2.txt](https://github.com/user-attachments/files/25037457/error2.txt) (also posted in a comment)
  3. However, with a matrix size < 2048 (e.g. 2047), the code runs successfully regardless of the number of GPUs.
  4. A summary table of the tests, which shows the issue clearly:

| matrix size | GPU count | result |
| ----------- | --------- | ------ |
| < 2048      | 1         | ok     |
| >= 2048     | 1         | ok     |
| < 2048      | >= 2      | ok     |
| >= 2048     | >= 2      | failed |
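The sweep behind this table can be scripted. The sketch below is a dry run that only prints the `legate` commands to execute, and it assumes `test.py` is adapted to read the matrix size `n` from `sys.argv[1]` (the original reproducer hard-codes `n`):

```python
# Dry-run sketch: print the legate command for each cell of the table above.
# Assumption: test.py is modified to take the matrix size as its first argument.

def make_cmd(gpus: int, size: int) -> list[str]:
    """Build the legate invocation for one (GPU count, matrix size) cell."""
    return ["legate", "--gpus", str(gpus), "test.py", str(size)]

for gpus in (1, 2):
    for size in (2047, 2048):  # straddle the observed threshold
        print(" ".join(make_cmd(gpus, size)))
```

Running the printed commands (e.g. via `subprocess.run`) and recording exit codes reproduces the pass/fail pattern in the table.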

Example code or instructions

The minimal reproducer code:

import cupynumeric as cp

if __name__ == '__main__':
    n = 2048
    A = cp.random.rand(n, n)
    b = cp.random.rand(n, n)
    cp.linalg.solve(A, b)

Stack traceback or browser console output

Additional information. Output of `nvidia-smi topo -m`:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     PHB     PHB     SYS     SYS     0-7,16-23       0               N/A
GPU1    PIX      X      PHB     PHB     SYS     SYS     0-7,16-23       0               N/A
GPU2    PHB     PHB      X      PIX     SYS     SYS     0-7,16-23       0               N/A
GPU3    PHB     PHB     PIX      X      SYS     SYS     0-7,16-23       0               N/A
GPU4    SYS     SYS     SYS     SYS      X      PIX     8-15,24-31      1               N/A
GPU5    SYS     SYS     SYS     SYS     PIX      X      8-15,24-31      1               N/A
Output of the official NVIDIA simpleP2P sample from https://github.com/NVIDIA/cuda-samples/blob/v12.9/Samples/0_Introduction/simpleP2P/simpleP2P.cu:

[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 6

Checking GPU(s) for support of peer to peer memory access...
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
Two or more GPUs with Peer-to-Peer access capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.
Output of `conda list | grep -E "nccl|cu"`:

cuda-cccl_linux-64             13.1.115         ha770c72_0                         conda-forge
cuda-cudart                    13.1.80          hecca717_0                         conda-forge
cuda-cudart-dev_linux-64       13.1.80          h376f20c_0                         conda-forge
cuda-cudart-static_linux-64    13.1.80          h376f20c_0                         conda-forge
cuda-cudart_linux-64           13.1.80          h376f20c_0                         conda-forge
cuda-nvrtc                     13.1.115         hecca717_0                         conda-forge
cuda-nvtx                      13.1.115         hecca717_0                         conda-forge
cuda-version                   13.1             h2ff5cdb_3                         conda-forge
cupy                           13.6.0           py312h045ee1a_2                    conda-forge
cupy-core                      13.6.0           py312h1a70bb2_2                    conda-forge
cupynumeric                    26.01.00         cuda13_py312_gpu_gae1c7878_0       legate
cutensor                       2.3.1.0          h15eaa2f_1                         conda-forge
icu                            73.1             h6a678d5_0
legate                         26.01.00         cuda13_py312_ucx_gpu_g3ccb63960_0  legate
libcublas                      13.2.1.1         h676940d_0                         conda-forge
libcufft                       12.1.0.78        hecca717_0                         conda-forge
libcufile                      1.16.1.26        hd07211c_0                         conda-forge
libcups                        2.4.15           hbe4054b_0
libcurand                      10.4.1.81        h676940d_0                         conda-forge
libcurl                        8.18.0           h4e3cde8_0                         conda-forge
libcusolver                    12.0.9.81        h676940d_0                         conda-forge
libcusolvermp0                 0.7.2.888        h7bcfba5_3                         conda-forge
libcusparse                    12.7.3.1         hecca717_0                         conda-forge
nccl                           2.28.9.1         hd557bf5_1                         conda-forge
ncurses                        6.5              h7934f7d_0
xcb-util-cursor                0.1.5            h5eee18b_0

Could you give me some advice on how to resolve this issue? Please let me know if you need any further information. Thanks a lot.
