Description
Software versions
System info:
Python : 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:16:04) [GCC 11.2.0]
Platform : Linux-5.14.0-570.39.1.el9_6.x86_64-x86_64-with-glibc2.34
GPU driver : 580.82.09
GPU devices :
GPU 0 : NVIDIA GeForce RTX 2080 Ti
GPU 1 : NVIDIA GeForce RTX 2080 Ti
GPU 2 : NVIDIA GeForce RTX 2080 Ti
GPU 3 : NVIDIA GeForce RTX 2080 Ti
GPU 4 : NVIDIA GeForce RTX 2080 Ti
GPU 5 : NVIDIA GeForce RTX 2080 Ti
Package versions:
legion : legion-25.12.0-22-g4679528b4 (commit: 4679528b4cd3f71a9cebc642e39a1c0f074c717a)
legate : 26.01.00
cupynumeric : 26.01.00
numpy : 1.26.4
scipy : 1.16.3
numba : (failed to detect)
Legate build configuration:
build_type : Release
use_openmp : True
use_cuda : True
networks : ucx
conduit :
configure_options : --LEGATE_ARCH=arch-conda;--with-python;--with-cc=/tmp/conda-croot/legate/_build_env/bin/x86_64-conda-linux-gnu-cc;--with-cxx=/tmp/conda-croot/legate/_build_env/bin/x86_64-conda-linux-gnu-c++;--build-march=haswell;--cmake-generator=Ninja;--with-openmp;--with-cuda;--build-type=release;--with-ucx
Package details:
cuda-version : cuda-version-13.1-h2ff5cdb_3 (conda-forge)
legate : legate-26.01.00-cuda13_py312_ucx_gpu_g3ccb63960_0 (legate)
cupynumeric : cupynumeric-26.01.00-cuda13_py312_gpu_gae1c7878_0 (legate)
Jupyter notebook / Jupyter Lab version
No response
Expected behavior
The minimal reproducer code:

```python
import cupynumeric as cp

if __name__ == '__main__':
    n = 2048
    A = cp.random.rand(n, n)
    b = cp.random.rand(n, n)
    cp.linalg.solve(A, b)
```

The operation is expected to complete successfully under normal conditions.
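For reference, "successful" can be made precise with a residual check. The sketch below uses plain NumPy (whose API cuPyNumeric mirrors) so it runs without a GPU; swapping the import for `cupynumeric` should give the equivalent GPU version:

```python
import numpy as np  # cuPyNumeric mirrors this API; swap for: import cupynumeric as np

n = 2048
A = np.random.rand(n, n)
B = np.random.rand(n, n)
X = np.linalg.solve(A, B)

# A successful solve of a well-conditioned random system should leave
# a tiny relative residual (around machine precision for float64).
residual = np.linalg.norm(A @ X - B) / np.linalg.norm(B)
print(residual)
```
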
Observed behavior
Errors when running:

```shell
legate --gpus 2 test.py
```

[error1.txt](https://github.com/user-attachments/files/25037341/error1.txt) (also in a comment)

I then used the compute-sanitizer tool to analyze the failure:

[error2.txt](https://github.com/user-attachments/files/25037457/error2.txt) (also in a comment)

However, if the matrix size is set below 2048 (e.g. 2047), the code runs successfully, regardless of the number of GPUs.

The tests are summarized in the table below, which shows the issue clearly:
| size of matrix | count of gpu | run result |
|---|---|---|
| size < 2048 | 1 | ok |
| size >= 2048 | 1 | ok |
| size < 2048 | >=2 | ok |
| size >= 2048 | >= 2 | failed |
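To pin down the failure boundary shown in the table, a small size sweep around n = 2048 may help. This is only a sketch: it uses plain NumPy so it runs anywhere, but swapping the import for `cupynumeric` and launching with `legate --gpus 2` should exercise the failing path (the default sizes assume the boundary is exactly at 2048, as the table suggests):

```python
import sys
import numpy as np  # swap for: import cupynumeric as np  (then run under legate)

def solve_residual(n):
    # Solve A X = B for an n x n system and return the relative residual.
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    X = np.linalg.solve(A, B)
    return float(np.linalg.norm(A @ X - B) / np.linalg.norm(B))

if __name__ == "__main__":
    # Sizes can be passed on the command line; defaults straddle the boundary.
    sizes = [int(s) for s in sys.argv[1:]] or [2047, 2048]
    for n in sizes:
        print(f"n={n}: relative residual {solve_residual(n):.2e}")
```
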
Example code or instructions
The minimal reproducer code:

```python
import cupynumeric as cp

if __name__ == '__main__':
    n = 2048
    A = cp.random.rand(n, n)
    b = cp.random.rand(n, n)
    cp.linalg.solve(A, b)
```

Stack traceback or browser console output
Additional information:
Output of `nvidia-smi topo -m`:

```
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     PHB     PHB     SYS     SYS     0-7,16-23       0               N/A
GPU1    PIX      X      PHB     PHB     SYS     SYS     0-7,16-23       0               N/A
GPU2    PHB     PHB      X      PIX     SYS     SYS     0-7,16-23       0               N/A
GPU3    PHB     PHB     PIX      X      SYS     SYS     0-7,16-23       0               N/A
GPU4    SYS     SYS     SYS     SYS      X      PIX     8-15,24-31      1               N/A
GPU5    SYS     SYS     SYS     SYS     PIX      X      8-15,24-31      1               N/A
```

Output of the official NVIDIA sample from https://github.com/NVIDIA/cuda-samples/blob/v12.9/Samples/0_Introduction/simpleP2P/simpleP2P.cu:
```
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 6
Checking GPU(s) for support of peer to peer memory access...
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU0) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU1) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU2) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU3) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU4) -> NVIDIA GeForce RTX 2080 Ti (GPU5) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU0) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU1) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU2) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU3) : No
> Peer access from NVIDIA GeForce RTX 2080 Ti (GPU5) -> NVIDIA GeForce RTX 2080 Ti (GPU4) : No
Two or more GPUs with Peer-to-Peer access capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.
```

Output of `conda list | grep -E "nccl|cu"`:
```
cuda-cccl_linux-64            13.1.115      ha770c72_0                          conda-forge
cuda-cudart                   13.1.80       hecca717_0                          conda-forge
cuda-cudart-dev_linux-64      13.1.80       h376f20c_0                          conda-forge
cuda-cudart-static_linux-64   13.1.80       h376f20c_0                          conda-forge
cuda-cudart_linux-64          13.1.80       h376f20c_0                          conda-forge
cuda-nvrtc                    13.1.115      hecca717_0                          conda-forge
cuda-nvtx                     13.1.115      hecca717_0                          conda-forge
cuda-version                  13.1          h2ff5cdb_3                          conda-forge
cupy                          13.6.0        py312h045ee1a_2                     conda-forge
cupy-core                     13.6.0        py312h1a70bb2_2                     conda-forge
cupynumeric                   26.01.00      cuda13_py312_gpu_gae1c7878_0        legate
cutensor                      2.3.1.0       h15eaa2f_1                          conda-forge
icu                           73.1          h6a678d5_0
legate                        26.01.00      cuda13_py312_ucx_gpu_g3ccb63960_0   legate
libcublas                     13.2.1.1      h676940d_0                          conda-forge
libcufft                      12.1.0.78     hecca717_0                          conda-forge
libcufile                     1.16.1.26     hd07211c_0                          conda-forge
libcups                       2.4.15        hbe4054b_0
libcurand                     10.4.1.81     h676940d_0                          conda-forge
libcurl                       8.18.0        h4e3cde8_0                          conda-forge
libcusolver                   12.0.9.81     h676940d_0                          conda-forge
libcusolvermp0                0.7.2.888     h7bcfba5_3                          conda-forge
libcusparse                   12.7.3.1      hecca717_0                          conda-forge
nccl                          2.28.9.1      hd557bf5_1                          conda-forge
ncurses                       6.5           h7934f7d_0
xcb-util-cursor               0.1.5         h5eee18b_0
```

Could you give me some advice on how to solve this issue? Please let me know if you need any further information. Thanks a lot.