I am helping a NERSC user develop a package on Perlmutter that depends on CUDA. We are running into a problem: the Pkg.test() environment does not pick up the project-wide CUDA configuration.
Background
At NERSC we set JULIA_LOAD_PATH to :/global/common/software/nersc/n9/julia/environments/1.10.4/gnu (or similar). That environment contains the following LocalPreferences.toml:
# MPI stuff omitted for brevity
[CUDA_Runtime_jll]
local = "true"
version = "12.2"
This way we set the CUDA.jl runtime version globally on the system to match the version installed by the vendor:
$ julia --project=@. -e "import CUDA; CUDA.versioninfo()"
CUDA runtime 12.2, local installation
CUDA driver 12.6
NVIDIA driver 535.216.1, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.2.1
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.0
- CUSPARSE: 12.1.1
- CUPTI: 2023.2.0 (API 20.0.0)
- NVML: 12.0.0+535.216.1
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0
- CUDA_Runtime_Discovery: 0.3.5
Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7
Preferences:
- CUDA_Runtime_jll.version: 12.2
- CUDA_Runtime_jll.local: true
1 device:
0: NVIDIA A100-PCIE-40GB (sm_80, 39.391 GiB / 40.000 GiB available)
The Problem
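The package the user is developing uses CUDA. If I add CUDA.versioninfo() to the unit tests and run them, the test process warns about libraries loaded from system paths and reports runtime 12.5 from an artifact installation instead of the local 12.2 runtime: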
$ julia --project=@. -e "import Pkg; Pkg.test()"
# Temporary env setup omitted for brevity
Testing Running tests...
┌ Warning: CUDA runtime library `libcublasLt.so.12` was loaded from a system path, `/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/lib64/libcublasLt.so.12`.
│
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libnvJitLink.so.12` was loaded from a system path, `/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64/libnvJitLink.so.12`.
│
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libcusparse.so.12` was loaded from a system path, `/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/lib64/libcusparse.so.12`.
│
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
CUDA runtime 12.5, artifact installation
CUDA driver 12.6
NVIDIA driver 535.216.1, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.2.1
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.216.1
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7
1 device:
0: NVIDIA A100-PCIE-40GB (sm_80, 39.391 GiB / 40.000 GiB available)
I then tried adding the LocalPreferences.toml to the Pkg.test environment, as well as adding:
[preferences.CUDA_Runtime_jll]
local = "true"
version = "12.2"
to the test's Project.toml. Neither worked.
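To narrow this down, here is a minimal diagnostic sketch that could go at the top of the test script (assumed here to be test/runtests.jl). The helper visible_preference_files is hypothetical and is not part of CUDA.jl, Pkg, or Preferences.jl. It prints the load path the test process actually sees and lists any LocalPreferences.toml sitting next to a load path entry, which should show whether the NERSC environment is still on the load path inside Pkg.test():

```julia
# Hypothetical helper (not part of CUDA.jl, Pkg, or Preferences.jl):
# list every LocalPreferences.toml located next to a load path entry.
function visible_preference_files(paths = Base.load_path())
    files = String[]
    for entry in paths
        # Entries are either directories or paths to a project file,
        # so normalize each one to its containing directory.
        dir = isdir(entry) ? entry : dirname(entry)
        candidate = joinpath(dir, "LocalPreferences.toml")
        isfile(candidate) && push!(files, candidate)
    end
    return files
end

# Report what the current process sees.
println("JULIA_LOAD_PATH = ", get(ENV, "JULIA_LOAD_PATH", "(unset)"))
foreach(p -> println("load path entry: ", p), Base.load_path())
foreach(f -> println("preferences file: ", f), visible_preference_files())
```

Running the same snippet once with julia --project=@. and once inside Pkg.test() gives two listings to compare. The sketch only looks for files named LocalPreferences.toml; it does not inspect [preferences] tables inside Project.toml.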
How can I either force CUDA.jl to use the system-wide preferences, or tell a unit test that relies on CUDA.jl to use a specific runtime version?
I was going to post this in the official Pkg.jl repo, but wanted to get @maleadt's opinion first.