I encountered a compilation error when running cd quantkernel followed by python setup.py install. The full output is below.
(qcom) zhuyw@ubuntu-Z11PA-U12-Series:~/qcom/QComplier$ cd quantkernel
(qcom) zhuyw@ubuntu-Z11PA-U12-Series:~/qcom/QComplier/quantkernel$ python setup.py install
No CUDA runtime is found, using CUDA_HOME='/usr'
['/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/nvidia/cuda_runtime/include', '/home/zhuyw/qcom/QComplier/quantkernel', '/home/zhuyw/qcom/QComplier/quantkernel/common', 'cutlass/include']
running install
/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:79: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:79: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
creating mixlib.egg-info
writing mixlib.egg-info/PKG-INFO
writing dependency_links to mixlib.egg-info/dependency_links.txt
writing top-level names to mixlib.egg-info/top_level.txt
writing manifest file 'mixlib.egg-info/SOURCES.txt'
/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/utils/cpp_extension.py:497: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'mixlib.egg-info/SOURCES.txt'
writing manifest file 'mixlib.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib.linux-x86_64-cpython-310/cuda_setup
copying cuda_setup/__init__.py -> build/lib.linux-x86_64-cpython-310/cuda_setup
copying cuda_setup/__main__.py -> build/lib.linux-x86_64-cpython-310/cuda_setup
copying cuda_setup/env_vars.py -> build/lib.linux-x86_64-cpython-310/cuda_setup
running build_ext
/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/utils/cpp_extension.py:416: UserWarning: The detected CUDA version (12.0) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/utils/cpp_extension.py:426: UserWarning: There are no g++ version bounds defined for CUDA version 12.0
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'mixlib' extension
creating build/temp.linux-x86_64-cpython-310/mix_cuda
creating build/temp.linux-x86_64-cpython-310/mix_cuda/layernorm
/usr/bin/nvcc -I/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/include -I/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/include/TH -I/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/torch/include/THC -I/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/nvidia/cuda_runtime/include -I/home/zhuyw/qcom/QComplier/quantkernel -I/home/zhuyw/qcom/QComplier/quantkernel/common -Icutlass/include -I/home/zhuyw/anaconda3/envs/qcom/include/python3.10 -c mix_cuda/cult.cu -o build/temp.linux-x86_64-cpython-310/mix_cuda/cult.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP -gencode arch=compute_80,code=sm_80 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=mixlib -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin g++
mix_cuda/cult.cu(2212): error: more than one user-defined conversion from "c10::Half" to "__half" applies:
function "__half::__half(float)"
/home/zhuyw/anaconda3/envs/qcom/lib/python3.10/site-packages/nvidia/cuda_runtime/include/cuda_fp16.hpp(214): here
function "__half::__half(__half &&) noexcept" (declared implicitly)
mix_cuda/cult.cu(3018): warning #2361-D: invalid narrowing conversion from "long" to "unsigned int"
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
1 error detected in the compilation of "mix_cuda/cult.cu".
error: command '/usr/bin/nvcc' failed with exit code 2
My virtual environment is as follows:
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
bzip2 1.0.8 h5eee18b_6
ca-certificates 2025.2.25 h06a4308_0
eetq 1.0.0 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
markupsafe 2.1.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.3 pypi_0 pypi
ninja 1.11.1.4 pypi_0 pypi
numpy 2.1.2 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.1.105 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openssl 3.0.16 h5eee18b_0
packaging 25.0 pypi_0 pypi
pillow 11.0.0 pypi_0 pypi
pip 25.0 py310h06a4308_0
python 3.10.16 he870216_1
readline 8.2 h5eee18b_0
setuptools 75.8.0 py310h06a4308_0
sqlite 3.45.3 h5eee18b_0
sympy 1.13.1 pypi_0 pypi
tk 8.6.14 h39e8969_0
torch 2.5.1+cu121 pypi_0 pypi
torchaudio 2.5.1+cu121 pypi_0 pypi
torchvision 0.20.1+cu121 pypi_0 pypi
triton 3.1.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2025a h04d1e81_0
wheel 0.45.1 py310h06a4308_0
xz 5.6.4 h5eee18b_1
zlib 1.2.13 h5eee18b_1
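One thing I noticed from the log and the list above: the build calls /usr/bin/nvcc (detected as CUDA 12.0), while cuda_fp16.hpp is picked up from the pip package nvidia-cuda-runtime-cu12 (12.1.105). I do not know whether this mismatch is related to the c10::Half error. The following is a minimal sketch, using only the Python standard library, of how the two versions could be compared on another machine. The helper names are my own and are not part of the project.

```python
import re
import shutil
import subprocess
from importlib import metadata


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def major_minor(version):
    """Reduce a version string such as '12.1.105' to (12, 1)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def installed_nvcc_release():
    """Run the nvcc found on PATH, if any, and parse its release number."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def installed_runtime_release(package="nvidia-cuda-runtime-cu12"):
    """Return (major, minor) of the pip CUDA runtime package, or None if missing."""
    try:
        return major_minor(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("nvcc on PATH:", installed_nvcc_release())
    print("pip CUDA runtime headers:", installed_runtime_release())
```

If the two tuples differ, the compiler and the headers it includes come from different CUDA releases, which matches what the log above shows for my machine.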