Problem
If the metacommand .L is used to load source code from a file, the device compiler returns an error. But the host works fine.
$ cat getCudaVersion.cpp
int getCudaVersion(){
int version = 0;
cudaRuntimeGetVersion(&version);
return version;
}
$ cling -xcuda
****************** CLING ******************
* Type C++ code and press enter to run it *
* Type .q to exit *
*******************************************
[cling]$ .L getCudaVersion.cpp
[cling]$ getCudaVersion()
input_line_3:2:2: error: use of undeclared identifier 'getCudaVersion'
getCudaVersion()
^
IncrementalCUDADeviceCompiler::process()
failed at compile ptx code
(int) 8000
[cling]$
Reason
The coupling of the device clang::interpreter within the host clang::interpreter has some "gaps". Some functions, e.g. loadFile(), have no handling for the device cling::interpreter.
First solution
Modify following code lib/Interpreter/Interpreter.cpp :
Interpreter::loadFile(const std::string& filename,
bool allowSharedLib /*=true*/,
Transaction** T /*= 0*/) {
if (allowSharedLib) {
CompilationResult result = loadLibrary(filename, true);
if (result!=kMoreInputExpected)
...
to
Interpreter::loadFile(const std::string& filename,
bool allowSharedLib /*=true*/,
Transaction** T /*= 0*/) {
if (m_Opts.CompilerOpts.CUDAHost)
m_CUDACompiler->getPTXInterpreter()->loadFile(filename, allowSharedLib);
if (allowSharedLib) {
CompilationResult result = loadLibrary(filename, true);
if (result!=kMoreInputExpected)
...
The fix works well with the example getCudaVersion(). But the following example doesn't work.
$ cat gKernel.cu
__global__ void g1(){
int i = 3;
}
$ cling -xcuda
****************** CLING ******************
* Type C++ code and press enter to run it *
* Type .q to exit *
*******************************************
[cling]$ .L gKernel.cu
[cling]$ g1<<<1,1>>>();
[cling]$ cudaGetLastError()
(cudaError_t) (cudaError::cudaErrorInvalidDeviceFunction) : (unsigned int) 8
The new problem is that cling::Interpreter::loadFile() calls cling::Interpreter::loadHeader(), that calls cling::Interpreter::DeclareInternal(), that calls cling::IncrementalParser::Compile(). But the device cling::Interpreter does not have a working backend. The backend is provided by the IncrementalCUDADdeviceCompiler class. Currently, the IncrementalCUDADeviceCompiler class reimplements parts of the cling::Interpreter API to couple the cling::Interpreter frontend to the nvptx backend. Therefore, this function must be called instead of the functions of the device cling::Interpreter. The function declareInternal() is currently missing.
Possible solution
add declareInternal() to IncrementalCUDADeviceCompiler
It's a quick solution, but it requires an ugly if statement in cling::Interpreter::declareInternal() which checks whether the current interpreter instance is the host or the device. Another problem is that declareInternal() is not the only function that has this kind of problem. Many functions will follow.
Integrate the nvptx backend in the cling::Interpreter class
Avoid some ungly code. Is a more stable and generic solution. Needs more time for implementation. Probably needs to modify the transaction system. But the modification is also necessary for some other features.
-> Better solution -> will be implemented
Test case for clang-test
kernel.cpp
__global__ void g1(int * out){
*out = 42;
}
CUDALoadFile.C
//------------------------------------------------------------------------------
// CLING - the C++ LLVM-based InterpreterG :)
//
// This file is dual-licensed: you can choose to license it under the University
// of Illinois Open Source License or the GNU Lesser General Public License. See
// LICENSE.TXT for details.
//------------------------------------------------------------------------------
// The test checks whether setting a new include path at runtime also works for
// the PTX compiler.
// RUN: cat %s | %cling -DTEST_PATH="\"%/p/\"" -x cuda -Xclang -verify 2>&1 | FileCheck %s
// REQUIRES: cuda-runtime
#include <iostream>
// Check if cuda driver is available
int version;
cudaDriverGetVersion(&version)
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
// Check if a CUDA compatible device (GPU) is available.
int device_count = 0;
cudaGetDeviceCount(&device_count)
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
device_count > 0
// CHECK: (bool) true
// load kernel from file
.L include/kernel.cpp
int host = -1;
int *device;
// allocate memory on device
cudaMalloc((void**)&device, sizeof(int))
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
// run kernel and check execution result
g1<<<1,1>>>(device);
cudaGetLastError()
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
// copy result to host and check it
cudaMemcpy(&host, device, sizeof(int), cudaMemcpyDeviceToHost)
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
host
// CHECK: (int) 42
// expected-no-diagnostics
.q
Problem
If the metacommand
.Lis used to load source code from a file, the device compiler returns an error. But the host works fine.Reason
The coupling of the device
clang::interpreterwithin the hostclang::interpreterhas some "gaps". Some functions, e.g.loadFile(), have no handling for the devicecling::interpreter.First solution
Modify following code lib/Interpreter/Interpreter.cpp :
to
The fix works well with the example getCudaVersion(). But the following example doesn't work.
The new problem is that
cling::Interpreter::loadFile()callscling::Interpreter::loadHeader(), that callscling::Interpreter::DeclareInternal(), that callscling::IncrementalParser::Compile(). But the devicecling::Interpreterdoes not have a working backend. The backend is provided by theIncrementalCUDADdeviceCompilerclass. Currently, theIncrementalCUDADeviceCompilerclass reimplements parts of thecling::InterpreterAPI to couple thecling::Interpreterfrontend to the nvptx backend. Therefore, this function must be called instead of the functions of the devicecling::Interpreter. The functiondeclareInternal()is currently missing.Possible solution
add
declareInternal()toIncrementalCUDADeviceCompilerIt's a quick solution, but it requires an ugly if statement in
cling::Interpreter::declareInternal()which checks whether the current interpreter instance is the host or the device. Another problem is that declareInternal() is not the only function that has this kind of problem. Many functions will follow.Integrate the nvptx backend in the
cling::InterpreterclassAvoid some ungly code. Is a more stable and generic solution. Needs more time for implementation. Probably needs to modify the transaction system. But the modification is also necessary for some other features.
-> Better solution -> will be implemented
Test case for
clang-testkernel.cpp
CUDALoadFile.C