Skip to content

.L Metacommand does not work on the device Compiler #8

@SimeonEhrig

Description

@SimeonEhrig

Problem

If the metacommand .L is used to load source code from a file, the device compiler returns an error. But the host works fine.

$ cat getCudaVersion.cpp 
int getCudaVersion(){
  int version = 0;
  cudaRuntimeGetVersion(&version);
  return version;
}

$ cling -xcuda

****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$ .L getCudaVersion.cpp
[cling]$ getCudaVersion()
input_line_3:2:2: error: use of undeclared identifier 'getCudaVersion'
 getCudaVersion()
 ^
IncrementalCUDADeviceCompiler::process()
failed at compile ptx code
(int) 8000
[cling]$

Reason

The coupling of the device clang::interpreter within the host clang::interpreter has some "gaps". Some functions, e.g. loadFile(), have no handling for the device cling::interpreter.

First solution

Modify following code lib/Interpreter/Interpreter.cpp :

Interpreter::loadFile(const std::string& filename,
                         bool allowSharedLib /*=true*/,
                         Transaction** T /*= 0*/) {
     if (allowSharedLib) {
       CompilationResult result = loadLibrary(filename, true);
       if (result!=kMoreInputExpected)
...

to

Interpreter::loadFile(const std::string& filename,
                         bool allowSharedLib /*=true*/,
                         Transaction** T /*= 0*/) {
     if (m_Opts.CompilerOpts.CUDAHost)
       m_CUDACompiler->getPTXInterpreter()->loadFile(filename, allowSharedLib);
     if (allowSharedLib) {
       CompilationResult result = loadLibrary(filename, true);
       if (result!=kMoreInputExpected)
...

The fix works well with the example getCudaVersion(). But the following example doesn't work.

$ cat gKernel.cu 
__global__ void g1(){
	int i = 3;
}
$ cling -xcuda

****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$ .L gKernel.cu
[cling]$ g1<<<1,1>>>();
[cling]$ cudaGetLastError()
(cudaError_t) (cudaError::cudaErrorInvalidDeviceFunction) : (unsigned int) 8

The new problem is that cling::Interpreter::loadFile() calls cling::Interpreter::loadHeader(), that calls cling::Interpreter::DeclareInternal(), that calls cling::IncrementalParser::Compile(). But the device cling::Interpreter does not have a working backend. The backend is provided by the IncrementalCUDADdeviceCompiler class. Currently, the IncrementalCUDADeviceCompiler class reimplements parts of the cling::Interpreter API to couple the cling::Interpreter frontend to the nvptx backend. Therefore, this function must be called instead of the functions of the device cling::Interpreter. The function declareInternal() is currently missing.

Possible solution

add declareInternal() to IncrementalCUDADeviceCompiler

It's a quick solution, but it requires an ugly if statement in cling::Interpreter::declareInternal() which checks whether the current interpreter instance is the host or the device. Another problem is that declareInternal() is not the only function that has this kind of problem. Many functions will follow.

Integrate the nvptx backend in the cling::Interpreter class

Avoid some ungly code. Is a more stable and generic solution. Needs more time for implementation. Probably needs to modify the transaction system. But the modification is also necessary for some other features.
-> Better solution -> will be implemented

Test case for clang-test

kernel.cpp

__global__ void g1(int * out){
  *out = 42;
}

CUDALoadFile.C

//------------------------------------------------------------------------------
// CLING - the C++ LLVM-based InterpreterG :)
//
// This file is dual-licensed: you can choose to license it under the University
// of Illinois Open Source License or the GNU Lesser General Public License. See
// LICENSE.TXT for details.
//------------------------------------------------------------------------------

// The test checks whether setting a new include path at runtime also works for
// the PTX compiler.
// RUN: cat %s | %cling -DTEST_PATH="\"%/p/\"" -x cuda -Xclang -verify 2>&1 | FileCheck %s
// REQUIRES: cuda-runtime

#include <iostream>

// Check if cuda driver is available
int version;
cudaDriverGetVersion(&version)
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0

// Check if a CUDA compatible device (GPU) is available.
int device_count = 0;
cudaGetDeviceCount(&device_count)
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
device_count > 0
// CHECK: (bool) true

// load kernel from file
.L include/kernel.cpp

int host = -1;
int *device;

// allocate memory on device
cudaMalloc((void**)&device, sizeof(int))
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0

// run kernel and check execution result
g1<<<1,1>>>(device);
cudaGetLastError()
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0

// copy result to host and check it
cudaMemcpy(&host, device, sizeof(int), cudaMemcpyDeviceToHost)
// CHECK: (cudaError_t) (cudaError::cudaSuccess) : (unsigned int) 0
host
// CHECK: (int) 42

// expected-no-diagnostics
.q

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions