Device function pointers #2450

@leios

Simply put, I want the following code to work:

using CUDA

f(x) = x+1

g(x) = x*2

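function call_fxs!(fxs)
    x = 1.0  # Float64, to match the %g format and the double used in the CUDA C version
    for i = 1:length(fxs)
        x = fxs[i](x)  # call the i-th function rather than always the first
        @cuprintf("%g\n", x)
    end
    return nothing  # CUDA.jl kernels must return nothing
end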

@cuda threads=1 call_fxs!((f, g))

This is what the code looks like in CUDA C:

#include <stdio.h>
#include <stdlib.h>
typedef double (*func)(double x);

__device__ double func1(double x)
{
  return x + 1.0;
}

__device__ double func2(double x)
{
  return x * 2.0;
}

__device__ func pfunc1 = func1;
__device__ func pfunc2 = func2;

__global__ void test_kernel(func* f, int n)
{
  double x = 1.0;

  for(int i=0;i<n;++i){
   x=f[i](x);
   printf("%g\n",x);
  }
}

int main(void)
{
  int N = 2;

  func* h_f;
  func* d_f;

  h_f = (func*)malloc(N*sizeof(func));

  cudaMalloc((void**)&d_f,N*sizeof(func));

  cudaMemcpyFromSymbol( &h_f[0], pfunc1, sizeof(func));
  cudaMemcpyFromSymbol( &h_f[1], pfunc2, sizeof(func));

  cudaMemcpy(d_f,h_f,N*sizeof(func),cudaMemcpyHostToDevice);

  test_kernel<<<1,1>>>(d_f,N);
  cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed

  cudaFree(d_f);
  free(h_f);

  return 0;
}

[jars@node0024 ~]$ nvcc check.cu 
[jars@node0024 ~]$ ./a.out 
2
4

I've been banging my head against this for a long time (since a few months before this post; see leios/Fable.jl#64 (comment)).

My current solution involves @generated loops on loops, which ends up generating functions that are quite large and take a significant amount of time to compile (sometimes up to 70 s for a kernel that runs in 0.0001 s). I mentioned this here: https://discourse.julialang.org/t/is-there-any-good-way-to-call-functions-from-a-set-of-functions-in-a-cuda-kernel/102051/3?u=leios
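For readers unfamiliar with the approach, here is a minimal CPU-only sketch of unrolling a tuple of functions with a generated function. It is not the actual Fable.jl code: the name `apply_chain` is hypothetical, and the real workaround nests several such loops. The only point is that each call becomes a statically dispatched call, which is the property a GPU kernel needs.

```julia
f(x) = x + 1
g(x) = x * 2

# Hypothetical helper (not from Fable.jl): unroll the loop over the tuple at
# compile time. Inside the generator, N is a known integer, so one statically
# dispatched call is emitted per tuple entry.
@generated function apply_chain(fxs::Tuple{Vararg{Any,N}}, x) where {N}
    body = Expr(:block)
    for i in 1:N
        push!(body.args, :(x = fxs[$i](x)))  # becomes x = fxs[1](x), x = fxs[2](x), ...
    end
    push!(body.args, :(return x))
    return body
end

println(apply_chain((f, g), 1.0))  # (1.0 + 1) * 2 = 4.0
println(apply_chain((g, f), 1.0))  # (1.0 * 2) + 1 = 3.0
```

The generated body is specific to the tuple type, so every new combination of functions produces and compiles a new method body. That is where the compile time goes once the unrolled code gets large.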

Solutions that exist in other languages:

  1. GLSL / OpenCL: The user compiles shaders / kernels at runtime, so they can be spun up in the background relatively quickly. Somehow, this is much faster than doing essentially the same thing in Julia.
  2. CUDA: Just use fx pointers bro (though I did have to do my own AST solve for certain workflows)

I have had this discussion throughout the years with @vchuravy, @jpsamaroo, and @maleadt, but never documented it because I'm apparently the only one actually hitting the issue.

To be honest, I think we are approaching something that might not be fundamentally possible with Julia, but I would like to be able to pass in arbitrary functions to a kernel without forcing recompilation of any kind.
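To make the recompilation problem concrete, here is a small CPU-only sketch (the extra function `h` is just for illustration). In Julia every function has its own type, so a tuple of functions has a different type for every combination. As far as I understand, kernels are specialized on their argument types, so each new combination means a new compilation.

```julia
f(x) = x + 1
g(x) = x * 2
h(x) = x - 3  # extra function, only for illustration

# Each generic function is an instance of its own singleton type.
println(typeof(f) === typeof(g))            # false

# Swapping a single function therefore changes the type of the whole tuple.
println(typeof((f, g)) === typeof((f, h)))  # false

# A plain array of functions falls back to the abstract element type Function,
# so calls through it need dynamic dispatch.
println(eltype([f, g]))                     # Function
println(isconcretetype(eltype([f, g])))     # false
```

A C function pointer sidesteps this because every `double (*)(double)` has the same type no matter which function it points to.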

I am not sure if it is best to put this here or in GPUCompiler.

Related discussions:

  1. https://forums.developer.nvidia.com/t/consistency-of-functions-pointer/29325/6
  2. [Roadmap Feedback] Function Pointers with some limitations KhronosGroup/Vulkan-Docs#2232

Labels: enhancement (New feature or request), upstream (Somebody else's problem)
