-
Notifications
You must be signed in to change notification settings - Fork 429
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Required prerequisites
- I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
Motivation
TileLang code:
import tilelang
import tilelang.language as T
@tilelang.jit
def get_qwq():
@T.prim_func
def main(
A: T.Tensor[(2, 2560), T.float32],
B: T.Tensor[(2, 2560), T.float32],
C: T.Tensor[(2, ), T.float32]
):
with T.Kernel(1, threads=256):
tx = T.get_thread_binding(0)
C_local = T.alloc_fragment((2, ), T.float32)
T.copy(C, C_local)
for i, j in T.Parallel(2, 2560):
if C_local[i] >= 0:
B[i, j] = A[i, j]
return main
kernel = get_qwq()
print(kernel.get_kernel_source())generated code:
#include <tl_templates/cuda/gemm.h>
#include <tl_templates/cuda/copy.h>
#include <tl_templates/cuda/reduce.h>
#include <tl_templates/cuda/ldsm.h>
#include <tl_templates/cuda/threadblock_swizzle.h>
#include <tl_templates/cuda/debug.h>
#ifdef ENABLE_BF16
#include <tl_templates/cuda/cuda_bf16_fallbacks.cuh>
#endif
extern "C" __global__ void main_kernel(const float* __restrict__ A, float* __restrict__ B, const float* __restrict__ C);
extern "C" __global__ void __launch_bounds__(256, 1) main_kernel(const float* __restrict__ A, float* __restrict__ B, const float* __restrict__ C) {
float C_local[1];
C_local[0] = C[(((int)threadIdx.x) & 1)];
if ((((int)threadIdx.x) >> 1) == 0) {
#pragma unroll
for (int i = 0; i < 640; ++i) {
float4 v_ = *(float4*)(A + (((((int)threadIdx.x) & 1) * 2560) + (i * 4)));
tl::store_global_128_conditional((&(B[(((((int)threadIdx.x) & 1) * 2560) + (i * 4))])), (*(uint4 *)(&(v_))), (0x0p+0f/*0.000000e+00*/ <= C_local[0]));
}
}
}In the generated kernel, only threads 0 and 1 perform the global memory read/write, leaving most threads idle. This indicates a suboptimal layout and thread mapping, and there should be room to improve the layout.
Solution
No response
Alternatives
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request