How to define intermediate result when using schedules? #503
Unanswered
hulihan-start asked this question in Q&A
Replies: 1 comment 7 replies
-
You have too many CUDA threads per CUDA block. In this code, you mapped both [...] As in the test_gpu_conv2d.py example, a typical way to deal with it is to tile the loops with [...] where you control the [...]
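To make the thread-count limit concrete, here is a small plain-Python sketch of the arithmetic. It does not use any FreeTensor API. The loop extents, the tile size of 16, and the helper names `threads_per_block` and `tile` are made up for illustration and are not taken from the original script. The 1024-threads-per-block figure is the limit on current CUDA GPUs and is hard-coded here rather than queried from the device.

```python
import math

# Assumption: 1024 is the per-block thread limit on current CUDA GPUs;
# it is hard-coded here, not queried from the device.
MAX_THREADS_PER_BLOCK = 1024


def threads_per_block(thread_extents):
    """Product of the extents of all loops mapped to threadIdx.*."""
    return math.prod(thread_extents)


def tile(extent, tile_size):
    """Split a loop of `extent` iterations into (outer, inner) extents.

    The outer loop would be mapped to blockIdx.*, the inner one to threadIdx.*.
    """
    return math.ceil(extent / tile_size), tile_size


# Hypothetical loop extents, not taken from the original script.
n, m = 512, 64

# Mapping both full loops to threads gives 512 * 64 threads in one block.
untiled = threads_per_block([n, m])
print(untiled, untiled <= MAX_THREADS_PER_BLOCK)

# Tiling each loop first keeps the block small: only the inner tiles become threads.
n_outer, n_inner = tile(n, 16)
m_outer, m_inner = tile(m, 16)
tiled = threads_per_block([n_inner, m_inner])
print(tiled, tiled <= MAX_THREADS_PER_BLOCK)
print("grid:", n_outer, "x", m_outer)
```

With these made-up sizes the untiled mapping asks for far more threads than a block can hold, while the tiled mapping stays within the limit and moves the remaining iterations to the grid.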
-
I followed your test code at: https://github.com/roastduck/FreeTensor/blob/master/test/70.program/test_gpu_conv2d.py
I'm not sure if 'cache' is the correct keyword for this case, but I got a CUDA error:
ptxas warning : Value of threads per SM for entry kernel0 is out of range. .minnctapersm and .maxntid will be ignored
CUDA error in file '/root/.freetensor/o17vag/run.cu' in line 73 : invalid argument.
Traceback (most recent call last):
  File "/data/not_backed_up/lihhu/FreeTensor_experiments/TransR_scheduler.py", line 87, in <module>
    transr()
  File "/data/not_backed_up/lihhu/FreeTensor_experiments/TransR_scheduler.py", line 84, in transr
    result = eval(func, True, True)
  File "/data/not_backed_up/lihhu/FreeTensor_experiments/TransR_scheduler.py", line 50, in eval
    t1, _ = driver.time()
RuntimeError: cuda error
Here is my code:
Can you help me to fix this issue? Thank you so much!