Best practice on loop on cartesian set A1 x … x An #2272
The neat thing about niche languages is that you can be part of deciding what is idiomatic. The code you have written looks fine to my eyes. You can optimise the computation of all those powers: instead of `tabulate 20 (\i -> 1.1**f64.i64 (i+1))` you can write `scan (*) 1 (replicate 20 1.1)`, which replaces one exponentiation per element with a running product.
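To make the suggested rewrite concrete, here is a minimal sketch with both forms side by side. The function names `powers_tabulate` and `powers_scan` and the size parameter `n` are assumptions for illustration (the original uses the fixed length 20). The two results should agree up to floating-point rounding, not necessarily bit for bit.

```futhark
-- Powers 1.1^1 .. 1.1^n, one exponentiation per element.
def powers_tabulate (n: i64) : [n]f64 =
  tabulate n (\i -> 1.1 ** f64.i64 (i + 1))

-- The same values as a running product: an inclusive scan
-- over n copies of 1.1, with 1 as the neutral element of (*).
def powers_scan (n: i64) : [n]f64 =
  scan (*) 1 (replicate n 1.1)

-- Hypothetical entry point returning both arrays for comparison.
entry main (n: i64) : ([n]f64, [n]f64) =
  (powers_tabulate n, powers_scan n)
```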
This is unbalanced work, which is intrinsically difficult. And yes, Futhark will compute the maximum memory requirement and multiply that by the number of threads. (At one point we summed the memory requirements instead, but that results in far more complicated access patterns, and was in practice pretty slow.) There is a trick that you may be able to use, but it requires somewhat careful (and occasionally impractical) programming.
This is also a difficult tradeoff. The Futhark compiler will not help you do anything clever (it's actually not a very smart compiler, it is just very thorough and consistent). In practice I sometimes see people do
You have to do this manually, although CUDA will also sometimes do transparent paging between GPU memory and CPU memory.
I need to loop over the Cartesian product of a number of copies of a set, where the number of copies is "unknown" (i.e. specified in a variable) and, for now, all factors are the same set. For example [0, …, 9] x … (repeated N times) … x [0, …, 9].
For now I do it this way, i.e. loop over integers from 0 to 10^N-1, and compute the "digits" of each number like:
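A minimal sketch of this index-to-digits scheme, written under stated assumptions and not the code from the original post: the names `digits` and `tuples` and the base parameter `b` are hypothetical, and digits are produced least significant first.

```futhark
-- Digit j of k in base b, for j = 0 .. n-1 (least significant first).
def digits (n: i64) (b: i64) (k: i64) : [n]i64 =
  tabulate n (\j -> (k / (b ** j)) % b)

-- Enumerate all b^n tuples of {0, ..., b-1}^n, one row per tuple.
-- Row k holds the base-b digits of the integer k.
entry tuples (n: i64) (b: i64) : [][n]i64 =
  tabulate (b ** n) (digits n b)
```

With `n = 3` and `b = 10`, `digits 3 10 123` yields `[3, 2, 1]`. Because each row depends only on its own index `k`, the outer `tabulate` is a plain parallel map with no communication between iterations.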
This seems to work fairly well and should be quite efficient, but I have a few questions, as I'm new to GPU computation and Futhark:
max(x). My understanding is that Futhark works better if all arrays in each branch have the same size, so does that mean the space of my computation will be quite large, since each branch will have the largest possible size?

`filter` to remove these branches, yet adapting my number-to-digit operation to account for these holes might be quite complex… Yet if these branches still use a large array, my memory will quickly explode… So should I still use a filter and hope that its cost will be negligible compared to the sequential operation that runs after it? Or can I somehow let Futhark know that this branch is not worth exploring and let it fill this space with a more interesting thread?