Conversation
2aa7447 to
d442948
Compare
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
This comment has been minimized.
This comment has been minimized.
83c6791 to
535da7d
Compare
This comment has been minimized.
This comment has been minimized.
535da7d to
7a15fdf
Compare
|
I removed the helper underlying implementation function for fixed segment size overloads as it pre required knowledge of the |
|
adding missing unit tests just now |
4d49de7 to
44d9d80
Compare
44d9d80 to
f4048ec
Compare
This comment has been minimized.
This comment has been minimized.
miscco
left a comment
There was a problem hiding this comment.
Looks good.
@bernhardmgruber I observe that we are really loose with the naming conventions We have InitValueT, init_value_t, init_t, no alias at all Same for AccumT and so on
We really should be more consistent
| d_out, | ||
| num_segments, | ||
| segment_size, | ||
| ::cuda::std::plus{}, |
There was a problem hiding this comment.
Question: This uses plus<void> and we have observed performance issues with this, because for smaller integer types it promotes. Shuld this rather be
| ::cuda::std::plus{}, | |
| ::cuda::std::plus<detail::it_value_t<InputIteratorT>>{}, |
There was a problem hiding this comment.
Such changes should definitely go to separate PRs, since they change the status quo. AFAIK @gonidelis copies the setup for the dispatch call from the other non-env overloads.
There was a problem hiding this comment.
true ☝🏼 why do they change status quo?
There was a problem hiding this comment.
For integer types plus<> introduces integer promotion, which e.g plus<short> does not.
So depending on the tested types, this can actually have some considerable performance implications
| using OffsetT = detail::common_iterator_value_t<BeginOffsetIteratorT, EndOffsetIteratorT>; | ||
| using InputT = detail::it_value_t<InputIteratorT>; | ||
| using init_t = InputT; | ||
| using op_t = ::cuda::minimum<>; |
There was a problem hiding this comment.
Ditto: Should this rather be
| using op_t = ::cuda::minimum<>; | |
| using op_t = ::cuda::minimum<InputT>; |
| d_out, | ||
| num_segments, | ||
| segment_size, | ||
| ::cuda::minimum<>{}, |
There was a problem hiding this comment.
Ditto: explicit
| ::cuda::minimum<>{}, | |
| ::cuda::minimum<input_t>{}, |
There was a problem hiding this comment.
resolving these per bernhard's suggestion and will handle in a separate PR
…r to common impl
- Add private segmented_reduce_impl that centralizes determinism
validation (static_assert rejecting gpu_to_gpu), dispatch_with_env,
and tuning extraction, eliminating boilerplate across all env overloads
- Refactor Reduce, Sum, Min, Max env overloads to delegate to
segmented_reduce_impl
- Add new env overloads for ArgMin and ArgMax with full documentation
including literalinclude snippet tags
- Rewrite env_api tests covering all 6 APIs (Reduce, Sum, Min, Max,
ArgMin, ArgMax) with determinism and stream_ref acceptance tests
- Unify _env.cu and _env_launch.cu into a single _env.cu test file
with default env, launch wrapper, custom stream, and tuning tests
…ed extra redundant logic for no reason
f4048ec to
e62f191
Compare
😬 CI Workflow Results🟥 Finished in 1h 22m: Pass: 14%/249 | Total: 3d 07h | Max: 1h 04m | Hits: 88%/44193See results here. |
| REQUIRE(error == cudaSuccess); | ||
| } | ||
|
|
||
| C2H_TEST("cub::DeviceSegmentedReduce::Reduce env-based API", "[segmented_reduce][env]") |
There was a problem hiding this comment.
where is env used in the env API tests ? If the focus here is just to show single-phase API with default env ?
We use stream or memory resources in other algorithm env API tests to show the usage. Do we want to do same here as-well ?
Adds env based overloads for all
DeviceSegmentedReduce::*algorithmscloses #7550
Segmented Reduce is inherently
run_to_rundeterministic thus this is the largest deterministic guarantee allowed. If you believe there at some point can be an a perf optimization that will ruin this contract let me know and we will remove this promise in this PR. Otherwise we stay bound to that.