Add env SegmentedReduce by gonidelis · Pull Request #7795 · NVIDIA/cccl

gonidelis · 2026-02-25T20:05:51Z

Adds env-based overloads for all DeviceSegmentedReduce::* algorithms.

Segmented reduce is inherently run_to_run deterministic, so this is the strongest determinism guarantee allowed. If you believe a future performance optimization could break this contract, let me know and we will remove the promise in this PR. Otherwise we stay bound to it.
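As background on why a run_to_run guarantee is worth stating at all: floating-point addition is not associative, so a reduction is only reproducible if the order of operations is fixed. The following is a minimal host-side sketch in plain C++, not CUB code. `segmented_sum` and its `reversed` flag are hypothetical names introduced purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a segmented sum. Segment s covers
// [offsets[s], offsets[s + 1]). The `reversed` flag only changes the order
// in which the elements of each segment are accumulated.
std::vector<float> segmented_sum(const std::vector<float>& in,
                                 const std::vector<std::size_t>& offsets,
                                 bool reversed = false)
{
  std::vector<float> out;
  for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
  {
    float acc = 0.0f;
    if (!reversed)
    {
      for (std::size_t i = offsets[s]; i < offsets[s + 1]; ++i)
      {
        acc += in[i];
      }
    }
    else
    {
      for (std::size_t i = offsets[s + 1]; i > offsets[s]; --i)
      {
        acc += in[i - 1];
      }
    }
    out.push_back(acc);
  }
  return out;
}
```

Calling `segmented_sum` twice with the same order yields bitwise-identical results, which is the property a run_to_run guarantee describes. Changing the order (here via `reversed`) can change the result for inputs such as `{1e20f, -1e20f, 1.0f}`.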

copy-pr-bot · 2026-02-25T20:05:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot · 2026-02-25T20:29:57Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cub/cub/device/device_segmented_reduce.cuh

gonidelis · 2026-03-09T22:00:34Z

I removed the underlying helper implementation function for the fixed-segment-size overloads, as it required prior knowledge of AccumT and added unnecessary extra logic. The non-fixed-size overloads still use the *_impl function, though.

cub/cub/device/device_segmented_reduce.cuh

cub/test/catch2_test_device_segmented_reduce_env.cu

gonidelis · 2026-03-09T22:30:12Z

adding missing unit tests just now

miscco

Looks good.

@bernhardmgruber I observe that we are really loose with the naming conventions. We have InitValueT, init_value_t, init_t, or no alias at all. The same goes for AccumT and so on.

We really should be more consistent.

miscco · 2026-03-10T08:21:09Z

cub/cub/device/device_segmented_reduce.cuh

+            d_out,
+            num_segments,
+            segment_size,
+            ::cuda::std::plus{},


Question: This uses plus<void>, and we have observed performance issues with it because it promotes smaller integer types. Should this rather be:

Suggested change

-            ::cuda::std::plus{},
+            ::cuda::std::plus<detail::it_value_t<InputIteratorT>>{},

Such changes should definitely go to separate PRs, since they change the status quo. AFAIK @gonidelis copies the setup for the dispatch call from the other non-env overloads.

true ☝🏼 why do they change status quo?

For integer types, plus<> introduces integer promotion, which e.g. plus<short> does not.

So, depending on the tested types, this can have considerable performance implications.
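The promotion described above can be checked on the host with the standard-library functors. This is a small sketch using `std::plus`. It assumes that `::cuda::std::plus` mirrors the standard semantics, which is not verified here. The alias names `transparent_result` and `typed_result` are illustrative only.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Transparent functor: the call operator returns decltype(a + b), so two
// short operands are promoted and the result type is int.
using transparent_result = decltype(std::plus<>{}(short{1}, short{2}));
static_assert(std::is_same<transparent_result, int>::value,
              "plus<> yields int for short operands");

// Explicitly typed functor: parameters and result are declared as short,
// so the value is converted back to short on return.
using typed_result = decltype(std::plus<short>{}(short{1}, short{2}));
static_assert(std::is_same<typed_result, short>::value,
              "plus<short> yields short");
```

Whether the wider intermediate type actually costs performance depends on the kernel and the types involved. The sketch only shows that the result types differ.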

miscco · 2026-03-10T08:21:44Z

cub/cub/device/device_segmented_reduce.cuh

+    using OffsetT = detail::common_iterator_value_t<BeginOffsetIteratorT, EndOffsetIteratorT>;
+    using InputT  = detail::it_value_t<InputIteratorT>;
+    using init_t  = InputT;
+    using op_t    = ::cuda::minimum<>;


Ditto: Should this rather be

Suggested change

-    using op_t = ::cuda::minimum<>;
+    using op_t = ::cuda::minimum<InputT>;

cub/cub/device/device_segmented_reduce.cuh

miscco · 2026-03-10T08:24:11Z

cub/cub/device/device_segmented_reduce.cuh

+            d_out,
+            num_segments,
+            segment_size,
+            ::cuda::minimum<>{},


Ditto: explicit

Suggested change

-            ::cuda::minimum<>{},
+            ::cuda::minimum<input_t>{},

resolving these per bernhard's suggestion and will handle in a separate PR

gonidelis · 2026-03-10T15:11:46Z

@miscco #7974 (comment) ok?

…r to common impl
- Add private segmented_reduce_impl that centralizes determinism validation (static_assert rejecting gpu_to_gpu), dispatch_with_env, and tuning extraction, eliminating boilerplate across all env overloads
- Refactor Reduce, Sum, Min, Max env overloads to delegate to segmented_reduce_impl
- Add new env overloads for ArgMin and ArgMax with full documentation including literalinclude snippet tags
- Rewrite env_api tests covering all 6 APIs (Reduce, Sum, Min, Max, ArgMin, ArgMax) with determinism and stream_ref acceptance tests
- Unify _env.cu and _env_launch.cu into a single _env.cu test file with default env, launch wrapper, custom stream, and tuning tests
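The pattern this commit message describes, one shared implementation that validates the requested determinism level and thin public overloads that delegate to it, might look roughly like the following host-side sketch. Everything here is hypothetical: the tag types, `reduce_impl`, `sum`, and `min_of` are illustrative names, not the CUB implementation, and no environment or tuning query is modeled.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical determinism tags, loosely modeled on the levels named in this thread.
struct not_guaranteed_t {};
struct run_to_run_t {};
struct gpu_to_gpu_t {};

// Hypothetical shared implementation: validates the requested guarantee in one
// place, then invokes the callable that performs the actual work.
template <class Determinism, class Dispatch>
auto reduce_impl(Determinism, Dispatch dispatch) -> decltype(dispatch())
{
  static_assert(!std::is_same<Determinism, gpu_to_gpu_t>::value,
                "gpu_to_gpu determinism is rejected in this sketch");
  return dispatch();
}

// Thin public wrappers: each one only supplies its own reduction body.
template <class Determinism = run_to_run_t>
int sum(const int* first, const int* last, Determinism d = {})
{
  return reduce_impl(d, [=] {
    int acc = 0;
    for (const int* p = first; p != last; ++p)
    {
      acc += *p;
    }
    return acc;
  });
}

template <class Determinism = run_to_run_t>
int min_of(const int* first, const int* last, Determinism d = {})
{
  // Precondition for this sketch: the range is non-empty.
  return reduce_impl(d, [=] {
    int acc = *first;
    for (const int* p = first + 1; p != last; ++p)
    {
      acc = *p < acc ? *p : acc;
    }
    return acc;
  });
}
```

With this shape, passing `gpu_to_gpu_t{}` to either wrapper fails at compile time in a single location, so the check does not need to be repeated in every overload.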

github-actions · 2026-03-10T16:57:48Z

😬 CI Workflow Results

🟥 Finished in 1h 22m: Pass: 14%/249 | Total: 3d 07h | Max: 1h 04m | Hits: 88%/44193

See results here.

srinivasyadav18 · 2026-03-10T22:06:56Z

cub/test/catch2_test_device_segmented_reduce_env_api.cu

  REQUIRE(error == cudaSuccess);
 }
+
+C2H_TEST("cub::DeviceSegmentedReduce::Reduce env-based API", "[segmented_reduce][env]")


Where is env used in the env API tests? Is the focus here just to show the single-phase API with the default env?

We use streams or memory resources in other algorithms' env API tests to show the usage. Do we want to do the same here as well?

github-project-automation bot added this to CCCL Feb 25, 2026

github-project-automation bot moved this to Todo in CCCL Feb 25, 2026

cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Feb 25, 2026

gonidelis force-pushed the segmented_redude_env branch from 2aa7447 to d442948 Compare February 25, 2026 20:29

gonidelis marked this pull request as ready for review February 26, 2026 01:49

gonidelis requested a review from a team as a code owner February 26, 2026 01:49

gonidelis requested a review from pauleonix February 26, 2026 01:49

cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Feb 26, 2026

bernhardmgruber reviewed Feb 26, 2026

cub/cub/device/device_segmented_reduce.cuh Outdated

gonidelis force-pushed the segmented_redude_env branch from 83c6791 to 535da7d Compare March 5, 2026 14:47

gonidelis requested a review from bernhardmgruber March 5, 2026 14:47

gonidelis force-pushed the segmented_redude_env branch from 535da7d to 7a15fdf Compare March 9, 2026 21:57

bernhardmgruber reviewed Mar 9, 2026

cub/cub/device/device_segmented_reduce.cuh Outdated

cub/cub/device/device_segmented_reduce.cuh Outdated

cub/test/catch2_test_device_segmented_reduce_env.cu Outdated

gonidelis force-pushed the segmented_redude_env branch from 4d49de7 to 44d9d80 Compare March 9, 2026 23:22

gonidelis requested review from NaderAlAwar and srinivasyadav18 March 9, 2026 23:22

gonidelis force-pushed the segmented_redude_env branch from 44d9d80 to f4048ec Compare March 10, 2026 00:11

gonidelis enabled auto-merge (squash) March 10, 2026 03:18

miscco reviewed Mar 10, 2026

gonidelis mentioned this pull request Mar 10, 2026

InitValueT, init_value_t and init_t in CUB device #7974

Open

gonidelis added 2 commits March 10, 2026 08:32

Add env SegmentedReduce

21df635

gonidelis added 7 commits March 10, 2026 08:32

Add env overloads for fixed size segment APIs

b3e9f72

add env api literalinclude example just for Reduce and remove non guaranteed api test

ead5359

Remove fixed_size_segmented_reduce_impl underlying function as it added extra redundant logic for no reason

86b1209

Add unit tests for fixed-seg-size overloads and argmin argmax

361ec5e

Fix GCC 7 auto deduction in generic lambda for fixed-size ArgMin/ArgMax env overloads

1367aa8

Use __query_result_or_t to query tuning environment

1fbc895

Static assert on numeric_limits specialization

e62f191

gonidelis force-pushed the segmented_redude_env branch from f4048ec to e62f191 Compare March 10, 2026 15:33

gonidelis requested review from bernhardmgruber and miscco March 10, 2026 15:34

srinivasyadav18 reviewed Mar 10, 2026


4 participants
