Add generate_identity_sequences helper and replace lambdas with named functors #3628
Closed
tenpercent wants to merge 4 commits into develop from
This was referenced Jan 22, 2026
Pull request overview
This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.
Changes:
- Added `generate_identity_sequences<N>()` helper to generate `Tuple<Sequence<0>, ..., Sequence<N-1>>` without lambdas.
- Added named sequence utilities (`merge_sequences_functor`, `unpack_and_merge_sequences`) and replaced lambdas in `transform_tensor_descriptor`/`TensorDescriptor` logic.
- Updated multiple call sites to use the new helper(s) and added unit tests.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| test/util/unit_sequence_helper.cpp | Adds unit tests for generate_identity_sequences and unpack_and_merge_sequences. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/utility/tuple_helper.hpp | Introduces generate_identity_sequences helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and unpack_and_merge_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses unpack_and_merge_sequences. |
shumway reviewed Jan 23, 2026
Replace inline lambdas with named functor structs in transform_tensor_descriptor to reduce template instantiation overhead and improve compile times.

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges
- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>
- Update 14 call sites across threadwise transfer, wrapper, and device files to use generate_identity_sequences() instead of generate_tuple with lambdas
- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors

Co-Authored-By: Claude <noreply@anthropic.com>
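The instantiation argument behind the named functors can be checked directly. The sketch below uses standard-library stand-ins; `sketch::convert_visible_ids_to_hidden_ids` is a hypothetical functor modeled only on the one-line description in the commit message (visible IDs in, hidden IDs out), not CK's real signature. It shows that two textually identical lambdas are distinct types, while a named functor with the same template arguments is the same type wherever it appears.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical functor: VisibleToHidden[v] is the hidden-dimension ID of visible dimension v.
template <std::size_t... VisibleToHidden>
struct convert_visible_ids_to_hidden_ids
{
    static constexpr std::array<std::size_t, sizeof...(VisibleToHidden)> table{VisibleToHidden...};

    // Maps a sequence of visible IDs to the corresponding hidden IDs at compile time.
    template <std::size_t... Vs>
    constexpr auto operator()(std::index_sequence<Vs...>) const
    {
        return std::index_sequence<table[Vs]...>{};
    }
};

// Every lambda expression has its own closure type (and a lambda inside a function
// template gets a new one for each instantiation of that template), so a template
// that receives these is instantiated once per lambda.
inline auto lambda_a = [](auto x) { return x; };
inline auto lambda_b = [](auto x) { return x; };

} // namespace sketch

static_assert(!std::is_same_v<decltype(sketch::lambda_a), decltype(sketch::lambda_b)>);

// A named functor with the same template arguments is one type at every use.
using F = sketch::convert_visible_ids_to_hidden_ids<4, 5, 6>;
static_assert(std::is_same_v<F, sketch::convert_visible_ids_to_hidden_ids<4, 5, 6>>);

// Visible IDs <2, 0> map to hidden IDs <6, 4>.
static_assert(std::is_same_v<decltype(F{}(std::index_sequence<2, 0>{})),
                             std::index_sequence<6, 4>>);
```

Because the mapping is carried in the functor's template arguments rather than captured by a closure, any helper that receives `F` can be reused by every call site with the same mapping.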
Branch updated from 95c6c4b to bce6ec1.
tenpercent commented Jan 29, 2026
| } | ||
|
|
||
| // Functor wrapper for merge_sequences to enable reuse across call sites | ||
| struct merge_sequences_functor |
One thing to consider is whether these new helper functors are an implementation detail rather than part of the header interface.
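One way to act on this suggestion, sketched with `std::index_sequence` stand-ins: keep the functor in a `detail` namespace and expose only the function. The `sketch`/`detail` namespaces, `concat_sequences`, and the recursive merge are hypothetical illustrations, not the code in this PR.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {
namespace detail {

// Hypothetical stand-in for merge_sequences: concatenates two index sequences.
template <std::size_t... As, std::size_t... Bs>
constexpr auto concat_sequences(std::index_sequence<As...>, std::index_sequence<Bs...>)
{
    return std::index_sequence<As..., Bs...>{};
}

// Named functor kept in detail:: so it is not part of the header's public surface.
struct merge_sequences_functor
{
    // Base case: merging nothing yields the empty sequence.
    constexpr auto operator()() const { return std::index_sequence<>{}; }

    // Recursive case: first ++ merge(rest...), preserving left-to-right order.
    template <typename First, typename... Rest>
    constexpr auto operator()(First first, Rest... rest) const
    {
        return concat_sequences(first, (*this)(rest...));
    }
};

} // namespace detail

// Public entry point: unpacks a tuple of sequences and merges them.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(std::tuple<Seqs...> seqs)
{
    return std::apply(detail::merge_sequences_functor{}, seqs);
}

} // namespace sketch
```

The trade-off is visibility versus reuse: a `detail` functor keeps the public interface small, while a public one can be shared by other headers, which is what the snippet's comment ("to enable reuse across call sites") is aiming for.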
shumway approved these changes Feb 3, 2026
Imported to ROCm/rocm-libraries
Summary
- Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduce `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction)

Motivation
Multiple call sites use the `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations.

Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, so the templates it is passed to are instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.

Changes
Part 1: generate_identity_sequences helper
Part 2: Named functors in transform_tensor_descriptor
- Add `unpack_and_merge_sequences` helper to replace the lambda in `GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`

Test Plan
- Unit tests for `generate_identity_sequences`
- Unit tests for `unpack_and_merge_sequences`

Related PRs
This PR merges the functionality from:
Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)
Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.