Add generate_identity_sequences helper and replace lambdas with named functors by tenpercent · Pull Request #3628 · ROCm/composable_kernel

tenpercent · 2026-01-22T00:17:42Z

Summary

- Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
- Replace lambdas with named functors in transform_tensor_descriptor
- Add unpack_and_merge_sequences helper functor
- Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. Replacing it with a named helper reduces lambda instantiations.

Additionally, each lambda in transform_tensor_descriptor creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
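The closure-type behavior described above can be checked with a few static assertions. This is a minimal, standalone C++17 sketch and not code from this PR; `lambda_a`, `lambda_b`, `identity_functor`, and `consumer` are hypothetical names used only for illustration.

```cpp
#include <type_traits>

// Two textually identical lambdas still get two distinct closure types.
inline constexpr auto lambda_a = [](auto i) { return i; };
inline constexpr auto lambda_b = [](auto i) { return i; };
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>);

// A named functor (hypothetical name) is a single type at every call site.
struct identity_functor
{
    template <typename T>
    constexpr T operator()(T i) const
    {
        return i;
    }
};

// Stand-in for any template parameterized on the callable type.
template <typename F>
struct consumer
{
};

// Each lambda forces its own instantiation of the template...
static_assert(!std::is_same_v<consumer<decltype(lambda_a)>, consumer<decltype(lambda_b)>>);
// ...while every use of the named functor maps to the same instantiation.
static_assert(std::is_same_v<consumer<identity_functor>, consumer<decltype(identity_functor{})>>);
```

Any template that takes the callable as a type parameter is affected in the same way as `consumer` here, which is why the number of call sites using lambdas drives the instantiation count.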

Changes

Part 1: generate_identity_sequences helper

- Replaces common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
- Named helper shares a single type across all uses
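One way such a helper could be written is sketched below. It assumes C++17 and uses stand-in `Sequence` and `Tuple` types built on the standard library instead of the CK types, so it illustrates the idea and is not the implementation added to tuple_helper.hpp; the `detail::identity_sequences_impl` name is hypothetical.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Stand-ins for ck::Sequence and ck::Tuple (assumptions for this sketch only).
template <int... Is>
struct Sequence
{
};

template <typename... Ts>
using Tuple = std::tuple<Ts...>;

namespace detail {
// Expands 0, 1, ..., N-1 into one single-element Sequence per index.
template <std::size_t... Is>
constexpr auto identity_sequences_impl(std::index_sequence<Is...>)
{
    return Tuple<Sequence<static_cast<int>(Is)>...>{};
}
} // namespace detail

// Returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without any lambda.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return detail::identity_sequences_impl(std::make_index_sequence<N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             Tuple<Sequence<0>, Sequence<1>, Sequence<2>>>);
```

Because the helper is an ordinary function template, every call site with the same `N` resolves to the same specialization, with no per-site closure type involved.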

Part 2: Named functors in transform_tensor_descriptor

- Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
- Use generate_identity_sequences in matrix_padder.hpp
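The unpack-and-merge idea can be sketched as follows, again assuming C++17 and a stand-in `Sequence` type with `std::tuple` in place of the CK tuple. The `operator+` overload and the function form of `unpack_and_merge_sequences` are assumptions made for this illustration; the actual helper in sequence_helper.hpp may be shaped differently.

```cpp
#include <tuple>
#include <type_traits>

// Stand-in for ck::Sequence (assumption for this sketch only).
template <int... Is>
struct Sequence
{
};

// Concatenates two sequences; used by the fold expression below.
template <int... As, int... Bs>
constexpr Sequence<As..., Bs...> operator+(Sequence<As...>, Sequence<Bs...>)
{
    return {};
}

// Merges any number of sequences into one, left to right.
template <typename... Seqs>
constexpr auto merge_sequences(Seqs... seqs)
{
    return (Sequence<>{} + ... + seqs);
}

// Named functor wrapper: one type that every call site can share.
struct merge_sequences_functor
{
    template <typename... Seqs>
    constexpr auto operator()(Seqs... seqs) const
    {
        return merge_sequences(seqs...);
    }
};

// Unpacks a tuple of sequences and merges its elements into one sequence.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(std::tuple<Seqs...>)
{
    return merge_sequences_functor{}(Seqs{}...);
}

using Input = std::tuple<Sequence<0, 1>, Sequence<2>, Sequence<3, 4>>;
static_assert(std::is_same_v<decltype(unpack_and_merge_sequences(Input{})),
                             Sequence<0, 1, 2, 3, 4>>);
```

The functor replaces what would otherwise be a lambda passed to an unpack utility, so the merge step no longer contributes a fresh closure type at each use.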

Test Plan

Added 7 unit tests:
- 4 tests for generate_identity_sequences
- 3 tests for unpack_and_merge_sequences
Waiting for full CI

Related PRs

This PR merges the functionality from:

- Add generate_identity_sequences helper for common pattern #3588 (generate_identity_sequences helper)
- Replace lambdas with named functors in transform_tensor_descriptor #3589 (Named functors in transform_tensor_descriptor)

Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)

Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.

Copilot

Pull request overview

This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.

Changes:

- Added generate_identity_sequences<N>() helper to generate Tuple<Sequence<0>, ..., Sequence<N-1>> without lambdas.
- Added named sequence utilities (merge_sequences_functor, unpack_and_merge_sequences) and replaced lambdas in transform_tensor_descriptor/TensorDescriptor logic.
- Updated multiple call sites to use the new helper(s) and added unit tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| test/util/unit_sequence_helper.cpp | Adds unit tests for `generate_identity_sequences` and `unpack_and_merge_sequences`. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to `generate_identity_sequences`. |
| include/ck/utility/tuple_helper.hpp | Introduces `generate_identity_sequences` helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and `unpack_and_merge_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with `generate_identity_sequences`. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses `unpack_and_merge_sequences`. |

Replace inline lambdas with named functor structs in transform_tensor_descriptor to reduce template instantiation overhead and improve compile times.

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges
- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>
- Update 14 call sites across threadwise transfer, wrapper, and device files to use generate_identity_sequences() instead of generate_tuple with lambdas
- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors

Co-Authored-By: Claude <noreply@anthropic.com>

tenpercent · 2026-01-29T21:48:39Z

include/ck/utility/sequence_helper.hpp

 }

+// Functor wrapper for merge_sequences to enable reuse across call sites
+struct merge_sequences_functor


One thing to consider is whether these new helper functors are an implementation detail rather than part of the header's public interface.

ammallya · 2026-02-03T22:01:43Z

Imported to ROCm/rocm-libraries

tenpercent marked this pull request as ready for review January 22, 2026 03:10

tenpercent requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, vidyasagar-amd and vpietila-amd as code owners January 22, 2026 03:10

tenpercent marked this pull request as draft January 22, 2026 18:48

vidyasagar-amd requested a review from Copilot January 22, 2026 22:05

Copilot started reviewing on behalf of vidyasagar-amd January 22, 2026 22:06

Copilot AI reviewed Jan 22, 2026

cgmillette self-assigned this Jan 23, 2026

shumway reviewed Jan 23, 2026

tenpercent force-pushed the tenpercent/tensor-descriptor-functor-optimization branch from 95c6c4b to bce6ec1 Compare January 29, 2026 21:28

tenpercent marked this pull request as ready for review January 29, 2026 21:45

tenpercent commented Jan 29, 2026

tenpercent added 3 commits February 2, 2026 09:15

Merge branch 'develop' into tenpercent/tensor-descriptor-functor-opti…

fa98184

…mization

Merge branch 'develop' into tenpercent/tensor-descriptor-functor-opti…

2ec116f

…mization

Merge branch 'develop' into tenpercent/tensor-descriptor-functor-opti…

c6a8d06

…mization

shumway approved these changes Feb 3, 2026

assistant-librarian bot mentioned this pull request Feb 3, 2026

Add generate_identity_sequences helper and replace lambdas with named functors ROCm/rocm-libraries#4283

Open

2 tasks

ammallya closed this Feb 3, 2026

tenpercent wants to merge 4 commits into develop from tenpercent/tensor-descriptor-functor-optimization

4 participants
