Add generate_identity_sequences helper and replace lambdas with named functors #3628

Closed
tenpercent wants to merge 4 commits into develop from tenpercent/tensor-descriptor-functor-optimization

@tenpercent (Contributor)

Summary

  • Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
  • Replace lambdas with named functors in transform_tensor_descriptor
  • Add unpack_and_merge_sequences helper functor
  • Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. Replacing it with a named helper reduces the number of lambda instantiations.

Additionally, each lambda in transform_tensor_descriptor creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
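
The closure-type behavior described above can be illustrated with a small, self-contained sketch. It does not use CK: `apply_to_seven` is a hypothetical stand-in for a function template such as transform_tensor_descriptor, and `add_one` is a hypothetical named functor. It assumes a C++17 compiler.

```cpp
#include <type_traits>

// Hypothetical stand-in for a function template that takes a callable.
// Every distinct callable type F produces a separate instantiation.
template <typename F>
constexpr int apply_to_seven(F f)
{
    return f(7);
}

// Two textually identical lambdas still have two distinct closure types.
constexpr auto lambda_a = [](int i) { return i + 1; };
constexpr auto lambda_b = [](int i) { return i + 1; };
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>,
              "each lambda expression has its own closure type");

// A named functor is one type, no matter how many call sites construct it.
struct add_one
{
    constexpr int operator()(int i) const { return i + 1; }
};
static_assert(std::is_same_v<decltype(add_one{}), decltype(add_one{})>,
              "every use of add_one refers to the same type");

// apply_to_seven<decltype(lambda_a)> and apply_to_seven<decltype(lambda_b)>
// are two instantiations; every call that passes add_one shares one.
static_assert(apply_to_seven(lambda_a) == 8, "lambda call site");
static_assert(apply_to_seven(lambda_b) == 8, "second lambda call site");
static_assert(apply_to_seven(add_one{}) == 8, "named functor call site");
```

The observable results are identical in all three calls; only the number of distinct template instantiations the compiler has to produce differs.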

Changes

Part 1: generate_identity_sequences helper

  • Replaces common lambda pattern for generating identity sequences
  • Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
  • Named helper shares a single type across all uses
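
As a rough sketch of what such a helper could look like, the following builds Tuple<Sequence<0>, ..., Sequence<N-1>> from a pack expansion instead of a lambda. `Sequence` and `Tuple` here are simplified stand-ins defined locally, and the `detail::make_identity_sequences` name is hypothetical; the actual implementation in tuple_helper.hpp may differ.

```cpp
#include <type_traits>
#include <utility>

// Simplified stand-ins for CK's Sequence and Tuple (assumptions for this sketch).
template <int... Is>
struct Sequence
{
};

template <typename... Ts>
struct Tuple
{
};

namespace detail {
// Hypothetical helper: expand the index pack into one Sequence per index.
template <int... Is>
constexpr auto make_identity_sequences(std::integer_sequence<int, Is...>)
{
    return Tuple<Sequence<Is>...>{};
}
} // namespace detail

// Returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without any lambda.
template <int N>
constexpr auto generate_identity_sequences()
{
    return detail::make_identity_sequences(std::make_integer_sequence<int, N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             Tuple<Sequence<0>, Sequence<1>, Sequence<2>>>,
              "N = 3 yields three single-element sequences");
static_assert(std::is_same_v<decltype(generate_identity_sequences<0>()), Tuple<>>,
              "N = 0 yields an empty tuple");
```

Because the result depends only on N, every call site with the same N reuses the same instantiation.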

Part 2: Named functors in transform_tensor_descriptor

  • Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
  • Use generate_identity_sequences in matrix_padder.hpp
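
One way to picture unpack_and_merge_sequences is as "unpack a tuple of sequences, then concatenate them" expressed through a named functor. The sketch below is an assumption about the shape of that helper, not the code from sequence_helper.hpp: `Sequence`, `Tuple`, `merge_sequences`, and `unpack` are minimal local stand-ins.

```cpp
#include <type_traits>

// Simplified stand-ins for CK's Sequence and Tuple (assumptions for this sketch).
template <int... Is>
struct Sequence
{
};

template <typename... Ts>
struct Tuple
{
};

// Concatenate any number of sequences, left to right.
constexpr auto merge_sequences() { return Sequence<>{}; }

template <int... Is>
constexpr auto merge_sequences(Sequence<Is...>)
{
    return Sequence<Is...>{};
}

template <int... Xs, int... Ys, typename... Rest>
constexpr auto merge_sequences(Sequence<Xs...>, Sequence<Ys...>, Rest... rest)
{
    return merge_sequences(Sequence<Xs..., Ys...>{}, rest...);
}

// Named functor wrapper: one type shared by every call site.
struct merge_sequences_functor
{
    template <typename... Seqs>
    constexpr auto operator()(Seqs... seqs) const
    {
        return merge_sequences(seqs...);
    }
};

// Minimal unpack: call f with one default-constructed value per tuple element.
template <typename F, typename... Ts>
constexpr auto unpack(F f, Tuple<Ts...>)
{
    return f(Ts{}...);
}

// Unpack a tuple of sequences and merge them into a single sequence.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(Tuple<Seqs...> t)
{
    return unpack(merge_sequences_functor{}, t);
}

using Input = Tuple<Sequence<0, 1>, Sequence<2>, Sequence<3, 4>>;
static_assert(std::is_same_v<decltype(unpack_and_merge_sequences(Input{})),
                             Sequence<0, 1, 2, 3, 4>>,
              "sequences are concatenated in tuple order");
```

In this sketch the functor passed to `unpack` is always merge_sequences_functor, so `unpack` is instantiated once per tuple type rather than once per lambda.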

Test Plan

  • Added 7 unit tests:
    • 4 tests for generate_identity_sequences
    • 3 tests for unpack_and_merge_sequences
  • Waiting for full CI

Related PRs

This PR merges the functionality from:

Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)

Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.

Copilot AI left a comment

Pull request overview

This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.

Changes:

  • Added generate_identity_sequences<N>() helper to generate Tuple<Sequence<0>, ..., Sequence<N-1>> without lambdas.
  • Added named sequence utilities (merge_sequences_functor, unpack_and_merge_sequences) and replaced lambdas in transform_tensor_descriptor/TensorDescriptor logic.
  • Updated multiple call sites to use the new helper(s) and added unit tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| test/util/unit_sequence_helper.cpp | Adds unit tests for generate_identity_sequences and unpack_and_merge_sequences. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/utility/tuple_helper.hpp | Introduces generate_identity_sequences helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and unpack_and_merge_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses unpack_and_merge_sequences. |

@cgmillette cgmillette self-assigned this Jan 23, 2026
Replace inline lambdas with named functor structs in transform_tensor_descriptor
to reduce template instantiation overhead and improve compile times.

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges

- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>

- Update 14 call sites across threadwise transfer, wrapper, and device files
  to use generate_identity_sequences() instead of generate_tuple with lambdas

- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors

Co-Authored-By: Claude <noreply@anthropic.com>
@tenpercent tenpercent force-pushed the tenpercent/tensor-descriptor-functor-optimization branch from 95c6c4b to bce6ec1 Compare January 29, 2026 21:28
@tenpercent tenpercent marked this pull request as ready for review January 29, 2026 21:45
}

// Functor wrapper for merge_sequences to enable reuse across call sites
struct merge_sequences_functor
Contributor Author

One thing to consider is whether these new helper functors are an implementation detail rather than part of the header interface.

@ammallya (Contributor) commented Feb 3, 2026

Imported to ROCm/rocm-libraries

@ammallya ammallya closed this Feb 3, 2026