Replace O(N) recursive sequence_map_inverse with O(1) pack expansion #3596
tenpercent wants to merge 4 commits into develop
Pull request overview
This PR optimizes sequence_map_inverse by replacing a recursive implementation, whose template instantiation depth grows as O(N), with an O(1)-depth implementation based on pack expansion and constexpr loops, reducing compilation overhead.
Changes:
- Replaced recursive sequence_map_inverse_impl with ConstexprArray and constexpr loop-based find_inverse
- Added detailed comments explaining the compilation performance benefits
- Achieved 1.6% reduction in template instantiations (126,896 fewer instantiations)
shumway left a comment
I think we want to remove the O(N^2) loop and probably add memoization, too.
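One way to avoid the quadratic search is sketched below, assuming C++17 and a plain std::array in place of the library's own array type. The helper name invert_permutation is hypothetical and is not taken from this PR. It scatters each position into its destination slot, so the inverse is built in a single O(N) pass instead of one linear search per element.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not the PR's code): invert a permutation in one pass.
// Assumes `perm` holds each of 0..N-1 exactly once.
template <std::size_t N>
constexpr std::array<int, N> invert_permutation(const std::array<int, N>& perm)
{
    std::array<int, N> inverse{};
    for(std::size_t i = 0; i < N; ++i)
    {
        // perm sends position i to perm[i], so the inverse sends perm[i] back to i.
        inverse[static_cast<std::size_t>(perm[i])] = static_cast<int>(i);
    }
    return inverse;
}

// Compile-time check: the inverse of {2, 0, 3, 1} is {1, 3, 0, 2}.
constexpr auto example_inverse = invert_permutation(std::array<int, 4>{2, 0, 3, 1});
static_assert(example_inverse[0] == 1, "value 0 sits at position 1");
static_assert(example_inverse[1] == 3, "value 1 sits at position 3");
static_assert(example_inverse[2] == 0, "value 2 sits at position 0");
static_assert(example_inverse[3] == 2, "value 3 sits at position 2");
```

The whole computation still runs inside one function instantiation per N, so the template depth stays constant.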
include/ck/utility/sequence.hpp
Outdated
        if(values[i] == target)
            return i;
    }
    return -1; // should not reach for valid permutation
Can we make this a compile-time error, just to catch anything that is really broken?
The two patterns I know are to use a static_assert or to call a consteval function that throws an error. We don't want to have this logic silently fail.
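A minimal sketch of the second pattern follows, assuming host-side C++17 with exceptions enabled. The name find_index_or_fail is hypothetical and std::array stands in for the library's array type. A throw-expression cannot be evaluated inside a constant expression, so a failed lookup at compile time stops the build instead of yielding -1. In C++20 the function could be declared consteval to force compile-time evaluation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not the PR's code): returns the position of `target`
// in `values`, and refuses to produce a value when `target` is absent.
template <std::size_t N>
constexpr std::size_t find_index_or_fail(const std::array<int, N>& values, int target)
{
    for(std::size_t i = 0; i < N; ++i)
    {
        if(values[i] == target)
            return i;
    }
    // Reaching this line during constant evaluation is a hard compile error,
    // because a throw-expression is not permitted in a constant expression.
    throw "target not found: input is not a valid permutation";
}

constexpr std::array<int, 3> perm{2, 0, 1};

static_assert(find_index_or_fail(perm, 0) == 1, "0 is stored at position 1");
static_assert(find_index_or_fail(perm, 2) == 0, "2 is stored at position 0");

// Uncommenting the next line should fail to compile, since 7 is not in perm:
// static_assert(find_index_or_fail(perm, 7) == 0, "");
```

Builds that disable exceptions would need the static_assert pattern instead, for example by asserting on the result of a separate validity check before the lookup.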
cgmillette left a comment
Does this handle repeated indices?
Co-Authored-By: Claude <noreply@anthropic.com>
Imported to ROCm/rocm-libraries
Summary
Replace the O(N) recursive sequence_map_inverse implementation with O(1) template depth using pack expansion to reduce compile time (#3575).
Approach
- constexpr loop in find_source_index to locate permutation inverse indices
Why It Works
Template recursion requires N template instantiations for N iterations, each with its own overhead. Constexpr loops execute within a single template instantiation, avoiding per-instantiation overhead.
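The idea can be sketched as follows, assuming C++17. Sequence here is a minimal stand-in for the library's sequence type, and inverse_impl and sequence_map_inverse_sketch are hypothetical names, so this is an illustration of the technique and not the code merged in this PR. The search runs as an ordinary constexpr loop, and a single pack expansion over 0..N-1 produces every output element at once.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// Minimal stand-in for a compile-time integer sequence (hypothetical).
template <int... Is>
struct Sequence
{
};

// Constexpr loop: iterations run inside one instantiation per N,
// not one template instantiation per iteration.
template <std::size_t N>
constexpr int find_source_index(const std::array<int, N>& values, int target)
{
    for(std::size_t i = 0; i < N; ++i)
    {
        if(values[i] == target)
            return static_cast<int>(i);
    }
    return -1; // not reached for a valid permutation
}

// Pack expansion: Pos... expands to 0, 1, ..., N-1 in a single step,
// so no helper template recurses on the sequence length.
template <int... Is, std::size_t... Pos>
constexpr auto inverse_impl(Sequence<Is...>, std::index_sequence<Pos...>)
{
    constexpr std::array<int, sizeof...(Is)> values{Is...};
    return Sequence<find_source_index(values, static_cast<int>(Pos))...>{};
}

template <typename Seq>
struct sequence_map_inverse_sketch;

template <int... Is>
struct sequence_map_inverse_sketch<Sequence<Is...>>
{
    using type = decltype(inverse_impl(Sequence<Is...>{},
                                       std::make_index_sequence<sizeof...(Is)>{}));
};

// Value 0 sits at position 1, value 1 at 3, value 2 at 0, value 3 at 2.
static_assert(std::is_same<sequence_map_inverse_sketch<Sequence<2, 0, 3, 1>>::type,
                           Sequence<1, 3, 0, 2>>::value,
              "inverse of a 4-element permutation");
```

Each distinct input sequence still costs a fixed handful of instantiations, but that number does not grow with N.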
Build Performance Impact
Template Instantiation Reduction (measured on device_grouped_conv3d_fwd_bias_bnorm_clamp_instance target, 248 files):
This confirms the optimization successfully reduces template instantiation overhead by eliminating recursive template patterns in favor of pack expansion.
Test Plan
- SequenceMapInverse.InverseMap and SequenceMapInverse.InverseIdentityMap tests validate correctness
Notes
- sequence_merge optimization removed from this PR (handled in Optimize sequence_gen and uniform_sequence_gen to reduce template instantiation depth #3585)
- is_valid_sequence_map before calling sequence_map_inverse
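A validity check of the kind referred to above might look like the sketch below, assuming C++17. The name is_permutation_of_iota is hypothetical and the library's is_valid_sequence_map may be implemented differently. It rejects out-of-range and repeated indices, which are the inputs for which an inverse is not well defined.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a validity check (not the library's code):
// true only if `map` contains each of 0..N-1 exactly once.
template <std::size_t N>
constexpr bool is_permutation_of_iota(const std::array<int, N>& map)
{
    std::array<bool, N> seen{};
    for(std::size_t i = 0; i < N; ++i)
    {
        const int v = map[i];
        if(v < 0 || static_cast<std::size_t>(v) >= N)
            return false; // index out of range
        if(seen[static_cast<std::size_t>(v)])
            return false; // repeated index
        seen[static_cast<std::size_t>(v)] = true;
    }
    return true;
}

static_assert(is_permutation_of_iota(std::array<int, 3>{2, 0, 1}), "valid permutation");
static_assert(!is_permutation_of_iota(std::array<int, 3>{0, 0, 1}), "repeated index");
static_assert(!is_permutation_of_iota(std::array<int, 3>{0, 1, 3}), "index out of range");
```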