-
Notifications
You must be signed in to change notification settings - Fork 429
[Feature] Add Loop Invariant Code Motion (LICM) Pass #1771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Introduced new configuration options for disabling loop unswitching and enabling LICM. - Implemented the LICM pass to hoist loop-invariant LetStmt nodes out of loops, reducing redundant computations. - Updated pass configuration to include LICM parameters for fine-tuning optimization behavior. - Enhanced the device code generation process to incorporate LICM, improving overall performance. This addition enhances the optimization capabilities of the framework, allowing for more efficient code generation.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughAdds a configurable Loop Invariant Code Motion (LICM) pass with CSE to the TileLang/TIR pipeline, new buffer-analysis utilities, Python bindings and pass config options, tests, and integration into the device codegen flow; also updates a TVM submodule pointer. Changes
Sequence Diagram(s)sequenceDiagram
participant Lower as Lowering
participant Hoist as HoistBroadcastValues
participant LICM as LoopInvariantCodeMotion
participant Backend as BackendCodegen
Lower->>Hoist: apply HoistBroadcastValues
Hoist-->>Lower: transformed IRModule
Lower->>LICM: apply LoopInvariantCodeMotion (if enabled)
LICM-->>Lower: hoisted LetStmts & extracted CSE temporaries
Lower->>Backend: call backend codegen (cuda/hip/...)
Backend-->>Lower: generated device code
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…licm-optimization
Follow the naming convention of other pass configs in the codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/transform/loop_invariant_code_motion.cc`:
- Around line 538-553: The current hoisting loop uses checker.IsInvariant(expr)
and only loop var/written-buffers checks, so expressions that reference
loop-local LetStmt vars can be incorrectly treated as invariant; modify the
extraction criteria in the loop-over expr_counts (where expr and count are used,
and complexity is computed via ExprComplexityCalculator::Calculate) to also
detect any dependency on loop-local LetStmt variables and exclude such
expressions unless those LetStmt vars themselves are marked hoisted;
specifically, enhance checker.IsInvariant or add a predicate that walks expr to
find LetStmt-created local symbols and rejects extraction if any of those locals
are loop-local and not in the hoisted-set (i.e., ensure
cse_eligible/licm_eligible are only true when there are no non-hoisted
loop-local LetStmt dependencies).
In `@testing/python/transform/test_tilelang_transform_licm.py`:
- Around line 476-478: In the __main__ block remove the leftover debug call to
test_print_result() and restore the proper test runner invocation: uncomment or
re-enable tilelang.testing.main() and ensure test_print_result() is not called
directly from main; update the block so it only calls tilelang.testing.main()
when __name__ == "__main__" to allow CI to run tests normally.
🧹 Nitpick comments (9)
src/transform/common/buffer_analysis.h (3)
16-16: Avoidusing namespacein header files.
using namespace tir;in a header file pollutes the namespace for all translation units that include this header, which can lead to name collisions and unexpected behavior.♻️ Proposed fix: Use explicit namespace qualification
-using namespace tir;Then update the class definitions to use explicit
tir::prefixes, for example:
tir::StmtExprVisitorinstead ofStmtExprVisitortir::BufferStoreNodeinstead ofBufferStoreNodetir::CallNodeinstead ofCallNode- etc.
26-27: Unnecessarystd::moveon return value.Using
std::moveon a local variable being returned can prevent Named Return Value Optimization (NRVO). The compiler can optimize this better without the explicit move.♻️ Proposed fix
static std::unordered_set<const VarNode *> Collect(const Stmt &stmt) { WrittenBufferCollector collector; collector(stmt); - return std::move(collector.written_buffers); + return collector.written_buffers; }
44-48: Conservative treatment ofaddress_ofas write.The
address_ofhandling assumes any buffer whose address is taken is written. While this is safe, it may be overly conservative in cases where the address is only used for reading. Consider adding a comment documenting this intentional conservative behavior.📝 Suggested documentation
} else if (op->op.same_as(builtin::address_of())) { + // Conservative: treat address_of as potential write since we can't + // determine how the address will be used downstream if (const auto *load = op->args[0].as<BufferLoadNode>()) { written_buffers.insert(load->buffer->data.get()); } }testing/python/transform/test_tilelang_transform_licm.py (5)
61-64: Unused helper function.
_count_expr_occurrencesis defined but never used in any test. Consider removing it or adding tests that utilize it.
88-89: Remove extraneousfprefixes from strings without placeholders.These assertion messages don't contain any format placeholders, so the
fprefix is unnecessary.♻️ Proposed fix
- assert "x" in outside, f"x should be outside loop" - assert "x" not in inside, f"x should not be inside loop" + assert "x" in outside, "x should be outside loop" + assert "x" not in inside, "x should not be inside loop"This pattern applies to many other assertions in this file (lines 110-113, 131-132, 153-156, 175, 322, 325, 365-366, 438-439, 446).
173-175: Unused unpacked variable.The
outsidevariable is unpacked but never used. Use_to indicate intentionally unused variables.♻️ Proposed fix
- outside, inside = _find_lets_in_stmt(_get_body(result)) + _, inside = _find_lets_in_stmt(_get_body(result))This pattern applies to several other tests where
insideis unused (lines 199, 223, 246, 268, 290, 319, 445).
185-204: Unused function parameterC.The
Ctensor parameter is declared but never used in this test function.♻️ Proposed fix
`@T.prim_func` def before( A: T.Tensor((128,), T.float32), B: T.Tensor((128,), T.float32), - C: T.Tensor((128,), T.float32), base: T.int32, offset: T.int32, ):
246-251: Unused local variablecse_vars.The variable
cse_varsis assigned but never used. The assertion only checks thatresult is not None, which doesn't verify the expected behavior.♻️ Proposed fix: Either use cse_vars or remove it
Option 1 - Remove the unused variable:
result = _apply_licm(before) - outside, inside = _find_lets_in_stmt(_get_body(result)) - - # Should NOT have extracted any CSE variable (each expr appears once) - cse_vars = [v for v in outside if v.startswith("cse_var")] - # This is a weaker test - we just check it doesn't crash assert result is not NoneOption 2 - Add a meaningful assertion:
result = _apply_licm(before) - outside, inside = _find_lets_in_stmt(_get_body(result)) + outside, _ = _find_lets_in_stmt(_get_body(result)) # Should NOT have extracted any CSE variable (each expr appears once) cse_vars = [v for v in outside if v.startswith("cse_var")] - # This is a weaker test - we just check it doesn't crash - assert result is not None + # With default min_occurrences_for_cse=2, single occurrence exprs won't be extracted + # (unless they meet the complexity threshold for LICM mode) + assert len(cse_vars) <= 1, f"Should have at most 1 CSE var for single-occurrence exprs, got {outside}"src/transform/loop_invariant_code_motion.cc (1)
562-567: Make candidate ordering deterministic when complexities tie.
expr_countscomes fromstd::unordered_map, and sorting only by complexity leaves tie ordering dependent on hash iteration, which can makecse_var_*assignment nondeterministic. Consider adding a stable tie‑breaker (e.g.,StructuralHash).♻️ Suggested tie-breaker for deterministic ordering
- std::stable_sort(candidates.begin(), candidates.end(), - [](const auto &a, const auto &b) { - return ExprComplexityCalculator::Calculate(a.first) > - ExprComplexityCalculator::Calculate(b.first); - }); + std::stable_sort(candidates.begin(), candidates.end(), + [](const auto &a, const auto &b) { + size_t ca = + ExprComplexityCalculator::Calculate(a.first); + size_t cb = + ExprComplexityCalculator::Calculate(b.first); + if (ca != cb) return ca > cb; + return StructuralHash()(a.first) < + StructuralHash()(b.first); + });
| for (const auto &[expr, count] : expr_counts) { | ||
| if (!checker.IsInvariant(expr)) { | ||
| continue; | ||
| } | ||
| size_t complexity = ExprComplexityCalculator::Calculate(expr); | ||
|
|
||
| // CSE criterion: appears multiple times | ||
| bool cse_eligible = | ||
| (static_cast<int>(count) >= config_->min_occurrences_for_cse && | ||
| static_cast<int>(complexity) >= config_->min_complexity_for_cse); | ||
|
|
||
| // LICM criterion: complex enough to warrant hoisting even if appears once | ||
| bool licm_eligible = | ||
| (static_cast<int>(complexity) >= config_->min_complexity_for_licm); | ||
|
|
||
| if (cse_eligible || licm_eligible) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent hoisting expressions that depend on loop‑local LetStmt vars.
Phase 2 only checks the loop var + written buffers, so expressions that depend on non‑hoisted loop‑local let vars can be treated as invariant and hoisted outside the loop. That can introduce out‑of‑scope references or wrong results (e.g., let x = i; ... x + 1 ...). Please exclude non‑hoisted loop‑local vars from extraction.
🛠️ Proposed fix: track loop-local variant vars and block extraction
- auto [extracted_lets, final_body] = ExtractInvariantSubexpressions(
- new_body, op->loop_var, written_buffers, invariance_checker);
+ std::unordered_set<const VarNode *> variant_vars;
+ variant_vars.reserve(let_stmts.size());
+ std::unordered_set<const VarNode *> hoisted_vars;
+ hoisted_vars.reserve(hoisted_lets.size());
+ for (const auto *let : hoisted_lets) {
+ hoisted_vars.insert(let->var.get());
+ }
+ for (const auto *let : let_stmts) {
+ if (!hoisted_vars.count(let->var.get())) {
+ variant_vars.insert(let->var.get());
+ }
+ }
+ auto [extracted_lets, final_body] = ExtractInvariantSubexpressions(
+ new_body, op->loop_var, written_buffers, invariance_checker,
+ variant_vars);- ExtractInvariantSubexpressions(
- const Stmt &body, const Var &loop_var,
- const std::unordered_set<const VarNode *> &written_buffers,
- const LoopInvarianceChecker &checker) {
+ ExtractInvariantSubexpressions(
+ const Stmt &body, const Var &loop_var,
+ const std::unordered_set<const VarNode *> &written_buffers,
+ const LoopInvarianceChecker &checker,
+ const std::unordered_set<const VarNode *> &variant_vars) {
...
- if (!checker.IsInvariant(expr)) {
+ if (UsesVar(expr, [&variant_vars](const VarNode *v) {
+ return variant_vars.count(v);
+ })) {
+ continue;
+ }
+ if (!checker.IsInvariant(expr)) {
continue;
}🤖 Prompt for AI Agents
In `@src/transform/loop_invariant_code_motion.cc` around lines 538 - 553, The
current hoisting loop uses checker.IsInvariant(expr) and only loop
var/written-buffers checks, so expressions that reference loop-local LetStmt
vars can be incorrectly treated as invariant; modify the extraction criteria in
the loop-over expr_counts (where expr and count are used, and complexity is
computed via ExprComplexityCalculator::Calculate) to also detect any dependency
on loop-local LetStmt variables and exclude such expressions unless those
LetStmt vars themselves are marked hoisted; specifically, enhance
checker.IsInvariant or add a predicate that walks expr to find LetStmt-created
local symbols and rejects extraction if any of those locals are loop-local and
not in the hoisted-set (i.e., ensure cse_eligible/licm_eligible are only true
when there are no non-hoisted loop-local LetStmt dependencies).
| if __name__ == "__main__": | ||
| # tilelang.testing.main() | ||
| test_print_result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug code left in main block.
The test runner tilelang.testing.main() is commented out and test_print_result() is called directly instead. This should be corrected for proper test execution in CI.
🐛 Proposed fix
if __name__ == "__main__":
- # tilelang.testing.main()
- test_print_result()
+ tilelang.testing.main()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if __name__ == "__main__": | |
| # tilelang.testing.main() | |
| test_print_result() | |
| if __name__ == "__main__": | |
| tilelang.testing.main() |
🤖 Prompt for AI Agents
In `@testing/python/transform/test_tilelang_transform_licm.py` around lines 476 -
478, In the __main__ block remove the leftover debug call to test_print_result()
and restore the proper test runner invocation: uncomment or re-enable
tilelang.testing.main() and ensure test_print_result() is not called directly
from main; update the block so it only calls tilelang.testing.main() when
__name__ == "__main__" to allow CI to run tests normally.
… test_no_extract_single_occurrence - Simplified the condition check for BufferStore and Evaluate nodes using a tuple. - Added a print statement to output CSE variables in the test case for better debugging.
Summary
This PR adds a Loop Invariant Code Motion (LICM) optimization pass that performs two optimizations:
Example Transformation
Features
PassContext:min_occurrences_for_cse: Minimum occurrences for CSE extraction (default: 2)min_complexity_for_cse: Minimum complexity for CSE extraction (default: 2)min_complexity_for_licm: Minimum complexity for single-occurrence extraction (default: 3)tl.enable_licmconfig (default: disabled)Files Changed
src/transform/loop_invariant_code_motion.cc- Main LICM pass implementationsrc/transform/common/buffer_analysis.h- Reusable buffer analysis utilitiessrc/op/builtin.h/src/op/builtin.cc- Config registrationtilelang/transform/__init__.py- Python bindingtilelang/transform/pass_config.py- PassConfigKey entriestesting/python/transform/test_tilelang_transform_licm.py- Unit tests (17 tests)Test Plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Integration
Tests
Chores