[Feature] Add Loop Invariant Code Motion (LICM) Pass #1771

LeiWang1999 · 2026-02-02T09:53:04Z

Summary

This PR adds a Loop Invariant Code Motion (LICM) optimization pass that performs two optimizations:

LetStmt Hoisting: Moves existing loop-invariant LetStmt nodes outside the loop
Invariant Subexpression Extraction: Extracts loop-invariant subexpressions (CSE-style) and hoists them outside the loop

Example Transformation

// Before
for (i = 0; i < n; i++) {
    A[(threadIdx.x >> 2) * 32 + i] = ...
    B[(threadIdx.x >> 2) * 32 + j] = ...
}

// After
cse_var = (threadIdx.x >> 2) * 32
for (i = 0; i < n; i++) {
    A[cse_var + i] = ...
    B[cse_var + j] = ...
}

Features

Configurable thresholds via PassContext:
- min_occurrences_for_cse: Minimum occurrences for CSE extraction (default: 2)
- min_complexity_for_cse: Minimum complexity for CSE extraction (default: 2)
- min_complexity_for_licm: Minimum complexity for single-occurrence extraction (default: 3)
Enable/disable via tl.enable_licm config (default: disabled)
Handles pure built-in operations (shift_right, bitwise_and, etc.)
Bottom-up processing for nested loops

Files Changed

src/transform/loop_invariant_code_motion.cc - Main LICM pass implementation
src/transform/common/buffer_analysis.h - Reusable buffer analysis utilities
src/op/builtin.h / src/op/builtin.cc - Config registration
tilelang/transform/__init__.py - Python binding
tilelang/transform/pass_config.py - PassConfigKey entries
testing/python/transform/test_tilelang_transform_licm.py - Unit tests (17 tests)

Test Plan

All 17 unit tests pass
Build succeeds

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Loop Invariant Code Motion (LICM) optimization added to extract and hoist invariant computations from loops; configurable thresholds available and the pass can be enabled/disabled.
Integration
- LICM applied in the device lowering pipeline before backend codegen.
Tests
- Comprehensive test suite added to validate LICM and related extraction behavior.
Chores
- Pass configuration keys added for LICM; deprecated one legacy pass config removed.

- Introduced new configuration options for disabling loop unswitching and enabling LICM. - Implemented the LICM pass to hoist loop-invariant LetStmt nodes out of loops, reducing redundant computations. - Updated pass configuration to include LICM parameters for fine-tuning optimization behavior. - Enhanced the device code generation process to incorporate LICM, improving overall performance. This addition enhances the optimization capabilities of the framework, allowing for more efficient code generation.

github-actions · 2026-02-02T09:53:15Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-02-02T09:53:28Z

📝 Walkthrough

Walkthrough

Adds a configurable Loop Invariant Code Motion (LICM) pass with CSE to the TileLang/TIR pipeline, new buffer-analysis utilities, Python bindings and pass config options, tests, and integration into the device codegen flow; also updates a TVM submodule pointer.

Changes

Cohort / File(s)	Summary
Submodule Update `3rdparty/tvm`	Updated TVM submodule pointer to a newer commit; no functional changes reported.
Pass Config & Attr `src/op/builtin.h`, `src/op/builtin.cc`	Added new TL attribute key `tl.enable_loop_invariant_code_motion` and registered pass config option for enabling LICM.
Buffer Analysis Utilities `src/transform/common/buffer_analysis.h`	New header providing `WrittenBufferCollector` and `BufferReadChecker` to detect written buffers and whether expressions read them (used for LICM safety).
LICM Pass Implementation `src/transform/loop_invariant_code_motion.cc`	New LICM pass implementation: config/reflection types, expression complexity analysis, invariance checks, CSE/subexpression extraction, LetStmt hoisting, transformers and pass registration.
Tests `testing/python/transform/test_tilelang_transform_licm.py`	New comprehensive tests covering LetStmt hoisting, invariant subexpression extraction/CSE, nested/parallel loops, and configuration-driven behavior.
Codegen Integration `tilelang/engine/lower.py`	Inserted LoopInvariantCodeMotion into device codegen pipeline after HoistBroadcastValues and before backend codegen.
Python API & Pass Config `tilelang/transform/__init__.py`, `tilelang/transform/pass_config.py`	Added public `LoopInvariantCodeMotion()` binding and new pass-config keys (enable flag and LICM thresholds); removed obsolete `TIR_SIMPLIFY` key.

Sequence Diagram(s)

sequenceDiagram
    participant Lower as Lowering
    participant Hoist as HoistBroadcastValues
    participant LICM as LoopInvariantCodeMotion
    participant Backend as BackendCodegen

    Lower->>Hoist: apply HoistBroadcastValues
    Hoist-->>Lower: transformed IRModule
    Lower->>LICM: apply LoopInvariantCodeMotion (if enabled)
    LICM-->>Lower: hoisted LetStmts & extracted CSE temporaries
    Lower->>Backend: call backend codegen (cuda/hip/...)
    Backend-->>Lower: generated device code

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

chengyupku

Poem

🐰 I nudged the loops to let bindings hop free,

Hoisted the crumbs where eyes cannot see.
With buffers checked and repeats made small,
CSE and LICM answered the call.
Hare-brained speed—now faster for all! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 39.08% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title '[Feature] Add Loop Invariant Code Motion (LICM) Pass' directly and accurately describes the main change: introduction of a new LICM optimization pass. The title is concise, specific, and clearly reflects the primary objective of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…licm-optimization

Follow the naming convention of other pass configs in the codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/transform/loop_invariant_code_motion.cc`:
- Around line 538-553: The current hoisting loop uses checker.IsInvariant(expr)
and only loop var/written-buffers checks, so expressions that reference
loop-local LetStmt vars can be incorrectly treated as invariant; modify the
extraction criteria in the loop-over expr_counts (where expr and count are used,
and complexity is computed via ExprComplexityCalculator::Calculate) to also
detect any dependency on loop-local LetStmt variables and exclude such
expressions unless those LetStmt vars themselves are marked hoisted;
specifically, enhance checker.IsInvariant or add a predicate that walks expr to
find LetStmt-created local symbols and rejects extraction if any of those locals
are loop-local and not in the hoisted-set (i.e., ensure
cse_eligible/licm_eligible are only true when there are no non-hoisted
loop-local LetStmt dependencies).

In `@testing/python/transform/test_tilelang_transform_licm.py`:
- Around line 476-478: In the __main__ block remove the leftover debug call to
test_print_result() and restore the proper test runner invocation: uncomment or
re-enable tilelang.testing.main() and ensure test_print_result() is not called
directly from main; update the block so it only calls tilelang.testing.main()
when __name__ == "__main__" to allow CI to run tests normally.

🧹 Nitpick comments (9)

src/transform/common/buffer_analysis.h (3)
16-16: Avoid using namespace in header files.

using namespace tir; in a header file pollutes the namespace for all translation units that include this header, which can lead to name collisions and unexpected behavior.
♻️ Proposed fix: Use explicit namespace qualification
-using namespace tir;
Then update the class definitions to use explicit tir:: prefixes, for example:

tir::StmtExprVisitor instead of StmtExprVisitor

tir::BufferStoreNode instead of BufferStoreNode

tir::CallNode instead of CallNode

etc.
26-27: Unnecessary std::move on return value.

Using std::move on a local variable being returned can prevent Named Return Value Optimization (NRVO). The compiler can optimize this better without the explicit move.
♻️ Proposed fix
   static std::unordered_set<const VarNode *> Collect(const Stmt &stmt) {
     WrittenBufferCollector collector;
     collector(stmt);
-    return std::move(collector.written_buffers);
+    return collector.written_buffers;
   }
44-48: Conservative treatment of address_of as write.

The address_of handling assumes any buffer whose address is taken is written. While this is safe, it may be overly conservative in cases where the address is only used for reading. Consider adding a comment documenting this intentional conservative behavior.
📝 Suggested documentation
     } else if (op->op.same_as(builtin::address_of())) {
+      // Conservative: treat address_of as potential write since we can't
+      // determine how the address will be used downstream
       if (const auto *load = op->args[0].as<BufferLoadNode>()) {
         written_buffers.insert(load->buffer->data.get());
       }
     }
testing/python/transform/test_tilelang_transform_licm.py (5)
61-64: Unused helper function.

_count_expr_occurrences is defined but never used in any test. Consider removing it or adding tests that utilize it.

88-89: Remove extraneous f prefixes from strings without placeholders.

These assertion messages don't contain any format placeholders, so the f prefix is unnecessary.
♻️ Proposed fix
-    assert "x" in outside, f"x should be outside loop"
-    assert "x" not in inside, f"x should not be inside loop"
+    assert "x" in outside, "x should be outside loop"
+    assert "x" not in inside, "x should not be inside loop"
This pattern applies to many other assertions in this file (lines 110-113, 131-132, 153-156, 175, 322, 325, 365-366, 438-439, 446).
173-175: Unused unpacked variable.

The outside variable is unpacked but never used. Use _ to indicate intentionally unused variables.
♻️ Proposed fix
-    outside, inside = _find_lets_in_stmt(_get_body(result))
+    _, inside = _find_lets_in_stmt(_get_body(result))
This pattern applies to several other tests where inside is unused (lines 199, 223, 246, 268, 290, 319, 445).
185-204: Unused function parameter C.

The C tensor parameter is declared but never used in this test function.
♻️ Proposed fix
     `@T.prim_func`
     def before(
         A: T.Tensor((128,), T.float32),
         B: T.Tensor((128,), T.float32),
-        C: T.Tensor((128,), T.float32),
         base: T.int32,
         offset: T.int32,
     ):
246-251: Unused local variable cse_vars.

The variable cse_vars is assigned but never used. The assertion only checks that result is not None, which doesn't verify the expected behavior.
♻️ Proposed fix: Either use cse_vars or remove it

Option 1 - Remove the unused variable:
     result = _apply_licm(before)
-    outside, inside = _find_lets_in_stmt(_get_body(result))
-
-    # Should NOT have extracted any CSE variable (each expr appears once)
-    cse_vars = [v for v in outside if v.startswith("cse_var")]
-    # This is a weaker test - we just check it doesn't crash
     assert result is not None
Option 2 - Add a meaningful assertion:
     result = _apply_licm(before)
-    outside, inside = _find_lets_in_stmt(_get_body(result))
+    outside, _ = _find_lets_in_stmt(_get_body(result))

     # Should NOT have extracted any CSE variable (each expr appears once)
     cse_vars = [v for v in outside if v.startswith("cse_var")]
-    # This is a weaker test - we just check it doesn't crash
-    assert result is not None
+    # With default min_occurrences_for_cse=2, single occurrence exprs won't be extracted
+    # (unless they meet the complexity threshold for LICM mode)
+    assert len(cse_vars) <= 1, f"Should have at most 1 CSE var for single-occurrence exprs, got {outside}"
src/transform/loop_invariant_code_motion.cc (1)
562-567: Make candidate ordering deterministic when complexities tie.
expr_counts comes from std::unordered_map, and sorting only by complexity leaves tie ordering dependent on hash iteration, which can make cse_var_* assignment nondeterministic. Consider adding a stable tie‑breaker (e.g., StructuralHash).
♻️ Suggested tie-breaker for deterministic ordering
-    std::stable_sort(candidates.begin(), candidates.end(),
-                     [](const auto &a, const auto &b) {
-                       return ExprComplexityCalculator::Calculate(a.first) >
-                              ExprComplexityCalculator::Calculate(b.first);
-                     });
+    std::stable_sort(candidates.begin(), candidates.end(),
+                     [](const auto &a, const auto &b) {
+                       size_t ca =
+                           ExprComplexityCalculator::Calculate(a.first);
+                       size_t cb =
+                           ExprComplexityCalculator::Calculate(b.first);
+                       if (ca != cb) return ca > cb;
+                       return StructuralHash()(a.first) <
+                              StructuralHash()(b.first);
+                     });

coderabbitai · 2026-02-02T10:07:41Z

src/transform/loop_invariant_code_motion.cc

+    for (const auto &[expr, count] : expr_counts) {
+      if (!checker.IsInvariant(expr)) {
+        continue;
+      }
+      size_t complexity = ExprComplexityCalculator::Calculate(expr);
+
+      // CSE criterion: appears multiple times
+      bool cse_eligible =
+          (static_cast<int>(count) >= config_->min_occurrences_for_cse &&
+           static_cast<int>(complexity) >= config_->min_complexity_for_cse);
+
+      // LICM criterion: complex enough to warrant hoisting even if appears once
+      bool licm_eligible =
+          (static_cast<int>(complexity) >= config_->min_complexity_for_licm);
+
+      if (cse_eligible || licm_eligible) {


⚠️ Potential issue | 🔴 Critical

Prevent hoisting expressions that depend on loop‑local LetStmt vars.
Phase 2 only checks the loop var + written buffers, so expressions that depend on non‑hoisted loop‑local let vars can be treated as invariant and hoisted outside the loop. That can introduce out‑of‑scope references or wrong results (e.g., let x = i; ... x + 1 ...). Please exclude non‑hoisted loop‑local vars from extraction.

🛠️ Proposed fix: track loop-local variant vars and block extraction

- auto [extracted_lets, final_body] = ExtractInvariantSubexpressions( - new_body, op->loop_var, written_buffers, invariance_checker); + std::unordered_set<const VarNode *> variant_vars; + variant_vars.reserve(let_stmts.size()); + std::unordered_set<const VarNode *> hoisted_vars; + hoisted_vars.reserve(hoisted_lets.size()); + for (const auto *let : hoisted_lets) { + hoisted_vars.insert(let->var.get()); + } + for (const auto *let : let_stmts) { + if (!hoisted_vars.count(let->var.get())) { + variant_vars.insert(let->var.get()); + } + } + auto [extracted_lets, final_body] = ExtractInvariantSubexpressions( + new_body, op->loop_var, written_buffers, invariance_checker, + variant_vars);

- ExtractInvariantSubexpressions( - const Stmt &body, const Var &loop_var, - const std::unordered_set<const VarNode *> &written_buffers, - const LoopInvarianceChecker &checker) { + ExtractInvariantSubexpressions( + const Stmt &body, const Var &loop_var, + const std::unordered_set<const VarNode *> &written_buffers, + const LoopInvarianceChecker &checker, + const std::unordered_set<const VarNode *> &variant_vars) { ... - if (!checker.IsInvariant(expr)) { + if (UsesVar(expr, [&variant_vars](const VarNode *v) { + return variant_vars.count(v); + })) { + continue; + } + if (!checker.IsInvariant(expr)) { continue; }

🤖 Prompt for AI Agents

In `@src/transform/loop_invariant_code_motion.cc` around lines 538 - 553, The current hoisting loop uses checker.IsInvariant(expr) and only loop var/written-buffers checks, so expressions that reference loop-local LetStmt vars can be incorrectly treated as invariant; modify the extraction criteria in the loop-over expr_counts (where expr and count are used, and complexity is computed via ExprComplexityCalculator::Calculate) to also detect any dependency on loop-local LetStmt variables and exclude such expressions unless those LetStmt vars themselves are marked hoisted; specifically, enhance checker.IsInvariant or add a predicate that walks expr to find LetStmt-created local symbols and rejects extraction if any of those locals are loop-local and not in the hoisted-set (i.e., ensure cse_eligible/licm_eligible are only true when there are no non-hoisted loop-local LetStmt dependencies).

coderabbitai · 2026-02-02T10:07:41Z

testing/python/transform/test_tilelang_transform_licm.py

+if __name__ == "__main__":
+    # tilelang.testing.main()
+    test_print_result()


⚠️ Potential issue | 🟡 Minor

Debug code left in main block.

The test runner tilelang.testing.main() is commented out and test_print_result() is called directly instead. This should be corrected for proper test execution in CI.

🐛 Proposed fix

if __name__ == "__main__": - # tilelang.testing.main() - test_print_result() + tilelang.testing.main()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if __name__ == "__main__":

# tilelang.testing.main()

test_print_result()

if __name__ == "__main__":

tilelang.testing.main()

🤖 Prompt for AI Agents

In `@testing/python/transform/test_tilelang_transform_licm.py` around lines 476 - 478, In the __main__ block remove the leftover debug call to test_print_result() and restore the proper test runner invocation: uncomment or re-enable tilelang.testing.main() and ensure test_print_result() is not called directly from main; update the block so it only calls tilelang.testing.main() when __name__ == "__main__" to allow CI to run tests normally.

… test_no_extract_single_occurrence - Simplified the condition check for BufferStore and Evaluate nodes using a tuple. - Added a print statement to output CSE variables in the test case for better debugging.

LeiWang1999 and others added 2 commits February 2, 2026 17:53

Merge branch 'main' of https://github.com/tile-ai/tilelang into feat/…

b72c769

…licm-optimization

Rename kEnableLICM to kEnableLoopInvariantCodeMotion

47495bf

Follow the naming convention of other pass configs in the codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai bot reviewed Feb 2, 2026

View reviewed changes

LeiWang1999 added 2 commits February 2, 2026 18:07

lint fix

e818a64

Refactor condition check in _find_lets_in_stmt and add debug print in…

2ac7edd

… test_no_extract_single_occurrence - Simplified the condition check for BufferStore and Evaluate nodes using a tuple. - Added a print statement to output CSE variables in the test case for better debugging.

LeiWang1999 marked this pull request as draft February 3, 2026 04:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Add Loop Invariant Code Motion (LICM) Pass #1771

[Feature] Add Loop Invariant Code Motion (LICM) Pass #1771

LeiWang1999 commented Feb 2, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 2, 2026

Uh oh!

coderabbitai bot Feb 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[Feature] Add Loop Invariant Code Motion (LICM) Pass #1771

Are you sure you want to change the base?

[Feature] Add Loop Invariant Code Motion (LICM) Pass #1771

Conversation

LeiWang1999 commented Feb 2, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Example Transformation

Features

Files Changed

Test Plan

Summary by CodeRabbit

Uh oh!

github-actions bot commented Feb 2, 2026

Uh oh!

coderabbitai bot commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

LeiWang1999 commented Feb 2, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 2, 2026 •

edited

Loading