@LeiWang1999 LeiWang1999 commented Feb 2, 2026

Summary

This PR adds a Loop Invariant Code Motion (LICM) optimization pass that performs two optimizations:

  1. LetStmt Hoisting: Moves existing loop-invariant LetStmt nodes outside the loop
  2. Invariant Subexpression Extraction: Extracts loop-invariant subexpressions (CSE-style) and hoists them outside the loop

Example Transformation

// Before
for (i = 0; i < n; i++) {
    A[(threadIdx.x >> 2) * 32 + i] = ...
    B[(threadIdx.x >> 2) * 32 + j] = ...
}

// After
cse_var = (threadIdx.x >> 2) * 32
for (i = 0; i < n; i++) {
    A[cse_var + i] = ...
    B[cse_var + j] = ...
}
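
The transformation above can be sketched on a toy expression tree. This is a minimal illustration, not the pass itself: the tuple encoding, `hoist_invariant`, and the "largest repeated subexpression" policy are assumptions made for the example, and the real pass operates on TIR nodes in C++.

```python
from collections import Counter

# Toy encoding (an assumption, not TIR): a variable is a str, a constant is
# an int, and an operation is a tuple (op, lhs, rhs).

def free_vars(expr):
    """Return the set of variable names that appear in expr."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return free_vars(expr[1]) | free_vars(expr[2])
    return set()

def subexprs(expr):
    """Yield every operation node in expr, including expr itself."""
    if isinstance(expr, tuple):
        yield expr
        yield from subexprs(expr[1])
        yield from subexprs(expr[2])

def substitute(expr, target, name):
    """Replace every occurrence of target inside expr with the variable name."""
    if expr == target:
        return name
    if isinstance(expr, tuple):
        return (expr[0],
                substitute(expr[1], target, name),
                substitute(expr[2], target, name))
    return expr

def hoist_invariant(index_exprs, loop_var):
    """Hoist the largest subexpression that is loop-invariant and repeated."""
    counts = Counter(s for e in index_exprs for s in subexprs(e)
                     if loop_var not in free_vars(s))
    repeated = [s for s, c in counts.items() if c >= 2]
    if not repeated:
        return [], index_exprs
    best = max(repeated, key=lambda s: sum(1 for _ in subexprs(s)))
    new_body = [substitute(e, best, "cse_var") for e in index_exprs]
    return [("cse_var", best)], new_body

# Index expressions from the example: (threadIdx.x >> 2) * 32 + i and ... + j
base = ("*", (">>", "threadIdx.x", 2), 32)
body = [("+", base, "i"), ("+", base, "j")]
lets, new_body = hoist_invariant(body, "i")
```

Here `lets` holds the binding emitted before the loop and `new_body` holds the rewritten index expressions that reference `cse_var`.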

Features

  • Configurable thresholds via PassContext:
    • min_occurrences_for_cse: Minimum occurrences for CSE extraction (default: 2)
    • min_complexity_for_cse: Minimum complexity for CSE extraction (default: 2)
    • min_complexity_for_licm: Minimum complexity for single-occurrence extraction (default: 3)
  • Enable/disable via tl.enable_licm config (default: disabled)
  • Handles pure built-in operations (shift_right, bitwise_and, etc.)
  • Bottom-up processing for nested loops
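
How the three thresholds interact can be sketched as a small predicate. This is a hedged illustration: `LICMThresholds` and `should_extract` are hypothetical names, the defaults are the ones listed above, and the complexity score is taken as an input because the description does not define how it is computed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LICMThresholds:
    # Defaults as listed in the PR description.
    min_occurrences_for_cse: int = 2
    min_complexity_for_cse: int = 2
    min_complexity_for_licm: int = 3

def should_extract(count, complexity, cfg=LICMThresholds()):
    """Decide whether an invariant expression is worth extracting.

    count: how many times the expression occurs in the loop body.
    complexity: a score supplied by the caller (metric not specified here).
    """
    # CSE criterion: repeated often enough and not trivially small.
    cse_eligible = (count >= cfg.min_occurrences_for_cse
                    and complexity >= cfg.min_complexity_for_cse)
    # LICM criterion: complex enough to hoist even if it occurs once.
    licm_eligible = complexity >= cfg.min_complexity_for_licm
    return cse_eligible or licm_eligible
```

With the defaults, an expression that occurs once is extracted only when its complexity reaches 3, while one that occurs twice needs a complexity of just 2.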

Files Changed

  • src/transform/loop_invariant_code_motion.cc - Main LICM pass implementation
  • src/transform/common/buffer_analysis.h - Reusable buffer analysis utilities
  • src/op/builtin.h / src/op/builtin.cc - Config registration
  • tilelang/transform/__init__.py - Python binding
  • tilelang/transform/pass_config.py - PassConfigKey entries
  • testing/python/transform/test_tilelang_transform_licm.py - Unit tests (17 tests)

Test Plan

  • All 17 unit tests pass
  • Build succeeds

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Loop Invariant Code Motion (LICM) optimization added to extract and hoist invariant computations from loops; configurable thresholds available and the pass can be enabled/disabled.
  • Integration

    • LICM applied in the device lowering pipeline before backend codegen.
  • Tests

    • Comprehensive test suite added to validate LICM and related extraction behavior.
  • Chores

    • Pass configuration keys added for LICM; deprecated one legacy pass config removed.

- Introduced new configuration options for disabling loop unswitching and enabling LICM.
- Implemented the LICM pass to hoist loop-invariant LetStmt nodes out of loops, reducing redundant computations.
- Updated pass configuration to include LICM parameters for fine-tuning optimization behavior.
- Enhanced the device code generation process to incorporate LICM, improving overall performance.

This addition enhances the optimization capabilities of the framework, allowing for more efficient code generation.

github-actions bot commented Feb 2, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀


coderabbitai bot commented Feb 2, 2026

Walkthrough

Adds a configurable Loop Invariant Code Motion (LICM) pass with CSE to the TileLang/TIR pipeline, new buffer-analysis utilities, Python bindings and pass config options, tests, and integration into the device codegen flow; also updates a TVM submodule pointer.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Submodule Update: 3rdparty/tvm | Updated TVM submodule pointer to a newer commit; no functional changes reported. |
| Pass Config & Attr: src/op/builtin.h, src/op/builtin.cc | Added new TL attribute key tl.enable_loop_invariant_code_motion and registered pass config option for enabling LICM. |
| Buffer Analysis Utilities: src/transform/common/buffer_analysis.h | New header providing WrittenBufferCollector and BufferReadChecker to detect written buffers and whether expressions read them (used for LICM safety). |
| LICM Pass Implementation: src/transform/loop_invariant_code_motion.cc | New LICM pass implementation: config/reflection types, expression complexity analysis, invariance checks, CSE/subexpression extraction, LetStmt hoisting, transformers and pass registration. |
| Tests: testing/python/transform/test_tilelang_transform_licm.py | New comprehensive tests covering LetStmt hoisting, invariant subexpression extraction/CSE, nested/parallel loops, and configuration-driven behavior. |
| Codegen Integration: tilelang/engine/lower.py | Inserted LoopInvariantCodeMotion into device codegen pipeline after HoistBroadcastValues and before backend codegen. |
| Python API & Pass Config: tilelang/transform/__init__.py, tilelang/transform/pass_config.py | Added public LoopInvariantCodeMotion() binding and new pass-config keys (enable flag and LICM thresholds); removed obsolete TIR_SIMPLIFY key. |
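
The role of the buffer-analysis utilities in LICM safety can be sketched as follows. This is a toy model under stated assumptions: statements and expressions are plain tuples, and `written_buffers` and `reads_any` are hypothetical stand-ins for WrittenBufferCollector and BufferReadChecker, not their actual interfaces.

```python
# Toy IR (an assumption, not TIR):
#   statement:  ("store", buffer_name, index_expr, value_expr)
#   expression: ("load", buffer_name, index_expr) | (op, lhs, rhs) | str | int

def written_buffers(stmts):
    """Collect the names of all buffers stored to in the loop body."""
    return {s[1] for s in stmts if s[0] == "store"}

def reads_any(expr, buffers):
    """Return True if expr loads from any buffer in the given set."""
    if isinstance(expr, tuple):
        if expr[0] == "load" and expr[1] in buffers:
            return True
        return any(reads_any(arg, buffers) for arg in expr[1:])
    return False

loop_body = [
    ("store", "A", "i", ("+", ("load", "B", "k"), 1)),
    ("store", "C", "i", ("load", "A", "k")),
]
written = written_buffers(loop_body)

# B is never written in the loop, so a load from B at an invariant index
# does not change between iterations.
safe = not reads_any(("load", "B", "k"), written)
# A is written in the loop, so a load from A must stay inside it.
unsafe = reads_any(("load", "A", "k"), written)
```

An expression that reads a buffer written inside the loop may observe different values on different iterations, which is why such reads disqualify it from hoisting even when its index is invariant.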

Sequence Diagram(s)

sequenceDiagram
    participant Lower as Lowering
    participant Hoist as HoistBroadcastValues
    participant LICM as LoopInvariantCodeMotion
    participant Backend as BackendCodegen

    Lower->>Hoist: apply HoistBroadcastValues
    Hoist-->>Lower: transformed IRModule
    Lower->>LICM: apply LoopInvariantCodeMotion (if enabled)
    LICM-->>Lower: hoisted LetStmts & extracted CSE temporaries
    Lower->>Backend: call backend codegen (cuda/hip/...)
    Backend-->>Lower: generated device code

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • chengyupku

Poem

🐰 I nudged the loops to let bindings hop free,

Hoisted the crumbs where eyes cannot see.
With buffers checked and repeats made small,
CSE and LICM answered the call.
Hare-brained speed—now faster for all! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.08% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title '[Feature] Add Loop Invariant Code Motion (LICM) Pass' directly and accurately describes the main change: introduction of a new LICM optimization pass. The title is concise, specific, and clearly reflects the primary objective of the changeset. |


LeiWang1999 and others added 2 commits February 2, 2026 17:53
Follow the naming convention of other pass configs in the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/transform/loop_invariant_code_motion.cc`:
- Around line 538-553: The current hoisting loop uses checker.IsInvariant(expr)
and only loop var/written-buffers checks, so expressions that reference
loop-local LetStmt vars can be incorrectly treated as invariant; modify the
extraction criteria in the loop-over expr_counts (where expr and count are used,
and complexity is computed via ExprComplexityCalculator::Calculate) to also
detect any dependency on loop-local LetStmt variables and exclude such
expressions unless those LetStmt vars themselves are marked hoisted;
specifically, enhance checker.IsInvariant or add a predicate that walks expr to
find LetStmt-created local symbols and rejects extraction if any of those locals
are loop-local and not in the hoisted-set (i.e., ensure
cse_eligible/licm_eligible are only true when there are no non-hoisted
loop-local LetStmt dependencies).

In `@testing/python/transform/test_tilelang_transform_licm.py`:
- Around line 476-478: In the __main__ block remove the leftover debug call to
test_print_result() and restore the proper test runner invocation: uncomment or
re-enable tilelang.testing.main() and ensure test_print_result() is not called
directly from main; update the block so it only calls tilelang.testing.main()
when __name__ == "__main__" to allow CI to run tests normally.
🧹 Nitpick comments (9)
src/transform/common/buffer_analysis.h (3)

16-16: Avoid using namespace in header files.

using namespace tir; in a header file pollutes the namespace for all translation units that include this header, which can lead to name collisions and unexpected behavior.

♻️ Proposed fix: Use explicit namespace qualification
-using namespace tir;

Then update the class definitions to use explicit tir:: prefixes, for example:

  • tir::StmtExprVisitor instead of StmtExprVisitor
  • tir::BufferStoreNode instead of BufferStoreNode
  • tir::CallNode instead of CallNode
  • etc.

26-27: Unnecessary std::move on return value.

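Using std::move on a local variable being returned can prevent Named Return Value Optimization (NRVO). Caveat: collector.written_buffers is a member of a local object rather than the local itself, so neither NRVO nor the implicit move on return applies to it; the change below replaces the move with a copy of the set, so it is worthwhile only if that copy is acceptable.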

♻️ Proposed fix
   static std::unordered_set<const VarNode *> Collect(const Stmt &stmt) {
     WrittenBufferCollector collector;
     collector(stmt);
-    return std::move(collector.written_buffers);
+    return collector.written_buffers;
   }

44-48: Conservative treatment of address_of as write.

The address_of handling assumes any buffer whose address is taken is written. While this is safe, it may be overly conservative in cases where the address is only used for reading. Consider adding a comment documenting this intentional conservative behavior.

📝 Suggested documentation
     } else if (op->op.same_as(builtin::address_of())) {
+      // Conservative: treat address_of as potential write since we can't
+      // determine how the address will be used downstream
       if (const auto *load = op->args[0].as<BufferLoadNode>()) {
         written_buffers.insert(load->buffer->data.get());
       }
     }
testing/python/transform/test_tilelang_transform_licm.py (5)

61-64: Unused helper function.

_count_expr_occurrences is defined but never used in any test. Consider removing it or adding tests that utilize it.


88-89: Remove extraneous f prefixes from strings without placeholders.

These assertion messages don't contain any format placeholders, so the f prefix is unnecessary.

♻️ Proposed fix
-    assert "x" in outside, f"x should be outside loop"
-    assert "x" not in inside, f"x should not be inside loop"
+    assert "x" in outside, "x should be outside loop"
+    assert "x" not in inside, "x should not be inside loop"

This pattern applies to many other assertions in this file (lines 110-113, 131-132, 153-156, 175, 322, 325, 365-366, 438-439, 446).


173-175: Unused unpacked variable.

The outside variable is unpacked but never used. Use _ to indicate intentionally unused variables.

♻️ Proposed fix
-    outside, inside = _find_lets_in_stmt(_get_body(result))
+    _, inside = _find_lets_in_stmt(_get_body(result))

This pattern applies to several other tests where inside is unused (lines 199, 223, 246, 268, 290, 319, 445).


185-204: Unused function parameter C.

The C tensor parameter is declared but never used in this test function.

♻️ Proposed fix
     @T.prim_func
     def before(
         A: T.Tensor((128,), T.float32),
         B: T.Tensor((128,), T.float32),
-        C: T.Tensor((128,), T.float32),
         base: T.int32,
         offset: T.int32,
     ):

246-251: Unused local variable cse_vars.

The variable cse_vars is assigned but never used. The assertion only checks that result is not None, which doesn't verify the expected behavior.

♻️ Proposed fix: Either use cse_vars or remove it

Option 1 - Remove the unused variable:

     result = _apply_licm(before)
-    outside, inside = _find_lets_in_stmt(_get_body(result))
-
-    # Should NOT have extracted any CSE variable (each expr appears once)
-    cse_vars = [v for v in outside if v.startswith("cse_var")]
-    # This is a weaker test - we just check it doesn't crash
     assert result is not None

Option 2 - Add a meaningful assertion:

     result = _apply_licm(before)
-    outside, inside = _find_lets_in_stmt(_get_body(result))
+    outside, _ = _find_lets_in_stmt(_get_body(result))

     # Should NOT have extracted any CSE variable (each expr appears once)
     cse_vars = [v for v in outside if v.startswith("cse_var")]
-    # This is a weaker test - we just check it doesn't crash
-    assert result is not None
+    # With default min_occurrences_for_cse=2, single occurrence exprs won't be extracted
+    # (unless they meet the complexity threshold for LICM mode)
+    assert len(cse_vars) <= 1, f"Should have at most 1 CSE var for single-occurrence exprs, got {outside}"
src/transform/loop_invariant_code_motion.cc (1)

562-567: Make candidate ordering deterministic when complexities tie.
expr_counts comes from std::unordered_map, and sorting only by complexity leaves tie ordering dependent on hash iteration, which can make cse_var_* assignment nondeterministic. Consider adding a stable tie‑breaker (e.g., StructuralHash).

♻️ Suggested tie-breaker for deterministic ordering
-    std::stable_sort(candidates.begin(), candidates.end(),
-                     [](const auto &a, const auto &b) {
-                       return ExprComplexityCalculator::Calculate(a.first) >
-                              ExprComplexityCalculator::Calculate(b.first);
-                     });
+    std::stable_sort(candidates.begin(), candidates.end(),
+                     [](const auto &a, const auto &b) {
+                       size_t ca =
+                           ExprComplexityCalculator::Calculate(a.first);
+                       size_t cb =
+                           ExprComplexityCalculator::Calculate(b.first);
+                       if (ca != cb) return ca > cb;
+                       return StructuralHash()(a.first) <
+                              StructuralHash()(b.first);
+                     });
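
The effect of a tie-breaker can be illustrated with a small sketch. It assumes a toy tuple encoding and uses `repr` as a stand-in for a structural key (the suggestion above uses StructuralHash); `complexity` and `order_candidates` are hypothetical helpers, not functions from the pass.

```python
import random

def complexity(expr):
    """Count operation nodes in a toy tuple expression (op, lhs, rhs)."""
    if isinstance(expr, tuple):
        return 1 + sum(complexity(arg) for arg in expr[1:])
    return 0

def order_candidates(candidates):
    """Sort by descending complexity, breaking ties with a structural key.

    repr() stands in for a structural hash here: it depends only on the
    expression's shape, never on the order candidates were discovered in.
    """
    return sorted(candidates, key=lambda e: (-complexity(e), repr(e)))

cands = [("+", "a", "b"), ("*", "c", 4), (">>", ("+", "a", "b"), 2)]
reference = order_candidates(cands)

# Simulate a different discovery order, as hash-map iteration might produce.
shuffled = cands[:]
random.Random(0).shuffle(shuffled)
```

Because the sort key is a total order over the candidates, any discovery order yields the same sequence, so the numbering of generated `cse_var_*` names would not depend on iteration order.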

Comment on lines +538 to +553
for (const auto &[expr, count] : expr_counts) {
  if (!checker.IsInvariant(expr)) {
    continue;
  }
  size_t complexity = ExprComplexityCalculator::Calculate(expr);

  // CSE criterion: appears multiple times
  bool cse_eligible =
      (static_cast<int>(count) >= config_->min_occurrences_for_cse &&
       static_cast<int>(complexity) >= config_->min_complexity_for_cse);

  // LICM criterion: complex enough to warrant hoisting even if appears once
  bool licm_eligible =
      (static_cast<int>(complexity) >= config_->min_complexity_for_licm);

  if (cse_eligible || licm_eligible) {

⚠️ Potential issue | 🔴 Critical

Prevent hoisting expressions that depend on loop‑local LetStmt vars.
Phase 2 only checks the loop var + written buffers, so expressions that depend on non‑hoisted loop‑local let vars can be treated as invariant and hoisted outside the loop. That can introduce out‑of‑scope references or wrong results (e.g., let x = i; ... x + 1 ...). Please exclude non‑hoisted loop‑local vars from extraction.

🛠️ Proposed fix: track loop-local variant vars and block extraction
-    auto [extracted_lets, final_body] = ExtractInvariantSubexpressions(
-        new_body, op->loop_var, written_buffers, invariance_checker);
+    std::unordered_set<const VarNode *> variant_vars;
+    variant_vars.reserve(let_stmts.size());
+    std::unordered_set<const VarNode *> hoisted_vars;
+    hoisted_vars.reserve(hoisted_lets.size());
+    for (const auto *let : hoisted_lets) {
+      hoisted_vars.insert(let->var.get());
+    }
+    for (const auto *let : let_stmts) {
+      if (!hoisted_vars.count(let->var.get())) {
+        variant_vars.insert(let->var.get());
+      }
+    }
+    auto [extracted_lets, final_body] = ExtractInvariantSubexpressions(
+        new_body, op->loop_var, written_buffers, invariance_checker,
+        variant_vars);
-  ExtractInvariantSubexpressions(
-      const Stmt &body, const Var &loop_var,
-      const std::unordered_set<const VarNode *> &written_buffers,
-      const LoopInvarianceChecker &checker) {
+  ExtractInvariantSubexpressions(
+      const Stmt &body, const Var &loop_var,
+      const std::unordered_set<const VarNode *> &written_buffers,
+      const LoopInvarianceChecker &checker,
+      const std::unordered_set<const VarNode *> &variant_vars) {
...
-      if (!checker.IsInvariant(expr)) {
+      if (UsesVar(expr, [&variant_vars](const VarNode *v) {
+            return variant_vars.count(v);
+          })) {
+        continue;
+      }
+      if (!checker.IsInvariant(expr)) {
         continue;
       }
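
The idea behind the proposed fix can be sketched on a toy model. The tuple encoding and the helper names `variant_let_vars` and `can_extract` are assumptions for illustration; they mirror the `variant_vars` set in the diff above but are not the pass's API.

```python
def free_vars(expr):
    """Return variable names used by a toy tuple expression (op, args...)."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return set().union(*(free_vars(arg) for arg in expr[1:]))
    return set()

def variant_let_vars(let_vars, hoisted_vars):
    """Loop-local let variables that were NOT hoisted out of the loop."""
    return {v for v in let_vars if v not in hoisted_vars}

def can_extract(expr, loop_var, variant_vars):
    """An expression is extractable only if it uses neither the loop
    variable nor any let variable that is still bound inside the loop."""
    used = free_vars(expr)
    return loop_var not in used and not (used & variant_vars)

# Inside the loop over i:  let x = i (stays in loop);  let y = n * 4 (hoisted)
let_vars = ["x", "y"]
hoisted = {"y"}
variant = variant_let_vars(let_vars, hoisted)

blocked = can_extract(("+", "x", 1), "i", variant)    # depends on x = i
allowed = can_extract(("+", "y", 1), "i", variant)    # y is already outside
# Checking only the loop variable would wrongly accept x + 1:
naive = can_extract(("+", "x", 1), "i", set())
```

The `naive` result shows the failure mode described in the comment: `x + 1` never mentions `i` directly, so a check limited to the loop variable treats it as invariant.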

Comment on lines +476 to +478
if __name__ == "__main__":
    # tilelang.testing.main()
    test_print_result()

⚠️ Potential issue | 🟡 Minor

Debug code left in main block.

The test runner tilelang.testing.main() is commented out and test_print_result() is called directly instead. This should be corrected for proper test execution in CI.

🐛 Proposed fix
 if __name__ == "__main__":
-    # tilelang.testing.main()
-    test_print_result()
+    tilelang.testing.main()

… test_no_extract_single_occurrence

- Simplified the condition check for BufferStore and Evaluate nodes using a tuple.
- Added a print statement to output CSE variables in the test case for better debugging.
@LeiWang1999 LeiWang1999 marked this pull request as draft February 3, 2026 04:19