Conversation

@ShaobinChen-AH ShaobinChen-AH commented Feb 3, 2026

Fixes #1666

Problem:
When using reduce_sum with clear=False, the function should accumulate results onto existing values in the output buffer. However, the code was creating a new temporary fragment buffer without copying existing values, causing accumulation to start from zero.

Root Cause:
In reduce_op.py, when both is_shared(buffer) and is_shared(out) hold, the code creates temporary fragment buffers but copies only from the input buffer. With clear=False it never copies the existing contents of the output buffer.

Fix:
Copy existing values from output buffer to temporary fragment buffer when clear=False, before performing the reduction.
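
To make the failure mode concrete, here is a minimal pure-Python sketch of the staging pattern described above. It is not the TileLang implementation: the function name `reduce_sum_via_fragment` and the `seed_from_out` flag are hypothetical, and the temporary fragment is modelled as zero-initialised to match the symptom reported in the issue.

```python
def reduce_sum_via_fragment(src, out, clear, seed_from_out):
    """Model a row-wise sum that is staged through a temporary fragment.

    src is a list of rows (the input buffer); out has one slot per row.
    seed_from_out=False models the behaviour described under "Problem",
    seed_from_out=True models the behaviour described under "Fix".
    """
    # Temporary fragment, modelled as zero-initialised (an assumption).
    frag = [0] * len(out)
    if not clear and seed_from_out:
        # The fix: carry the existing output values into the fragment.
        frag = list(out)
    for i, row in enumerate(src):
        if clear:
            frag[i] = 0  # clearing reductions always restart from zero
        frag[i] += sum(row)
    out[:] = frag  # write the fragment back to the output buffer
    return out


src = [[1, 2, 3], [4, 5, 6]]
buggy = reduce_sum_via_fragment(src, [10, 20], clear=False, seed_from_out=False)
fixed = reduce_sum_via_fragment(src, [10, 20], clear=False, seed_from_out=True)
cleared = reduce_sum_via_fragment(src, [10, 20], clear=True, seed_from_out=True)
print(buggy, fixed, cleared)  # [6, 15] [16, 35] [6, 15]
```

Without the seeding copy, the non-clearing call silently produces the same result as a clearing one, which is why the bug only shows up when the output buffer already holds non-zero values.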

Testing:

  • test_clear_issue.py now passes with Out=2048 (expected)
  • Both clear=True and clear=False cases work correctly

Summary by CodeRabbit

  • Bug Fixes
    • Reduction now correctly accumulates into an existing shared output buffer when the buffer isn't cleared before reduction.
  • Tests
    • Added tests covering non-clearing reductions with various shared-buffer configurations to validate correctness against a reference implementation.

github-actions bot commented Feb 3, 2026

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Inserts a conditionally guarded copy from the output buffer into the reducer fragment in the reduce macro when the buffers are shared and the reduction is non-clearing (clear=False), so that existing output values are included in the reduction.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Reduction implementation: tilelang/language/reduce_op.py | Add conditional copy(out, red_frag_out) in shared-buffer branches when clear=False, so prior output is incorporated before performing the reduction. |
| Tests (reduce variants): testing/python/language/test_tilelang_language_reduce.py | Add test kernel variants reduce_sum_test_clear_B_shared and reduce_sum_test_clear_AB_shared, new test functions, and make run_reduce_sum_clear accept tl_func to exercise non-clearing reductions with shared buffers. |
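
The test change summarized above, a runner that accepts the kernel under test as an argument and compares it against a reference, can be sketched in plain Python. This is only an illustration of the pattern: `check_reduce_sum_clear`, `ref_reduce_sum_no_clear`, and the two kernel variants are hypothetical stand-ins, not the functions in test_tilelang_language_reduce.py, and integer data is used so that exact equality is a valid check.

```python
import random


def ref_reduce_sum_no_clear(a, b):
    """Reference result for a non-clearing row sum: b[i] + sum(a[i])."""
    return [b_i + sum(row) for row, b_i in zip(a, b)]


def check_reduce_sum_clear(kernel, m=4, n=8, seed=0):
    """Run one kernel variant on random integers and compare to the reference."""
    rng = random.Random(seed)
    a = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]
    b = [rng.randint(-5, 5) for _ in range(m)]
    expected = ref_reduce_sum_no_clear(a, b)
    got = kernel(a, list(b))  # pass a copy so the kernel may mutate it
    assert got == expected, f"expected {expected}, got {got}"
    return True


def kernel_variant_loop(a, b):
    """Stand-in variant: accumulate element by element into b."""
    for i, row in enumerate(a):
        for x in row:
            b[i] += x
    return b


def kernel_variant_builtin(a, b):
    """Stand-in variant: accumulate each row with the built-in sum."""
    return [b_i + sum(row) for row, b_i in zip(a, b)]


for variant in (kernel_variant_loop, kernel_variant_builtin):
    check_reduce_sum_clear(variant)
```

Passing the kernel in as a parameter lets one runner cover every buffer-placement variant, so a new variant only needs a new kernel definition and a one-line test.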

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through lanes of shared memory bright,
Sniffed for numbers hiding out of sight.
A careful copy, then we add with care,
Now sums keep friends they used to wear. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main fix: addressing reduce_sum with clear=False not accumulating correctly. |
| Linked Issues check | ✅ Passed | The PR addresses the core coding requirement from issue #1666: ensuring existing output buffer values are copied into temporary fragments when clear=False, preventing incorrect resets during reduction operations. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the reduce_sum clear=False bug: modifications to reduce_op.py implementation and corresponding test additions in test_tilelang_language_reduce.py. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/language/reduce_op.py (1)

80-93: ⚠️ Potential issue | 🟠 Major

Missing parallel fix for is_fragment(buffer) and is_shared(out) branch.

This branch also creates a temporary red_frag_out fragment but doesn't copy existing out values when clear=False, matching the original bug pattern. The same fix should be applied here for consistency.

🐛 Proposed fix
         elif is_fragment(buffer) and is_shared(out):
             red_frag_out = alloc_fragment(out.shape, out.dtype)
             IRBuilder.name(out.name + "_frag", red_frag_out)
 
+            if not clear:
+                copy(out, red_frag_out)
+
             tir.call_intrin(
                 "handle",
                 tir.op.Op.get(_REDUCE_OP_KEY),
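
The point of this comment is that the seeding copy depends on where the output lives, not on where the input lives. A small pure-Python model of that invariant follows. It is a sketch under assumptions: `Buf`, its `scope` labels, and this `reduce_sum` are hypothetical stand-ins, and the branch layout (in particular the fragment-output path) is assumed rather than taken from reduce_op.py.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    scope: str  # "shared" or "fragment" (stand-in labels)
    data: list


def reduce_sum(buffer, out, clear):
    """Row-wise sum that stages a shared output through a temporary fragment."""
    if out.scope == "shared":
        # Output is staged: allocate a temporary fragment for the result.
        red_frag_out = Buf("fragment", [0] * len(out.data))
        if not clear:
            # Seed the fragment, whichever scope the input buffer lives in.
            red_frag_out.data = list(out.data)
        target = red_frag_out
    else:
        target = out  # assumed: a fragment output is reduced into directly
    for i, row in enumerate(buffer.data):
        base = 0 if clear else target.data[i]
        target.data[i] = base + sum(row)
    if target is not out:
        out.data = list(target.data)  # copy the staged result back
    return out


def run(buffer_scope, out_scope, clear):
    buffer = Buf(buffer_scope, [[1, 2, 3], [4, 5, 6]])
    out = Buf(out_scope, [10, 20])
    return reduce_sum(buffer, out, clear).data


# Every placement should agree once the seeding copy is tied to the output.
for b_scope in ("shared", "fragment"):
    for o_scope in ("shared", "fragment"):
        assert run(b_scope, o_scope, clear=False) == [16, 35]
        assert run(b_scope, o_scope, clear=True) == [6, 15]
```

Tying the copy to the output-staging step rather than to one specific branch is one way to keep the shared→shared and fragment→shared paths from drifting apart again.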
🧹 Nitpick comments (1)
tilelang/language/customize.py (1)

56-66: Type hint object for dtype is overly permissive.

While the functional change is fine, using object as a type hint loses specificity. Consider using a more precise type union if TVM has a specific dtype class.

♻️ Suggested type hint
-def view(src: Buffer, shape: list[PrimExpr] | tuple[PrimExpr, ...] | None = None, dtype: str | object | None = None) -> Buffer:
+def view(src: Buffer, shape: list[PrimExpr] | tuple[PrimExpr, ...] | None = None, dtype: str | None = None) -> Buffer:

If TVM dtype objects need to be accepted, consider importing and using the specific type (e.g., tvm.DataType or similar) instead of object.
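
As an illustration of the narrower hint suggested here, the sketch below accepts a string, a dedicated dtype class, or None, and rejects everything else. `DType` is a hypothetical stand-in defined locally so the example runs on its own. It is not a TVM class, and `normalize_dtype` is not part of TileLang.

```python
from __future__ import annotations


class DType:
    """Hypothetical stand-in for a framework-specific dtype object."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __str__(self) -> str:
        return self.name


def normalize_dtype(dtype: str | DType | None, default: str) -> str:
    """Return the dtype as a string, falling back to default when it is None."""
    if dtype is None:
        return default
    if isinstance(dtype, (str, DType)):
        return str(dtype)
    # With `object` as the hint, a type checker would let this case through.
    raise TypeError(f"unsupported dtype: {dtype!r}")


print(normalize_dtype(None, "float32"))              # float32
print(normalize_dtype("int8", "float32"))            # int8
print(normalize_dtype(DType("float16"), "float32"))  # float16
```

With the explicit union, a static checker can flag a call such as `normalize_dtype(3, "float32")` before it ever reaches the runtime TypeError.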

@ShaobinChen-AH
Author

pre-commit.ci autofix

@SiriusNEO SiriusNEO self-requested a review February 3, 2026 06:39
@SiriusNEO SiriusNEO changed the title from "[BugFix] Fix reduce_sum with clear=False not accumulating correctly (Fixes #1666)Fix issue 1666" to "[BugFix] Fix reduce_sum with clear=False not accumulating correctly" Feb 3, 2026
@SiriusNEO
Collaborator

@ShaobinChen-AH Thanks! Could you add the corresponding tests and remove the annotations part (Likely some changes in: #1773)?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tilelang/language/reduce_op.py`:
- Around lines 52-54: The fragment→shared path currently allocates red_frag_out
without preserving the existing contents of out when clear=False. Fix it by
pre-seeding red_frag_out with the current out (i.e., call copy(out, red_frag_out))
in the fragment→shared branch, after red_frag_out is allocated and before it is
used, so that it mirrors the shared→shared behavior. The copy must be guarded
by clear being False and must happen prior to any further writes/aggregation.

@SiriusNEO
Collaborator

@ShaobinChen-AH I added the testcases, you can pull the code and have a review :)

@ShaobinChen-AH
Author

> @ShaobinChen-AH Thanks! Could you add the corresponding tests and remove the annotations part (Likely some changes in: #1773)?

I have removed the annotations part

@ShaobinChen-AH
Author

> @ShaobinChen-AH I added the testcases, you can pull the code and have a review :)

ok, thanks, I will review it

Development

Successfully merging this pull request may close these issues.

[BUG] print statement will affect res

2 participants