-
Notifications
You must be signed in to change notification settings - Fork 429
[Docs] Add TileLang Semantics Guide #1745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughThis pull request adds a comprehensive TileLang semantics documentation guide. A new entry is added to the PROGRAMMING GUIDES section in the documentation index, and a detailed reference document is created explaining supported Python features, semantic behaviors, and limitations within TileLang kernels. Changes
Possibly Related PRs
Estimated Code Review Effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/programming_guides/tilelang_semantics.md`:
- Around line 372-428: The statement claiming Triton "always uses
`truncdiv`/`truncmod`" is misleading; update the Note about Triton to clarify
that Triton's behavior is context-dependent—tensor operations use C-style
truncation (truncdiv/truncmod) while scalar operations use Python-style
floordiv/floormod—or restrict the comparison to tensor operations only; edit the
paragraph that starts "Note: Unlike Triton, which always uses
`truncdiv`/`truncmod`" to either remove the word "always" and add the tensor vs
scalar distinction or append a clarifying sentence indicating the comparison
applies to Triton tensor operations.
| ## Integer Division and Modulo | ||
|
|
||
| TileLang's `//` and `%` operators follow Python semantics (`floordiv`/`floormod`), | ||
| not C/C++ semantics. If you need C-style truncation behavior, use `T.truncdiv()` and | ||
| `T.truncmod()` explicitly. | ||
|
|
||
| **Note**: Unlike Triton, which always uses `truncdiv`/`truncmod` (C-style, inconsistent with | ||
| Python), TileLang preserves Python's expected behavior for `//` and `%`. | ||
|
|
||
| TileLang provides multiple division and modulo operations with different rounding | ||
| behaviors. Understanding these is important when working with negative numbers. | ||
|
|
||
| ### truncdiv / truncmod (C-style) | ||
|
|
||
| Rounds toward zero. The remainder has the same sign as the dividend. | ||
|
|
||
| ```python | ||
| T.truncdiv(-7, 2) # = -3 (toward zero: -3.5 → -3) | ||
| T.truncmod(-7, 2) # = -1 (since -7 = (-3) * 2 + (-1)) | ||
|
|
||
| T.truncdiv(7, -2) # = -3 | ||
| T.truncmod(7, -2) # = 1 (since 7 = (-3) * (-2) + 1) | ||
| ``` | ||
|
|
||
| ### floordiv / floormod (Python-style) | ||
|
|
||
| Rounds toward negative infinity. The remainder has the same sign as the divisor. | ||
|
|
||
| ```python | ||
| T.floordiv(-7, 2) # = -4 (toward -∞: -3.5 → -4) | ||
| T.floormod(-7, 2) # = 1 (since -7 = (-4) * 2 + 1) | ||
|
|
||
| T.floordiv(7, -2) # = -4 | ||
| T.floormod(7, -2) # = -1 (since 7 = (-4) * (-2) + (-1)) | ||
| ``` | ||
|
|
||
| ### Comparison Table | ||
|
|
||
| | a | b | truncdiv | truncmod | floordiv | floormod | | ||
| |---|---|----------|----------|----------|----------| | ||
| | 7 | 2 | 3 | 1 | 3 | 1 | | ||
| | -7 | 2 | -3 | -1 | -4 | 1 | | ||
| | 7 | -2 | -3 | 1 | -4 | -1 | | ||
| | -7 | -2 | 3 | -1 | 3 | -1 | | ||
|
|
||
| ### Default Behavior | ||
|
|
||
| - Python's `//` operator maps to `floordiv` | ||
| - Python's `%` operator maps to `floormod` | ||
| - `T.ceildiv(a, b)` computes ceiling division: `⌈a / b⌉` | ||
|
|
||
| ### When to Use Which | ||
|
|
||
| - **floordiv/floormod**: Default choice, matches Python semantics | ||
| - **truncdiv/truncmod**: When you need C/C++ compatible behavior | ||
| - **ceildiv**: For computing grid sizes: `T.ceildiv(N, BLOCK)` gives the number | ||
| of blocks needed to cover N elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
Triton programming language integer division modulo semantics C-style truncation
💡 Result:
In Triton, integer floor-division (//) and modulo (%) on tensors follow C semantics (not Python) for efficiency:
- Division (
a // b) truncates toward zero whenaandbcan have mixed signs (i.e., it does not floor toward-∞like Python). (triton-lang.org) - Modulo follows the C-style identity:
[
a % b = a - b \cdot (a // b)
]
so the remainder’s sign matches the trunc-toward-zero quotient behavior (often meaning the remainder has the same sign asa, as in C). (triton-lang.org)
Example (tensor values):
-3 // 2 == -1(Triton/C) vs-3 // 2 == -2(Python)-3 % 2 == -1(Triton/C) vs-3 % 2 == 1(Python) (triton-lang.org)
Exception / gotcha: If all inputs are scalars, Triton says // and % “perhaps confusingly” follow Python semantics. (triton-lang.org)
Citations:
- 1: https://triton-lang.org/main/python-api/triton-semantics.html
- 2: https://triton-lang.org/main/python-api/triton-semantics.html
- 3: https://triton-lang.org/main/python-api/triton-semantics.html
- 4: https://triton-lang.org/main/python-api/triton-semantics.html
The Triton comparison needs clarification.
While the division examples and comparison table are mathematically correct, the claim that Triton "always uses truncdiv/truncmod" is inaccurate. According to Triton's official documentation, Triton's division behavior is context-dependent:
- For tensor operations: Triton uses C-style truncation (toward zero)
- For scalar operations: Triton uses Python semantics (floor division)
The documentation should either remove "always" and clarify the tensor/scalar distinction, or note that the comparison applies specifically to tensor operations. This nuance is important for developers migrating from Triton.
🤖 Prompt for AI Agents
In `@docs/programming_guides/tilelang_semantics.md` around lines 372 - 428, The
statement claiming Triton "always uses `truncdiv`/`truncmod`" is misleading;
update the Note about Triton to clarify that Triton's behavior is
context-dependent—tensor operations use C-style truncation (truncdiv/truncmod)
while scalar operations use Python-style floordiv/floormod—or restrict the
comparison to tensor operations only; edit the paragraph that starts "Note:
Unlike Triton, which always uses `truncdiv`/`truncmod`" to either remove the
word "always" and add the tensor vs scalar distinction or append a clarifying
sentence indicating the comparison applies to Triton tensor operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a comprehensive TileLang Semantics guide to help users understand what Python syntax is supported inside @T.prim_func kernels and how to translate common Python patterns into TileLang equivalents.
Changes:
- Adds a new documentation file
docs/programming_guides/tilelang_semantics.mdcovering Python compatibility, control flow, data access, variables, functions, operators, and integer division/modulo semantics - Updates
docs/index.mdto include the new semantics guide in the programming guides section - Documents that TileLang follows Python semantics for
//and%operators (floordiv/floormod), unlike Triton which uses C-style truncation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
docs/programming_guides/tilelang_semantics.md |
New comprehensive guide documenting supported Python features, control flow constructs, operators, and common patterns with examples |
docs/index.md |
Adds the new semantics guide to the documentation index under Programming Guides |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| |-------------------------|:---------:|------------------------------------------| | ||
| | `with` | ⚠️ | Only `T.Kernel`, `T.ws` | | ||
| | `import` | ❌ | Not inside kernel | | ||
| | `assert` | ⚠️ | Use `T.device_assert` or `T.assert` | |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation mentions T.assert but the actual API in the codebase is T.Assert (capitalized). This should be corrected to use the proper capitalization.
| | `assert` | ⚠️ | Use `T.device_assert` or `T.assert` | | |
| | `assert` | ⚠️ | Use `T.device_assert` or `T.Assert` | |
| N = A.shape[1] | ||
|
|
||
| # Type casting | ||
| x = value.astype("float32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the canonical way to cast types? Buffer.astype mentioned here, or T.cast from TVM, or T.Cast defined in TileLang Python package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Elevator14B They are the same, just syntax sugar. Both T.cast and T.Cast are from TVM. And the difference between T.cast and T.Cast is just the order of arguments. I think the canonical way is to use T.cast or astype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behavior of Buffer.astype? Return a new buffer, or casting the elements in-place? If it is the latter one, should the buffer size change if the bitwidth of src/dst types are different?
Instead of immutable tiles model in triton/cuTile, we allocate before writing to a buffer in Tilelang, which (for me) in many cases causes this kind of confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Da1sypetals It returns a new pointer to the original data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Elevator14B I submit a PR which unifies the cast-related ops: #1757. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Da1sypetals It returns a new pointer to the original data
So is it correct that it modifies the content of original buffer? Also, if the bitwidth are different (and thus space required to hold the same amount of element change), how can it point to the original buffer? What is the internal behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Da1sypetals 1. This instruction itself don't modifies the content of original buffer since it just returns the pointer. But we can use this pointer to read/write the data, which may modify the content. 2. The internal behavior is just it do a static cast, like:
buf1 = T.cast(buf, T.float16)
# Codegen result
float16* buf1 = static_cast<float16*>(buf);
You can write a simple TileLang program and use get_kernel_source() to get the .cu code to have a better understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Da1sypetals 1. This instruction itself don't modifies the content of original buffer since it just returns the pointer. But we can use this pointer to read/write the data, which may modify the content. 2. The internal behavior is just it do a static cast, like:
buf1 = T.cast(buf, T.float16) # Codegen result float16* buf1 = static_cast<float16*>(buf);You can write a simple TileLang program and use
get_kernel_source()to get the.cucode to have a better understanding.
So it is not casting by value like tensor.to(dtype) in PyTorch do, it is just casting the pointer?
In this case, if a buffer is casted to a dtype with different bitwidth(e.g. float32 -> float16, which in the same space holds 2x elements), how will the buffer's shape change? I think this should be documented.
| break | ||
| ``` | ||
|
|
||
| ### Max/Min Finding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe together with sum? Also, I think it's good to mention difference between var-based serial reduction and fragment-based parallel reduction, and alloc_reducer as another option for the latter.
|
For the "What (not) works" section, I think we should include more explanation for what's happening behind the interfaces and what's the constraints. E.g., fragments should be accessed with expressions consisting of |
@Elevator14B I have different opinions with you, IMO documenting every corner cases is way more important than explaining compiler/hardware internals and restrictions to users who most likely don't care about these. Maybe appendix is a good place for these info. |
|
Another missing piece: what's allowed in the |
I get your point. Maybe we need some compiler/hardware agnostic way to explain the semantics (not just the allowed lexical subset), so that users don't get confused with the internal details while still know what their written code is expected to behave. |
|
Is it required (in some cases) that the thrid argument of |
Summary
@T.prim_funckernels//and%, unlike Triton which uses C-style truncationTest plan
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.