[Docs] Add TileLang Semantics Guide #1745

LeiWang1999 · 2026-01-28T06:08:09Z

Summary

- Add a comprehensive TileLang Semantics guide to help users understand the DSL syntax
- Documents Python compatibility: what works and what doesn't in @T.prim_func kernels
- Covers loops, conditionals, buffer operations, variables, functions, and operators
- Includes detailed section on integer division/modulo semantics (floordiv vs truncdiv)
- Highlights that TileLang follows Python semantics for // and %, unlike Triton which uses C-style truncation

Test plan

- Documentation renders correctly in markdown
- All code examples are syntactically correct
- Links to other documentation pages are valid

Summary by CodeRabbit

Documentation
- Added comprehensive TileLang semantics guide covering supported Python features, control flow, data access, functions, operators, and best practices, including code examples, error messages, and patterns for translating Python code.



github-actions · 2026-01-28T06:08:18Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-01-28T06:08:36Z

📝 Walkthrough

This pull request adds a comprehensive TileLang semantics documentation guide. A new entry is added to the PROGRAMMING GUIDES section in the documentation index, and a detailed reference document is created explaining supported Python features, semantic behaviors, and limitations within TileLang kernels.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Documentation Index Update `docs/index.md` | Added `programming_guides/tilelang_semantics` entry to PROGRAMMING GUIDES toctree |
| TileLang Semantics Guide `docs/programming_guides/tilelang_semantics.md` | New comprehensive guide documenting supported Python constructs, control flow semantics, division/modulo behavior (truncdiv/truncmod, floordiv/floormod, ceildiv), buffer operations, operators, and patterns for translating Python code into TileLang; includes error references, examples of working vs. unsupported features (e.g., no `*args`, `**kwargs`, lambdas, classes, recursion), and migration guidance |

Possibly Related PRs

[Doc] Minor documentation update #1410: Both PRs extend the PROGRAMMING GUIDES toctree in docs/index.md and add documentation pages under docs/programming_guides/, establishing parallel documentation structure expansions.

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 A guide hops into place, semantics so clear,
TileLang's rules and patterns now dear,
No more confusion in Python's domain,
Just reference and wisdom—a knowledge campaign! 📚✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately and clearly summarizes the main change: adding a comprehensive TileLang Semantics Guide to the documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/programming_guides/tilelang_semantics.md`:
- Around line 372-428: The statement claiming Triton "always uses
`truncdiv`/`truncmod`" is misleading; update the Note about Triton to clarify
that Triton's behavior is context-dependent—tensor operations use C-style
truncation (truncdiv/truncmod) while scalar operations use Python-style
floordiv/floormod—or restrict the comparison to tensor operations only; edit the
paragraph that starts "Note: Unlike Triton, which always uses
`truncdiv`/`truncmod`" to either remove the word "always" and add the tensor vs
scalar distinction or append a clarifying sentence indicating the comparison
applies to Triton tensor operations.

coderabbitai · 2026-01-28T06:12:20Z

docs/programming_guides/tilelang_semantics.md

+## Integer Division and Modulo
+
+TileLang's `//` and `%` operators follow Python semantics (`floordiv`/`floormod`),
+not C/C++ semantics. If you need C-style truncation behavior, use `T.truncdiv()` and
+`T.truncmod()` explicitly.
+
+**Note**: Unlike Triton, which always uses `truncdiv`/`truncmod` (C-style, inconsistent with
+Python), TileLang preserves Python's expected behavior for `//` and `%`.
+
+TileLang provides multiple division and modulo operations with different rounding
+behaviors. Understanding these is important when working with negative numbers.
+
+### truncdiv / truncmod (C-style)
+
+Rounds toward zero. The remainder has the same sign as the dividend.
+
+```python
+T.truncdiv(-7, 2)   # = -3  (toward zero: -3.5 → -3)
+T.truncmod(-7, 2)   # = -1  (since -7 = (-3) * 2 + (-1))
+
+T.truncdiv(7, -2)   # = -3
+T.truncmod(7, -2)   # = 1   (since 7 = (-3) * (-2) + 1)
+```
+
+### floordiv / floormod (Python-style)
+
+Rounds toward negative infinity. The remainder has the same sign as the divisor.
+
+```python
+T.floordiv(-7, 2)   # = -4  (toward -∞: -3.5 → -4)
+T.floormod(-7, 2)   # = 1   (since -7 = (-4) * 2 + 1)
+
+T.floordiv(7, -2)   # = -4
+T.floormod(7, -2)   # = -1  (since 7 = (-4) * (-2) + (-1))
+```
+
+### Comparison Table
+
+| a | b | truncdiv | truncmod | floordiv | floormod |
+|---|---|----------|----------|----------|----------|
+| 7 | 2 | 3 | 1 | 3 | 1 |
+| -7 | 2 | -3 | -1 | -4 | 1 |
+| 7 | -2 | -3 | 1 | -4 | -1 |
+| -7 | -2 | 3 | -1 | 3 | -1 |
+
+### Default Behavior
+
+- Python's `//` operator maps to `floordiv`
+- Python's `%` operator maps to `floormod`
+- `T.ceildiv(a, b)` computes ceiling division: `⌈a / b⌉`
+
+### When to Use Which
+
+- **floordiv/floormod**: Default choice, matches Python semantics
+- **truncdiv/truncmod**: When you need C/C++ compatible behavior
+- **ceildiv**: For computing grid sizes: `T.ceildiv(N, BLOCK)` gives the number
+  of blocks needed to cover N elements
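
As a quick sanity check of the `ceildiv` use case in the hunk above, ceiling division can be sketched in plain Python on top of floor division. This is only an illustration: `ceildiv`, `N`, and `BLOCK` here are local Python names chosen for the example, not TileLang's implementation, and the sketch assumes a positive divisor.

```python
def ceildiv(a: int, b: int) -> int:
    """Ceiling division built on Python's floor division (assumes b > 0)."""
    return -(-a // b)


# Hypothetical problem size and block size, chosen only for illustration.
N, BLOCK = 1000, 128
num_blocks = ceildiv(N, BLOCK)

# The blocks cover all N elements, and one block fewer would not.
assert num_blocks * BLOCK >= N
assert (num_blocks - 1) * BLOCK < N
print(num_blocks)  # 8
```

When `N` is an exact multiple of `BLOCK`, the result equals `N // BLOCK`, so no extra, empty block is launched.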


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Triton programming language integer division modulo semantics C-style truncation

💡 Result:

In Triton, integer floor-division (//) and modulo (%) on tensors follow C semantics (not Python) for efficiency:

Division (a // b) truncates toward zero when a and b can have mixed signs (i.e., it does not floor toward -∞ like Python). (triton-lang.org)

Modulo follows the C-style identity:

\[
a \mathbin{\%} b = a - b \cdot (a \mathbin{//} b)
\]

so the remainder's sign follows from the truncated quotient: it is either zero or has the same sign as a, as in C. (triton-lang.org)

Example (tensor values):

-3 // 2 == -1 (Triton/C) vs -3 // 2 == -2 (Python)

-3 % 2 == -1 (Triton/C) vs -3 % 2 == 1 (Python) (triton-lang.org)

Exception / gotcha: If all inputs are scalars, Triton says // and % “perhaps confusingly” follow Python semantics. (triton-lang.org)

Citations:

1: https://triton-lang.org/main/python-api/triton-semantics.html


The Triton comparison needs clarification.

While the division examples and comparison table are mathematically correct, the claim that Triton "always uses truncdiv/truncmod" is inaccurate. According to Triton's official documentation, Triton's division behavior is context-dependent:

For tensor operations: Triton uses C-style truncation (toward zero)

For scalar operations: Triton uses Python semantics (floor division)

The documentation should either remove "always" and clarify the tensor/scalar distinction, or note that the comparison applies specifically to tensor operations. This nuance is important for developers migrating from Triton.
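
The two rounding conventions discussed in this comment can be checked in plain Python. The sketch below is an emulation only: `truncdiv` and `truncmod` are local helper functions named after the operations in the guide, not calls into TileLang or Triton, and Python's built-in `//` and `%` stand in for `floordiv`/`floormod`.

```python
def truncdiv(a: int, b: int) -> int:
    """C-style division: round the quotient toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def truncmod(a: int, b: int) -> int:
    """C-style remainder: satisfies a == b * truncdiv(a, b) + truncmod(a, b)."""
    return a - b * truncdiv(a, b)


# Columns: a, b, truncdiv, truncmod, floordiv, floormod
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, truncdiv(a, b), truncmod(a, b), a // b, a % b)
```

The printed rows match the comparison table in the hunk under review; the two conventions differ only when the operands have opposite signs and the division is inexact.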


Copilot

Pull request overview

This PR adds a comprehensive TileLang Semantics guide to help users understand what Python syntax is supported inside @T.prim_func kernels and how to translate common Python patterns into TileLang equivalents.

Changes:

Adds a new documentation file docs/programming_guides/tilelang_semantics.md covering Python compatibility, control flow, data access, variables, functions, operators, and integer division/modulo semantics
Updates docs/index.md to include the new semantics guide in the programming guides section
Documents that TileLang follows Python semantics for // and % operators (floordiv/floormod), unlike Triton which uses C-style truncation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| `docs/programming_guides/tilelang_semantics.md` | New comprehensive guide documenting supported Python features, control flow constructs, operators, and common patterns with examples |
| `docs/index.md` | Adds the new semantics guide to the documentation index under Programming Guides |


Copilot · 2026-01-28T07:50:36Z

docs/programming_guides/tilelang_semantics.md

+|-------------------------|:---------:|------------------------------------------|
+| `with`                  | ⚠️        | Only `T.Kernel`, `T.ws`                  |
+| `import`                | ❌        | Not inside kernel                        |
+| `assert`                | ⚠️        | Use `T.device_assert` or `T.assert`      |


Elevator14B · 2026-01-29T02:04:29Z

docs/programming_guides/tilelang_semantics.md

+N = A.shape[1]
+
+# Type casting
+x = value.astype("float32")


So what's the canonical way to cast types? Buffer.astype mentioned here, or T.cast from TVM, or T.Cast defined in TileLang Python package?

@Elevator14B They are the same, just syntactic sugar. Both T.cast and T.Cast come from TVM, and the only difference between them is the order of arguments. I think the canonical way is to use T.cast or astype.

What is the behavior of Buffer.astype? Does it return a new buffer, or cast the elements in place? If the latter, should the buffer size change when the bit widths of the source and destination types differ?

Unlike the immutable-tile model in Triton/cuTile, TileLang requires allocating a buffer before writing to it, which (for me) often causes this kind of confusion.

@Da1sypetals It returns a new pointer to the original data

@Elevator14B I submitted a PR that unifies the cast-related ops: #1757. Any suggestions?

> @Da1sypetals It returns a new pointer to the original data

So is it correct that it modifies the content of the original buffer? Also, if the bit widths differ (and thus the space required to hold the same number of elements changes), how can it point to the original buffer? What is the internal behavior?

@Da1sypetals 1. This instruction itself doesn't modify the content of the original buffer, since it just returns the pointer. But we can use this pointer to read/write the data, which may modify the content. 2. Internally, it just does a static cast, like:

buf1 = T.cast(buf, T.float16)  # Codegen result: float16* buf1 = static_cast<float16*>(buf);

You can write a simple TileLang program and use get_kernel_source() to get the .cu code to have a better understanding.

> @Da1sypetals 1. This instruction itself doesn't modify the content of the original buffer, since it just returns the pointer. But we can use this pointer to read/write the data, which may modify the content. 2. Internally, it just does a static cast, like:
>
> buf1 = T.cast(buf, T.float16)  # Codegen result: float16* buf1 = static_cast<float16*>(buf);
>
> You can write a simple TileLang program and use get_kernel_source() to get the .cu code to have a better understanding.

So it is not casting by value like tensor.to(dtype) in PyTorch does; it is just casting the pointer?

In that case, if a buffer is cast to a dtype with a different bit width (e.g. float32 -> float16, which holds 2x the elements in the same space), how will the buffer's shape change? I think this should be documented.
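
To make the pointer-cast versus value-cast distinction raised in this thread concrete, here is a minimal sketch in plain Python using the standard `struct` module. It is an analogy only: it does not use TileLang, and it makes no claim about how `Buffer.astype` or `T.cast` actually behave or what shape TileLang reports after such a cast. All variable names are illustrative.

```python
import struct

values = [1.0, 2.0, 3.0, 4.0]

# Four float32 values occupy 16 bytes.
raw = struct.pack("<4f", *values)

# Pointer-style reinterpretation: the same 16 bytes read as float16.
# The element count doubles and the numeric values are not preserved.
reinterpreted = struct.unpack("<8e", raw)

# Value cast: each element is converted to float16 individually.
# The element count stays at 4 and the storage shrinks to 8 bytes.
converted_raw = struct.pack("<4e", *values)
converted = struct.unpack("<4e", converted_raw)

print(len(raw), len(reinterpreted))        # 16 8
print(len(converted_raw), len(converted))  # 8 4
print(converted)                           # (1.0, 2.0, 3.0, 4.0)
```

The sketch shows why the shape question matters: under reinterpretation the byte size is fixed and the element count changes, while under a value cast the element count is fixed and the byte size changes.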

Elevator14B · 2026-01-29T02:07:42Z

docs/programming_guides/tilelang_semantics.md

+        break
+```
+
+### Max/Min Finding


Maybe together with sum? Also, I think it's good to mention difference between var-based serial reduction and fragment-based parallel reduction, and alloc_reducer as another option for the latter.
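
The serial-versus-parallel distinction mentioned here can be illustrated without any TileLang API. The following plain-Python sketch contrasts a serial reduction into a single accumulator with a pairwise (tree) reduction, which is the shape a parallel reduction typically takes. The function names are hypothetical, and the sketch says nothing about how `T.reduce_max` or `alloc_reducer` are implemented.

```python
from typing import List


def serial_max(values: List[float]) -> float:
    """Serial reduction: one accumulator updated element by element."""
    acc = values[0]
    for v in values[1:]:
        acc = max(acc, v)
    return acc


def tree_max(values: List[float]) -> float:
    """Pairwise reduction: each level combines neighbours independently,
    so the combines within one level could run in parallel."""
    level = list(values)
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired last element to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


data = [3.0, 9.0, 1.0, 7.0, 5.0]
assert serial_max(data) == tree_max(data) == 9.0
```

The serial version takes one step per element, while the tree version needs only a logarithmic number of levels when the combines within each level run concurrently.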

Elevator14B · 2026-01-29T02:11:54Z

For the "What (not) works" section, I think we should include more explanation of what's happening behind the interfaces and what the constraints are. E.g., fragments should be accessed with expressions consisting of T.Parallel indices (with a few exceptions as syntactic sugar) so that each access can be mapped to a thread during layout inference, T.reduce_max/min/sum do parallel reduction inside a block, etc.

Da1sypetals · 2026-01-29T09:25:35Z

> For the "What (not) works" section, I think we should include more explanation of what's happening behind the interfaces and what the constraints are. E.g., fragments should be accessed with expressions consisting of T.Parallel indices (with a few exceptions as syntactic sugar) so that each access can be mapped to a thread during layout inference, T.reduce_max/min/sum do parallel reduction inside a block, etc.

@Elevator14B I have a different opinion. IMO documenting every corner case is far more important than explaining compiler/hardware internals and restrictions to users who most likely don't care about them. Maybe an appendix is a good place for this information.

Elevator14B · 2026-01-30T09:39:51Z

Another missing piece: what's allowed in the signature of a @T.prim_func function? This includes T.Tensor and, equivalently, T.Buffer, with their two flavours of marking shape and dtype ([shape, dtype] and (shape, dtype)); T.ptr (together with its usage); scalars like T.float32, etc.

Elevator14B · 2026-01-30T09:43:27Z

> For the "What (not) works" section, I think we should include more explanation of what's happening behind the interfaces and what the constraints are. E.g., fragments should be accessed with expressions consisting of T.Parallel indices (with a few exceptions as syntactic sugar) so that each access can be mapped to a thread during layout inference, T.reduce_max/min/sum do parallel reduction inside a block, etc.
>
> @Elevator14B I have a different opinion. IMO documenting every corner case is far more important than explaining compiler/hardware internals and restrictions to users who most likely don't care about them. Maybe an appendix is a good place for this information.

I get your point. Maybe we need a compiler/hardware-agnostic way to explain the semantics (not just the allowed lexical subset), so that users aren't confused by internal details but still know how their code is expected to behave.

Da1sypetals · 2026-02-02T07:21:44Z

Is it required (in some cases) that the third argument of T.gemm, the accumulator, be a fragment? If so, it should be documented. Sometimes I encounter this error:

  File "/opt/conda/lib/python3.11/site-packages/tilelang/tileop/gemm/__init__.py", line 23, in gemm_py_infer_layout
    return gemm_py.infer_layout(target, thread_nums)

  File "/opt/conda/lib/python3.11/site-packages/tilelang/tileop/gemm/__init__.py", line 142, in infer_layout
    return impl_class(self).infer_layout(target, thread_nums)

  File "/opt/conda/lib/python3.11/site-packages/tilelang/tileop/gemm/gemm_wgmma.py", line 42, in infer_layout
    self.C: mma_emitter.make_mma_store_layout(self.C),

  File "/opt/conda/lib/python3.11/site-packages/tilelang/intrinsics/wgmma_macro_generator.py", line 581, in make_mma_store_layout
    assert is_fragment(local_buf), "local_buf must be a fragment"

AssertionError: local_buf must be a fragment

Add tilelang semantics guide to programming guides section in documentation

73ac881

coderabbitai bot reviewed Jan 28, 2026

LeiWang1999 mentioned this pull request Jan 28, 2026: [Question] Add language specs #1712 (Open, 2 tasks)

LeiWang1999 requested a review from Copilot January 28, 2026 07:45

Copilot started reviewing on behalf of LeiWang1999 January 28, 2026 07:46

Copilot AI reviewed Jan 28, 2026

Elevator14B reviewed Jan 29, 2026

-| `assert` | ⚠️ | Use `T.device_assert` or `T.assert` |
+| `assert` | ⚠️ | Use `T.device_assert` or `T.Assert` |
