@LeiWang1999 (Member) commented Jan 28, 2026

Summary

  • Add a comprehensive TileLang Semantics guide to help users understand the DSL syntax
  • Documents Python compatibility: what works and what doesn't in @T.prim_func kernels
  • Covers loops, conditionals, buffer operations, variables, functions, and operators
  • Includes detailed section on integer division/modulo semantics (floordiv vs truncdiv)
  • Highlights that TileLang follows Python semantics for // and %, unlike Triton which uses C-style truncation

Test plan

  • Documentation renders correctly in markdown
  • All code examples are syntactically correct
  • Links to other documentation pages are valid

Summary by CodeRabbit

  • Documentation
    • Added comprehensive TileLang semantics guide covering supported Python features, control flow, data access, functions, operators, and best practices, including code examples, error messages, and patterns for translating Python code.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai (bot, Contributor) commented Jan 28, 2026

Walkthrough

This pull request adds a comprehensive TileLang semantics documentation guide. A new entry is added to the PROGRAMMING GUIDES section in the documentation index, and a detailed reference document is created explaining supported Python features, semantic behaviors, and limitations within TileLang kernels.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Index Update: `docs/index.md` | Added `programming_guides/tilelang_semantics` entry to PROGRAMMING GUIDES toctree |
| TileLang Semantics Guide: `docs/programming_guides/tilelang_semantics.md` | New comprehensive guide documenting supported Python constructs, control flow semantics, division/modulo behavior (truncdiv/truncmod, floordiv/floormod, ceildiv), buffer operations, operators, and patterns for translating Python code into TileLang; includes error references, examples of working vs. unsupported features (e.g., no `*args`, `**kwargs`, lambdas, classes, recursion), and migration guidance |

Possibly Related PRs

  • [Doc] Minor documentation update #1410: Both PRs extend the PROGRAMMING GUIDES toctree in docs/index.md and add documentation pages under docs/programming_guides/, establishing parallel documentation structure expansions.

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 A guide hops into place, semantics so clear,
TileLang's rules and patterns now dear,
No more confusion in Python's domain,
Just reference and wisdom—a knowledge campaign! 📚✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and clearly summarizes the main change: adding a comprehensive TileLang Semantics Guide to the documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/programming_guides/tilelang_semantics.md`:
- Around line 372-428: The statement claiming Triton "always uses
`truncdiv`/`truncmod`" is misleading; update the Note about Triton to clarify
that Triton's behavior is context-dependent—tensor operations use C-style
truncation (truncdiv/truncmod) while scalar operations use Python-style
floordiv/floormod—or restrict the comparison to tensor operations only; edit the
paragraph that starts "Note: Unlike Triton, which always uses
`truncdiv`/`truncmod`" to either remove the word "always" and add the tensor vs
scalar distinction or append a clarifying sentence indicating the comparison
applies to Triton tensor operations.

Comment on lines +372 to +428
## Integer Division and Modulo

TileLang's `//` and `%` operators follow Python semantics (`floordiv`/`floormod`),
not C/C++ semantics. If you need C-style truncation behavior, use `T.truncdiv()` and
`T.truncmod()` explicitly.

**Note**: Unlike Triton, which always uses `truncdiv`/`truncmod` (C-style, inconsistent with
Python), TileLang preserves Python's expected behavior for `//` and `%`.

TileLang provides multiple division and modulo operations with different rounding
behaviors. Understanding these is important when working with negative numbers.

### truncdiv / truncmod (C-style)

Rounds toward zero. The remainder has the same sign as the dividend.

```python
T.truncdiv(-7, 2) # = -3 (toward zero: -3.5 → -3)
T.truncmod(-7, 2) # = -1 (since -7 = (-3) * 2 + (-1))

T.truncdiv(7, -2) # = -3
T.truncmod(7, -2) # = 1 (since 7 = (-3) * (-2) + 1)
```

### floordiv / floormod (Python-style)

Rounds toward negative infinity. The remainder has the same sign as the divisor.

```python
T.floordiv(-7, 2) # = -4 (toward -∞: -3.5 → -4)
T.floormod(-7, 2) # = 1 (since -7 = (-4) * 2 + 1)

T.floordiv(7, -2) # = -4
T.floormod(7, -2) # = -1 (since 7 = (-4) * (-2) + (-1))
```

### Comparison Table

| a | b | truncdiv | truncmod | floordiv | floormod |
|---|---|----------|----------|----------|----------|
| 7 | 2 | 3 | 1 | 3 | 1 |
| -7 | 2 | -3 | -1 | -4 | 1 |
| 7 | -2 | -3 | 1 | -4 | -1 |
| -7 | -2 | 3 | -1 | 3 | -1 |

### Default Behavior

- Python's `//` operator maps to `floordiv`
- Python's `%` operator maps to `floormod`
- `T.ceildiv(a, b)` computes ceiling division: `⌈a / b⌉`

### When to Use Which

- **floordiv/floormod**: Default choice, matches Python semantics
- **truncdiv/truncmod**: When you need C/C++ compatible behavior
- **ceildiv**: For computing grid sizes: `T.ceildiv(N, BLOCK)` gives the number
of blocks needed to cover N elements
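
The grid-size use of ceiling division mentioned above can be sketched in plain Python. This is a stand-in, not TileLang code: the `ceildiv` helper, `N`, and `BLOCK` below are hypothetical names chosen for illustration, and the sketch assumes a positive block size.

```python
def ceildiv(a: int, b: int) -> int:
    """Ceiling division using only integer arithmetic (assumes b > 0)."""
    # Floor-dividing the negated numerator and negating again rounds up.
    return -(-a // b)


N, BLOCK = 1000, 128            # hypothetical problem size and block size
num_blocks = ceildiv(N, BLOCK)  # blocks needed to cover all N elements

print(num_blocks)               # 8
print(num_blocks * BLOCK >= N)  # True: the grid covers every element
```

When `N` is an exact multiple of `BLOCK`, the result equals `N // BLOCK`, so no extra block is launched.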
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Triton programming language integer division modulo semantics C-style truncation

💡 Result:

In Triton, integer floor-division (//) and modulo (%) on tensors follow C semantics (not Python) for efficiency:

  • Division (a // b) truncates toward zero when a and b can have mixed signs (i.e., it does not floor toward -∞ like Python). (triton-lang.org)
  • Modulo follows the C-style identity
    $$a \mathbin{\%} b = a - b \cdot (a \mathbin{//} b)$$
    so the remainder’s sign matches the trunc-toward-zero quotient behavior (often meaning the remainder has the same sign as a, as in C). (triton-lang.org)

Example (tensor values):

  • -3 // 2 == -1 (Triton/C) vs -3 // 2 == -2 (Python)
  • -3 % 2 == -1 (Triton/C) vs -3 % 2 == 1 (Python) (triton-lang.org)

Exception / gotcha: If all inputs are scalars, Triton says // and % “perhaps confusingly” follow Python semantics. (triton-lang.org)


The Triton comparison needs clarification.

While the division examples and comparison table are mathematically correct, the claim that Triton "always uses truncdiv/truncmod" is inaccurate. According to Triton's official documentation, Triton's division behavior is context-dependent:

  • For tensor operations: Triton uses C-style truncation (toward zero)
  • For scalar operations: Triton uses Python semantics (floor division)

The documentation should either remove "always" and clarify the tensor/scalar distinction, or note that the comparison applies specifically to tensor operations. This nuance is important for developers migrating from Triton.
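
The statement that the comparison table is mathematically correct can be checked with a short plain-Python sketch. It does not call TileLang or Triton: `truncdiv` and `truncmod` here are hypothetical pure-Python helpers that emulate C-style truncation, while Python's own `//` and `%` supply the floor semantics.

```python
def truncdiv(a: int, b: int) -> int:
    """Integer division that rounds toward zero (C-style)."""
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the operand signs differ.
    return q if (a < 0) == (b < 0) else -q


def truncmod(a: int, b: int) -> int:
    """Remainder paired with truncdiv: a == b * truncdiv(a, b) + truncmod(a, b)."""
    return a - b * truncdiv(a, b)


# Same operand pairs as the comparison table in the reviewed guide.
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, truncdiv(a, b), truncmod(a, b), a // b, a % b)
# 7 2 3 1 3 1
# -7 2 -3 -1 -4 1
# 7 -2 -3 1 -4 -1
# -7 -2 3 -1 3 -1
```

The two conventions agree whenever both operands have the same sign and differ only for mixed signs, which is the case the Triton tensor/scalar distinction affects.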

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a comprehensive TileLang Semantics guide to help users understand what Python syntax is supported inside @T.prim_func kernels and how to translate common Python patterns into TileLang equivalents.

Changes:

  • Adds a new documentation file docs/programming_guides/tilelang_semantics.md covering Python compatibility, control flow, data access, variables, functions, operators, and integer division/modulo semantics
  • Updates docs/index.md to include the new semantics guide in the programming guides section
  • Documents that TileLang follows Python semantics for // and % operators (floordiv/floormod), unlike Triton which uses C-style truncation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| docs/programming_guides/tilelang_semantics.md | New comprehensive guide documenting supported Python features, control flow constructs, operators, and common patterns with examples |
| docs/index.md | Adds the new semantics guide to the documentation index under Programming Guides |

|-------------------------|:---------:|------------------------------------------|
| `with` | ⚠️ | Only `T.Kernel`, `T.ws` |
| `import` | ❌ | Not inside kernel |
| `assert` | ⚠️ | Use `T.device_assert` or `T.assert` |

Copilot AI commented Jan 28, 2026

The documentation mentions T.assert but the actual API in the codebase is T.Assert (capitalized). This should be corrected to use the proper capitalization.

Suggested change
| `assert` | ⚠️ | Use `T.device_assert` or `T.assert` |
| `assert` | ⚠️ | Use `T.device_assert` or `T.Assert` |
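
One Python-level reason the lowercase spelling cannot be right, independent of the TileLang codebase: `assert` is a reserved keyword, so `T.assert(...)` does not parse as attribute access. A minimal check using only the standard `ast` module (the `parses` helper is a hypothetical name; this does not verify which names TileLang actually exports):

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses("T.assert(x > 0)"))  # False: `assert` is a keyword
print(parses("T.Assert(x > 0)"))  # True: an ordinary attribute name
```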

N = A.shape[1]

# Type casting
x = value.astype("float32")

Collaborator

So what's the canonical way to cast types? Buffer.astype mentioned here, or T.cast from TVM, or T.Cast defined in TileLang Python package?

Collaborator

@Elevator14B They are the same, just syntax sugar. Both T.cast and T.Cast are from TVM. And the difference between T.cast and T.Cast is just the order of arguments. I think the canonical way is to use T.cast or astype.

What is the behavior of Buffer.astype? Does it return a new buffer, or cast the elements in place? If the latter, should the buffer size change when the bit widths of the source and destination types differ?

Unlike the immutable-tile model in Triton/cuTile, in TileLang we allocate a buffer before writing to it, which (for me) often causes this kind of confusion.

Collaborator

@Da1sypetals It returns a new pointer to the original data

Collaborator

@Elevator14B I submitted a PR that unifies the cast-related ops: #1757. Any suggestions?

@Da1sypetals commented Jan 29, 2026

> @Da1sypetals It returns a new pointer to the original data

So is it correct that it modifies the contents of the original buffer? Also, if the bit widths differ (and thus the space required to hold the same number of elements changes), how can it point to the original buffer? What is the internal behavior?

Collaborator

@Da1sypetals 1. This instruction itself does not modify the contents of the original buffer, since it just returns the pointer. But we can use this pointer to read/write the data, which may modify the contents. 2. Internally, it just does a static cast, like:

buf1 = T.cast(buf, T.float16)
# Codegen result
float16* buf1 = static_cast<float16*>(buf);

You can write a simple TileLang program and use get_kernel_source() to get the .cu code to have a better understanding.

> @Da1sypetals 1. This instruction itself does not modify the contents of the original buffer, since it just returns the pointer. But we can use this pointer to read/write the data, which may modify the contents. 2. Internally, it just does a static cast, like:
>
>     buf1 = T.cast(buf, T.float16)
>     # Codegen result
>     float16* buf1 = static_cast<float16*>(buf);
>
> You can write a simple TileLang program and use get_kernel_source() to get the .cu code to have a better understanding.

So it does not cast by value, as tensor.to(dtype) does in PyTorch; it just casts the pointer?

In that case, if a buffer is cast to a dtype with a different bit width (e.g. float32 -> float16, where the same space holds 2x the elements), how does the buffer's shape change? I think this should be documented.
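
The shape question can be made concrete with a plain-Python analogy built on the standard `struct` module. It does not call TileLang and does not settle what `T.cast` or `Buffer.astype` actually do; it only contrasts reinterpreting the same bytes at a different bit width with converting each element by value. The variable names are illustrative.

```python
import struct

values = [1.0, 2.0, 3.0, 4.0]

# The 16 bytes that a 4-element float32 buffer would occupy.
raw = struct.pack("<4f", *values)

# Reinterpretation: read the unchanged bytes as float16 ("e" is the
# half-precision format code). 16 bytes now hold 8 elements, and the
# numeric values are not preserved.
reinterpreted = struct.unpack("<8e", raw)

# Value conversion: convert each element, so the element count stays 4
# and the converted data needs only 8 bytes.
converted_bytes = struct.pack("<4e", *values)
converted = struct.unpack("<4e", converted_bytes)

print(len(raw), len(reinterpreted))     # 16 8
print(len(converted_bytes), converted)  # 8 (1.0, 2.0, 3.0, 4.0)
```

Under reinterpretation the element count doubles while the byte size is fixed; under value conversion the element count is fixed while the byte size halves. Which of the two a given cast API performs determines what the resulting shape has to be.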

break
```

### Max/Min Finding

Collaborator

Maybe cover this together with sum? Also, I think it would be good to mention the difference between var-based serial reduction and fragment-based parallel reduction, with alloc_reducer as another option for the latter.

@Elevator14B (Collaborator)

For the "What (not) works" section, I think we should include more explanation of what's happening behind the interfaces and what the constraints are. E.g., fragments should be accessed with expressions consisting of T.Parallel indices (with a few exceptions as syntax sugar) so that each access can be mapped to some thread during layout inference, T.reduce_max/min/sum do parallel reduction inside a block, etc.

@Da1sypetals commented Jan 29, 2026

> For the "What (not) works" section, I think we should include more explanation of what's happening behind the interfaces and what the constraints are. E.g., fragments should be accessed with expressions consisting of T.Parallel indices (with a few exceptions as syntax sugar) so that each access can be mapped to some thread during layout inference, T.reduce_max/min/sum do parallel reduction inside a block, etc.

@Elevator14B I see it differently. IMO, documenting every corner case is far more important than explaining compiler/hardware internals and restrictions to users who most likely don't care about them. Maybe an appendix is a good place for this information.

@Elevator14B (Collaborator) commented Jan 30, 2026

Another missing piece: what's allowed in the signature of a @T.prim_func-decorated function? This includes T.Tensor and, equivalently, T.Buffer, with their two flavours of marking shape and dtype ([shape, dtype] and (shape, dtype)); T.ptr (together with its usage); scalars like T.float32; etc.

@Elevator14B (Collaborator)

> > For the "What (not) works" section, I think we should include more explanation of what's happening behind the interfaces and what the constraints are. E.g., fragments should be accessed with expressions consisting of T.Parallel indices (with a few exceptions as syntax sugar) so that each access can be mapped to some thread during layout inference, T.reduce_max/min/sum do parallel reduction inside a block, etc.
>
> @Elevator14B I see it differently. IMO, documenting every corner case is far more important than explaining compiler/hardware internals and restrictions to users who most likely don't care about them. Maybe an appendix is a good place for this information.

I get your point. Maybe we need some compiler/hardware-agnostic way to explain the semantics (not just the allowed lexical subset), so that users aren't confused by internal details but still know how their code is expected to behave.

@Da1sypetals

Is it required (in some cases) that the third argument of T.gemm, the accumulator, be a fragment? If so, it should be documented. Sometimes I encounter this error:

  File "/opt/conda/lib/python3.11/site-packages/tilelang/tileop/gemm/__init__.py", line 23, in gemm_py_infer_layout
    return gemm_py.infer_layout(target, thread_nums)

  File "/opt/conda/lib/python3.11/site-packages/tilelang/tileop/gemm/__init__.py", line 142, in infer_layout
    return impl_class(self).infer_layout(target, thread_nums)

  File "/opt/conda/lib/python3.11/site-packages/tilelang/tileop/gemm/gemm_wgmma.py", line 42, in infer_layout
    self.C: mma_emitter.make_mma_store_layout(self.C),

  File "/opt/conda/lib/python3.11/site-packages/tilelang/intrinsics/wgmma_macro_generator.py", line 581, in make_mma_store_layout
    assert is_fragment(local_buf), "local_buf must be a fragment"

AssertionError: local_buf must be a fragment

5 participants