@LeiWang1999
Member

This pull request updates README.md to improve the tile-lang documentation. The main changes are a new "Latest News" section, the removal of the basic GEMM example, and an annotated GEMM example.

Documentation updates:

  • Added a "Latest News" section to announce the open-source release of tile-lang.
  • Removed the basic GEMM example to streamline the documentation and focus on more advanced usage.
  • Updated the GEMM example with annotations to include optional layout optimizations and parallelized copy syntax.

@LeiWang1999 LeiWang1999 merged commit 9c578fa into main Jan 18, 2025
3 of 4 checks passed
@LeiWang1999 LeiWang1999 deleted the doc branch January 20, 2025 16:17
vincentccc pushed a commit to vincentccc/tilelang that referenced this pull request Jul 21, 2025
uv-xiao pushed a commit to uv-xiao/tilelang that referenced this pull request Jan 1, 2026
* [Feat] Add `copy_unrolled` operation for optimized memory copying

- Implemented a new built-in operation `copy_unrolled` to facilitate copying between global memory buffers with an unrolled loop.
- Updated the corresponding header and CUDA code generation to support the new operation.
- Added Python interface for `copy_unrolled` to enable usage in TileLang.
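
To make the idea of an unrolled copy loop concrete, here is a minimal host-side sketch in plain Python. It is not the TileLang `copy_unrolled` built-in or its generated CUDA; the function name `copy_unrolled_by_4` and the fixed unroll factor of 4 are assumptions chosen for illustration.

```python
def copy_unrolled_by_4(src, dst, n):
    """Copy the first n elements of src into dst, four elements per loop trip.

    Hypothetical illustration of loop unrolling; not TileLang's implementation.
    """
    if n < 0 or n > len(src) or n > len(dst):
        raise ValueError("n must fit inside both buffers")

    # Largest multiple of 4 that does not exceed n.
    main = n - n % 4

    # Unrolled body: four copies per iteration, so the loop condition is
    # checked once per four elements instead of once per element.
    for i in range(0, main, 4):
        dst[i] = src[i]
        dst[i + 1] = src[i + 1]
        dst[i + 2] = src[i + 2]
        dst[i + 3] = src[i + 3]

    # Remainder loop: handles the last n % 4 elements.
    for i in range(main, n):
        dst[i] = src[i]

    return dst
```

The remainder loop is what keeps the sketch correct when `n` is not a multiple of the unroll factor.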

* bug fix

* [Feat] Implement CUDA IPC support for inter-process communication

- Added `ipc_ext` module for creating and synchronizing IPC handles using CUDA.
- Implemented `create_ipc_handle` and `sync_ipc_handles` functions for managing IPC memory.
- Developed a test script `test_ipc.py` to demonstrate IPC functionality across multiple GPUs.
- Included setup scripts for building the IPC extension and a CUDA kernel for setting values in shared memory.
- Updated README with usage instructions for the IPC test.
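
The create-then-share pattern behind IPC handles can be sketched with a host-memory analogy. The example below uses Python's standard `multiprocessing.shared_memory` module rather than CUDA IPC, and the helper names `create_handle_sketch` and `open_handle_sketch` are hypothetical; they are not the `create_ipc_handle` or `sync_ipc_handles` functions from the `ipc_ext` module.

```python
from multiprocessing import shared_memory


def create_handle_sketch(nbytes):
    """Allocate a shared block and return it with a name that acts as its handle.

    Host-memory analogy only; the real extension works with CUDA IPC handles.
    """
    block = shared_memory.SharedMemory(create=True, size=nbytes)
    return block, block.name


def open_handle_sketch(handle):
    """Attach to a block that another owner created, using only its handle."""
    return shared_memory.SharedMemory(name=handle)


if __name__ == "__main__":
    owner, handle = create_handle_sketch(16)
    peer = open_handle_sketch(handle)  # normally done in another process
    try:
        peer.buf[0] = 7       # a write through the peer mapping
        print(owner.buf[0])   # is visible through the owner mapping
    finally:
        peer.close()
        owner.close()
        owner.unlink()        # the creator releases the block
```

In a multi-process setting the handle would be sent to the peers over some channel before they attach, which is the role a synchronization step plays.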

* [Feat] Introduce remote copy operation for distributed memory management

- Implemented a new `remote_copy` operation to facilitate efficient data transfer between global memory buffers in a distributed environment.
- Updated the TileLang Python interface to support the new operation, replacing the previous `copy_unrolled` implementation.
- Enhanced the `example_remote_copy.py` to demonstrate the usage of the `remote_copy` operation with distributed tensor management.
- Added IPC handle management functions to synchronize memory across processes.
- Refactored existing tests to align with the new remote copy functionality.
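
As a rough picture of what a remote copy does, the sketch below models each rank's global buffer as a `bytearray` and copies a slice from one rank into another at a destination offset. The function name `remote_copy_sketch` and all of its parameters are assumptions for illustration; this is not the signature or behavior of the TileLang `remote_copy` operation.

```python
def remote_copy_sketch(buffers, src_rank, dst_rank, src_offset, dst_offset, size):
    """Copy `size` bytes from one rank's buffer into another rank's buffer.

    `buffers` is a list with one bytearray per rank (a stand-in for
    per-device global memory). Hypothetical illustration only.
    """
    src = buffers[src_rank]
    dst = buffers[dst_rank]

    # Reject copies that would read or write outside either buffer.
    if size < 0 or src_offset < 0 or dst_offset < 0:
        raise ValueError("offsets and size must be non-negative")
    if src_offset + size > len(src) or dst_offset + size > len(dst):
        raise ValueError("copy range exceeds buffer bounds")

    # Equal-length slice assignment keeps the destination buffer size fixed.
    dst[dst_offset:dst_offset + size] = src[src_offset:src_offset + size]


if __name__ == "__main__":
    ranks = [bytearray(b"abcdefgh"), bytearray(8)]
    remote_copy_sketch(ranks, src_rank=0, dst_rank=1,
                       src_offset=2, dst_offset=4, size=3)
    print(ranks[1])
```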

* [Refactor] Move `remote_copy` to a new module and update implementation

- Moved the `remote_copy` function from `builtin.py` to a new `common.py` module for better organization.
- Updated the `remote_copy` implementation to enhance clarity and maintainability.
- Adjusted the `Lower` method in `remote_copy.cc` to include `dst_offset` in the buffer load operation.
- Added an import statement for the new `common` module in the TileLang language initialization file.

* [Update] Sync subproject and enhance tensor buffer types

- Updated the TVM subproject to the latest commit for improved features and fixes.
- Modified `example_remote_copy.py` to specify buffer types for distributed tensors and metadata, enhancing clarity in tensor management.
- Adjusted `lower_tile_op.cc` to enforce a single metadata buffer and improved handling of distributed buffer types in the substitution process.

2 participants