108 changes: 108 additions & 0 deletions docs/2026-03-07-tpush-tpop-op-interface-todo.md
@@ -0,0 +1,108 @@
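# New PTOAS Op Interfaces and TODOs

## 1. Scope and Goals

This document describes the interface definitions and parameter semantics of the ops newly added to PTOAS: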

- `pto.initialize_pipe`
- `pto.tpush(tile, pipe)`
- `pto.tpop(tile, pipe)`
- `pto.tfree(pipe)`

It also lists the outstanding TODO items.

## 2. Op Interface Definitions

### 2.1 `pto.initialize_pipe`

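Purpose: initializes the ring buffer/pipe handle at function scope and returns a unified `pipe` handle, which subsequent `tpush`/`tpop` ops receive explicitly.

Conceptual signature: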

```mlir
%pipe = pto.initialize_pipe {dir_mask = <i8>, slot_size = <i32>}
(%gm_slot_buffer : <PTODpsType>,
%c2v_consumer_buf : i32,
%v2c_consumer_buf : i32)
-> !pto.pipe<SrcTileTy, DstTileTy>
```

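Parameters:

| Parameter | Type | Description | Constraints |
|---|---|---|---|
| `dir_mask` | `i8 attr` | Direction mask: `1` = C2V, `2` = V2C, `3` = bidirectional | `3` is not yet supported in PTOAS and is rejected with an error |
| `slot_size` | `i32 attr` | Size of a single slot, in bytes | Must be `> 0` |
| `gm_slot_buffer` | `PTODpsType` | Base address/handle of the GM slot buffer | Required |
| `c2v_consumer_buf` | `i32` | On A5, base address of the consumer-side local buffer for C2V | Required |
| `v2c_consumer_buf` | `i32` | On A5, base address of the consumer-side local buffer for V2C | Required |
| Result `pipe` | `!pto.pipe<SrcTileTy, DstTileTy>` | Unified pipe handle | `location`, `depth`, and `numBuffers` are derived from the `initialize_pipe` parameters |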
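
The attribute constraints in the table can be sketched as a small standalone check. This is only an illustration of the documented rules: `checkInitializePipeAttrs` is a hypothetical name, and the error strings are made up rather than taken from the PTOAS verifier.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: mirrors the documented dir_mask/slot_size rules.
// Returns an empty string when the attributes are acceptable, otherwise
// an illustrative error message.
std::string checkInitializePipeAttrs(std::int8_t dirMask,
                                     std::int32_t slotSize) {
  if (slotSize <= 0)
    return "slot_size must be > 0";
  if (dirMask == 1 || dirMask == 2)
    return ""; // 1 = C2V, 2 = V2C
  if (dirMask == 3)
    return "dir_mask = 3 (bidirectional) is not supported yet";
  return "dir_mask must be 1 (C2V), 2 (V2C), or 3 (bidirectional)";
}
```

Under these assumptions, `checkInitializePipeAttrs(1, 1024)` is accepted, while a `dir_mask` of `3` or a non-positive `slot_size` is rejected.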

### 2.2 `pto.tpush(tile, pipe)`

Purpose: the producer writes a tile into the ring buffer described by `pipe`.

Conceptual signature:

```mlir
pto.tpush(%tile, %pipe : <TileTy>, !pto.pipe<SrcTileTy, DstTileTy>)
```

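Parameters:

| Parameter | Type | Description | Constraints |
|---|---|---|---|
| `tile` | `PTODpsType` | Tile variable that the producer pushes | The `tile` type must match `pipe.src_tile_type` |
| `pipe` | `!pto.pipe<SrcTileTy, DstTileTy>` | Pipe handle returned by `initialize_pipe` | Must be passed explicitly; it is never inferred implicitly |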

### 2.3 `pto.tpop(tile, pipe)`

Purpose: the consumer reads tile data from `pipe` into the destination `tile` variable.

Conceptual signature:

```mlir
pto.tpop(%tile, %pipe : <TileTy>, !pto.pipe<SrcTileTy, DstTileTy>)
```

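Parameters:

| Parameter | Type | Description | Constraints |
|---|---|---|---|
| `tile` | `PTODpsType` | Tile variable that receives the consumer's data | The `tile` type must match `pipe.dst_tile_type` |
| `pipe` | `!pto.pipe<SrcTileTy, DstTileTy>` | Pipe handle returned by `initialize_pipe` | Must be passed explicitly; it is never inferred implicitly |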
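
The type constraints on `tpush` and `tpop` are mirror images: the producer's tile is checked against the pipe's source tile type, the consumer's against its destination tile type. A minimal sketch, assuming tile types are modelled as plain strings (`PipeTypeModel`, `tpushTypeOk`, and `tpopTypeOk` are hypothetical names, not PTOAS APIs):

```cpp
#include <string>

// Hypothetical stand-in for !pto.pipe<SrcTileTy, DstTileTy>.
// Tile types are modelled as strings purely for illustration.
struct PipeTypeModel {
  std::string srcTileType; // type the producer must push
  std::string dstTileType; // type the consumer must pop into
};

// tpush: the tile type must equal the pipe's source tile type.
bool tpushTypeOk(const std::string &tileType, const PipeTypeModel &pipe) {
  return tileType == pipe.srcTileType;
}

// tpop: the tile type must equal the pipe's destination tile type.
bool tpopTypeOk(const std::string &tileType, const PipeTypeModel &pipe) {
  return tileType == pipe.dstTileType;
}
```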

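Semantic notes:

- `tpop` is responsible only for acquiring a slot and reading its data.
- Slot release is handled by the separate `pto.tfree` op (needed only on the A5 architecture; see 2.4).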

### 2.4 `pto.tfree(pipe)`

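Purpose: explicitly releases the pipe slot held by a `tpop`. Only the A5 architecture needs it. A5 uses a local buffer as the push/pop transfer medium, so after `tpop` the data still resides in the slot for later computation to read, and the slot can be released only once the consumer has finished with it. A2A3 communicates through global memory: `tpop` has already copied the data into local memory, so the slot can be released immediately, and `tfree` is therefore a no-op on A2A3 (EmitC simply erases it).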
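
The A5 split protocol can be illustrated with a toy slot-occupancy model. This is a sketch under assumptions, not the PTOAS runtime: `PipeSlotModel` is a hypothetical class, tiles are modelled as plain `int` values, and the three counters exist only to show that a popped slot stays occupied until `tfree`.

```cpp
#include <cstddef>
#include <vector>

// Toy model of an A5-style pipe: a slot remains occupied after tpop
// and becomes reusable by the producer only after tfree.
class PipeSlotModel {
public:
  explicit PipeSlotModel(std::size_t numSlots) : slots_(numSlots, 0) {}

  // Producer: fails when every slot is still occupied (pushed, not freed).
  bool tpush(int tile) {
    if (pushed_ - freed_ == slots_.size())
      return false;
    slots_[pushed_ % slots_.size()] = tile;
    ++pushed_;
    return true;
  }

  // Consumer: acquires the oldest unread slot and reads it.
  // Does not release the slot.
  bool tpop(int &tile) {
    if (popped_ == pushed_)
      return false;
    tile = slots_[popped_ % slots_.size()];
    ++popped_;
    return true;
  }

  // Consumer: releases the oldest popped-but-unfreed slot.
  bool tfree() {
    if (freed_ == popped_)
      return false;
    ++freed_;
    return true;
  }

private:
  std::vector<int> slots_;
  std::size_t pushed_ = 0, popped_ = 0, freed_ = 0;
};
```

In this model, a full two-slot pipe still rejects `tpush` after one `tpop` and accepts it only after the matching `tfree`. An A2A3-style pipe would instead free the slot inside `tpop`.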

Conceptual signature:

```mlir
pto.tfree(%pipe : !pto.pipe<SrcTileTy, DstTileTy>)
```

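Parameters:

| Parameter | Type | Description | Constraints |
|---|---|---|---|
| `pipe` | `!pto.pipe<SrcTileTy, DstTileTy>` | Pipe handle returned by `initialize_pipe` | Must be the same pipe used by the corresponding `tpop` |

Constraints and behavior:

- Must be used inside `section.cube` or `section.vector`.
- Each `tpop` should be paired with one `tfree` that uses the same `pipe_handle`.
- `InsertTFreePass` (A5 only) automatically inserts a `tfree` after the last read of the tile data produced by a `tpop`; any `tpop` that already has a hand-written `tfree` is skipped.
- EmitC lowering: A5 emits `TFREE(...)`; A2A3 erases the op.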
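
The insertion rule for `InsertTFreePass` can be sketched on a flat list of ops. This is a simplified model, not the pass implementation: `OpModel` and `insertTFree` are hypothetical names, and the sketch assumes straight-line code, at most one `tpop` per pipe, and no redefinition of the popped tile.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified op record; not an MLIR or PTOAS data structure.
struct OpModel {
  std::string name;               // "tpop", "tfree", or a placeholder op name
  std::string pipe;               // pipe used by tpop/tfree, empty otherwise
  std::string writes;             // tile written by the op, empty if none
  std::vector<std::string> reads; // tiles read by the op
};

// Returns a copy of `ops` with a "tfree" added right after the last read of
// each popped tile. A tpop whose pipe already has a later tfree is skipped.
std::vector<OpModel> insertTFree(const std::vector<OpModel> &ops) {
  // freeAfter[i] lists the pipes to release right after ops[i].
  std::vector<std::vector<std::string>> freeAfter(ops.size());
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].name != "tpop")
      continue;
    std::size_t lastRead = i; // with no reader, free right after the tpop
    bool hasManualFree = false;
    for (std::size_t j = i + 1; j < ops.size(); ++j) {
      if (ops[j].name == "tfree" && ops[j].pipe == ops[i].pipe)
        hasManualFree = true;
      for (const std::string &tile : ops[j].reads)
        if (tile == ops[i].writes)
          lastRead = j;
    }
    if (!hasManualFree)
      freeAfter[lastRead].push_back(ops[i].pipe);
  }
  std::vector<OpModel> result;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    result.push_back(ops[i]);
    for (const std::string &pipe : freeAfter[i])
      result.push_back(OpModel{"tfree", pipe, "", {}});
  }
  return result;
}
```

A real pass would work on SSA values and regions rather than tile names in a flat list, so control flow and aliasing are outside this sketch.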

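## 3. TODO (current version)

### T1. Rework the FlagID allocation strategy

- The current linear allocation strategy (`0,2,4,6,8,10,12`) is fairly simplistic.
- FlagIDs should instead be analyzed and allocated at kernel-function scope.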
23 changes: 23 additions & 0 deletions include/PTO/IR/PTOAttrs.td
@@ -437,4 +437,27 @@ def TileBufConfigAttr : AttrDef<PTO_Dialect, "TileBufConfig"> {
}];
}

//===----------------------------------------------------------------------===//
// Pipe Location (for TPUSH/TPOP unified pipe handle)
//===----------------------------------------------------------------------===//

def PTO_PipeLocation_GM : I32EnumAttrCase<"GM", 0, "gm">;
def PTO_PipeLocation_LOCAL : I32EnumAttrCase<"LOCAL", 1, "local">;

def PTO_PipeLocationEnum : PTO_I32Enum<
"PipeLocation", "PTO pipe physical location", [
PTO_PipeLocation_GM,
PTO_PipeLocation_LOCAL
]>;

def PTO_PipeLocationAttr : PTO_Attr<"PipeLocation", "pipe_location"> {
let parameters = (ins EnumParameter<PTO_PipeLocationEnum>:$location);
let assemblyFormat = "`<` params `>`";
let description = [{
Physical location of cross-core pipe buffer.
GM: pipe in global memory.
LOCAL: pipe in on-chip local memory (UB/L1).
}];
}

#endif // MLIR_DIALECT_PTO_IR_PTOATTRS
114 changes: 114 additions & 0 deletions include/PTO/IR/PTOOps.td
@@ -3645,4 +3645,118 @@ def TPrintOp: PTO_TOp<"tprint", [
}];
}

//===----------------------------------------------------------------------===//
// TPUSH/TPOP Ring Buffer Communication Ops
//===----------------------------------------------------------------------===//

// --- Initialization ---

def InitializePipeOp : PTO_Op<"initialize_pipe", [
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>
]> {
let summary = "Initialize ring buffer pipe handle";
let description = [{
Called once at kernel startup. Binds ring buffer pipe to backing memory,
computes slot configuration from dir_mask, and returns a pipe handle.
}];

let arguments = (ins
I8Attr:$dir_mask,
I32Attr:$slot_size,
PTODpsType:$gm_slot_buffer,
I32:$c2v_consumer_buf,
I32:$v2c_consumer_buf
);

let results = (outs PipeType:$pipe);
let hasVerifier = 1;

let assemblyFormat = [{
`{` `dir_mask` `=` $dir_mask `,` `slot_size` `=` $slot_size `}`
`(` $gm_slot_buffer `:` qualified(type($gm_slot_buffer)) `,`
$c2v_consumer_buf `:` type($c2v_consumer_buf) `,`
$v2c_consumer_buf `:` type($v2c_consumer_buf)
`)` attr-dict `->` qualified(type($pipe))
}];
}

// --- Data Transfer: Push (producer, no DPS) ---

def TPushOp : PTO_TOp<"tpush", [
OpPipeInterface,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>
]> {
let summary = "Push tile data via unified pipe handle";

let arguments = (ins
PTODpsType:$tile,
PipeType:$pipe_handle
);

let results = (outs);
let hasVerifier = 1;

let assemblyFormat = [{
`(` $tile `,` $pipe_handle `:` qualified(type($tile)) `,` qualified(type($pipe_handle)) `)`
attr-dict
}];

let extraClassDeclaration = [{
::mlir::pto::PIPE getPipe() { return ::mlir::pto::PIPE::PIPE_MTE1; }
}];
}

// --- Data Transfer: Pop (consumer, DPS) ---

def TPopOp : PTO_TOp<"tpop", [
PTO_DpsInitOpInterface,
OpPipeInterface,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>
]> {
let summary = "Pop tile data via unified pipe handle";

let arguments = (ins
PTODpsType:$tile,
PipeType:$pipe_handle
);

let results = (outs);
let hasVerifier = 1;

let assemblyFormat = [{
`(` $tile `,` $pipe_handle `:` qualified(type($tile)) `,` qualified(type($pipe_handle)) `)`
attr-dict
}];

let extraClassDeclaration = [{
::mlir::pto::PIPE getPipe() { return ::mlir::pto::PIPE::PIPE_MTE1; }
::mlir::MutableOperandRange getDpsInitsMutable() { return getTileMutable(); }
}];
}

// --- Data Transfer: Free (consumer slot release) ---

def TFreeOp : PTO_TOp<"tfree", [
OpPipeInterface,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>
]> {
let summary = "Release pipe slot after consumer finishes using data";

let arguments = (ins
PipeType:$pipe_handle
);

let results = (outs);
let hasVerifier = 1;

let assemblyFormat = [{
`(` $pipe_handle `:` qualified(type($pipe_handle)) `)`
attr-dict
}];

let extraClassDeclaration = [{
::mlir::pto::PIPE getPipe() { return ::mlir::pto::PIPE::PIPE_MTE1; }
}];
}

#endif // MLIR_DIALECT_PTO_IR_PTOOPS
12 changes: 12 additions & 0 deletions include/PTO/IR/PTOTypeDefs.td
@@ -184,3 +184,15 @@ def TileBufType : TypeDef<PTO_Dialect, "TileBuf"> {
int32_t getPadValueI32() const; // 0 null, 1 zero, 2 max, 3 min
}];
}

def PipeType : TypeDef<PTO_Dialect, "Pipe"> {
let mnemonic = "pipe";
let summary = "Pipe handle type for TPUSH/TPOP unified static schedule";
let parameters = (ins
"mlir::Type":$srcTileType,
"mlir::Type":$dstTileType
);
let assemblyFormat = [{
`<` $srcTileType `,` $dstTileType `>`
}];
}
2 changes: 2 additions & 0 deletions include/PTO/Transforms/Passes.h
@@ -65,6 +65,8 @@ std::unique_ptr<Pass> createPTOViewToMemrefPass();
std::unique_ptr<mlir::Pass> createPTOInsertLoadStoreForMixCVPass();
std::unique_ptr<Pass> createInferPTOLayoutPass();
// Declare register function
std::unique_ptr<Pass> createPTOInsertTFreePass();

void registerPTOPasses();

} // namespace pto
17 changes: 17 additions & 0 deletions include/PTO/Transforms/Passes.td
@@ -116,4 +116,21 @@ def PTOLoweringSyncToPipe : Pass<"pto-lowering-sync-to-pipe", "func::FuncOp"> {
];
}

def PTOInsertTFree : Pass<"pto-insert-tfree", "func::FuncOp"> {
let summary = "Auto-insert pto.tfree after last use of tpop tile data (A5 only)";
let description = [{
For each pto.tpop in section.cube / section.vector, analyzes the
data dependency of the popped tile and inserts pto.tfree(pipe_handle)
at the earliest safe point — immediately after the last read of the tile.
Skips tpop ops that already have a matching tfree.
Only meaningful on A5 architecture where tpop/tfree split protocol is used.
}];

let constructor = "mlir::pto::createPTOInsertTFreePass()";

let dependentDialects = [
"mlir::pto::PTODialect"
];
}

#endif // MLIR_DIALECT_PTO_PASSES