Deliverables
A Pull Request (PR) that completes the performance evaluation of Transpose and Batch Matmul, including measuring the percentage of total execution time spent in Transpose.
A Pull Request (PR) that implements vectorization and parallelization for linalg.batch_matmul_transpose_b.
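To make the first deliverable concrete, here is a minimal pure-Python reference, assumed for illustration only: it is not Buddy-MLIR code, and its timings say nothing about compiled MLIR kernels. linalg.batch_matmul_transpose_b computes C[b, i, j] = sum over k of A[b, i, k] * B[b, j, k], which matches first materializing the transposed B with a Transpose and then running a plain batch matmul. The function names are hypothetical. `transpose_share_percent` only shows the bookkeeping behind the requested metric: Transpose share = t_transpose / (t_transpose + t_matmul) * 100.

```python
import time


def transpose_last_two(b):
    """Explicit Transpose: [batch][n][k] -> [batch][k][n]."""
    return [[list(col) for col in zip(*mat)] for mat in b]


def batch_matmul(a, bt):
    """Plain batch matmul: C[b][i][j] = sum_k A[b][i][k] * Bt[b][k][j]."""
    out = []
    for a_mat, bt_mat in zip(a, bt):
        n_dim = len(bt_mat[0])
        out.append([[sum(a_row[k] * bt_mat[k][j] for k in range(len(a_row)))
                     for j in range(n_dim)]
                    for a_row in a_mat])
    return out


def batch_matmul_transpose_b(a, b):
    """Fused form: C[b][i][j] = sum_k A[b][i][k] * B[b][j][k], no transpose copy."""
    return [[[sum(x * y for x, y in zip(a_row, b_row)) for b_row in b_mat]
             for a_row in a_mat]
            for a_mat, b_mat in zip(a, b)]


def transpose_share_percent(a, b, repeats=5):
    """Hypothetical helper: Transpose's share (%) of Transpose + batch_matmul time.

    Each phase is timed separately, and the fastest of `repeats` runs is kept
    per phase to reduce timer noise.
    """
    best_transpose = best_matmul = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        bt = transpose_last_two(b)
        t1 = time.perf_counter()
        batch_matmul(a, bt)
        t2 = time.perf_counter()
        best_transpose = min(best_transpose, t1 - t0)
        best_matmul = min(best_matmul, t2 - t1)
    total = best_transpose + best_matmul
    return 100.0 * best_transpose / total if total > 0 else 0.0


if __name__ == "__main__":
    A = [[[1, 2, 3], [4, 5, 6]]]  # [batch=1][m=2][k=3]
    B = [[[1, 0, 1], [0, 1, 0]]]  # [batch=1][n=2][k=3]
    print(batch_matmul_transpose_b(A, B))  # [[[4, 2], [10, 5]]]
    assert batch_matmul_transpose_b(A, B) == batch_matmul(A, transpose_last_two(B))
    print(f"Transpose share: {transpose_share_percent(A, B):.1f}%")
```

In the actual task, the two times would come from the compiled kernel built with the -batchmatmul-optimize pass, not from Python. Only the final percentage arithmetic carries over.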
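For the second deliverable, the sketch below shows one possible loop structure for a vectorized and parallel batch_matmul_transpose_b, again in plain Python and under stated assumptions. The batch dimension is the parallel loop. The reduction over k is split into chunks of a hypothetical vector length `VL`, followed by a scalar tail. Because B is stored as [batch, n, k], both operands are contiguous along k, and that contiguity is what suits the inner loop to vector loads. A hand-written MLIR version would presumably express this as a parallel loop over the batch (for example scf.parallel) plus vector-dialect operations along k. Here, the thread pool only mirrors that structure and gives no real speedup under CPython's GIL.

```python
from concurrent.futures import ThreadPoolExecutor

VL = 4  # hypothetical vector length, e.g. 4 x f32 lanes in a 128-bit register


def dot_vectorized(a_row, b_row, vl=VL):
    """Dot product along the contiguous k dimension in chunks of `vl` lanes."""
    k_dim = len(a_row)
    main = k_dim - k_dim % vl       # largest multiple of vl that fits in k_dim
    acc = [0] * vl                  # one partial sum per vector lane
    for k in range(0, main, vl):    # vector body: lane-wise multiply-accumulate
        for lane in range(vl):
            acc[lane] += a_row[k + lane] * b_row[k + lane]
    total = sum(acc)                # horizontal reduction across lanes
    for k in range(main, k_dim):    # scalar tail for the k % vl leftovers
        total += a_row[k] * b_row[k]
    return total


def compute_batch_slice(pair):
    """One parallel task: the full [m][n] output slice for a single batch index."""
    a_mat, b_mat = pair
    # B is [n][k], so b_row is already column j of B^T; no transpose is needed.
    return [[dot_vectorized(a_row, b_row) for b_row in b_mat] for a_row in a_mat]


def batch_matmul_transpose_b_parallel(a, b, workers=4):
    """Parallel loop over the batch dimension; results are kept in batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_batch_slice, zip(a, b)))


if __name__ == "__main__":
    A = [[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]]
    B = [[[1, 0, 1, 0, 1]], [[2, 2, 2, 2, 2]]]  # [batch=2][n=1][k=5]
    print(batch_matmul_transpose_b_parallel(A, B))  # [[[9], [24]], [[10], [0]]]
```

With floating-point inputs, the chunked reduction reorders the additions, so results can differ from a scalar loop in the last bits. When evaluating the vectorized kernel, compare it against a reference with a tolerance rather than exact equality.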
Task Description
Run the full end-to-end compilation workflow of DeepSeek R1.
Identify representative Transpose + Matmul cases (in build/examples/BuddyDeepSeekR1/subgraph0_decode.mlir), for example: linalg.batch_matmul_transpose_b.
Under examples/BuddyNext, write a test case that combines Transpose and Matmul in a single function. Measure and report the percentage of total kernel execution time taken by the Transpose operation (do not forget to use the -batchmatmul-optimize pass).
Under examples/BuddyNext, hand-write an MLIR example for linalg.batch_matmul_transpose_b with vectorization and parallelization enabled, and evaluate its performance.
Timeline