[Examples] add tfla op and tfla-optimized by Old-cpu · Pull Request #722 · buddy-compiler/buddy-mlir

Old-cpu · 2026-03-17T11:49:21Z

Summary

Description

This PR adds an optimized implementation of the Tiled Flash Linear Attention (TFLA) kernel based on the NeurIPS 2025 paper "Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels", demonstrating significant performance improvements over the baseline versions. The optimized TFLA kernel achieves approximately 21× speedup compared to the baseline TFLA implementation, and approximately 3× speedup compared to the fused GQA Attention kernel (next-gqa-attention-fusion.mlir).
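The two speedup figures above also imply a rough ratio between the baseline TFLA kernel and the fused GQA kernel. The sketch below only does that arithmetic; it assumes both speedups were measured on the same workload and hardware, which the description suggests but does not state outright. The function name is hypothetical.

```python
def implied_ratio(speedup_vs_a: float, speedup_vs_b: float) -> float:
    """If kernel X is `speedup_vs_a` times faster than A and
    `speedup_vs_b` times faster than B, return how many times
    longer A takes than B (t_A / t_B)."""
    return speedup_vs_a / speedup_vs_b


# Figures quoted in the PR description (both approximate).
baseline_vs_gqa = implied_ratio(21.0, 3.0)
print(f"baseline TFLA takes about {baseline_vs_gqa:.0f}x as long as fused GQA")
```

Because both inputs are approximate, the result should be read as an order-of-magnitude indication rather than a measured value.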

Hardware Configuration

CPU: Intel Xeon Silver 4114 @ 2.20GHz
Cores: 2 sockets × 10 cores = 20 physical cores (40 logical threads)
Architecture: x86_64 with AVX-512 support
OpenMP threads: 48
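As a quick sanity check on the numbers above, the sketch below compares the OpenMP thread count with the logical thread count of the machine. The helper name is hypothetical, and 2 threads per core is inferred from the 20 physical cores and 40 logical threads listed; whether running 48 OpenMP threads on 40 logical threads is intentional is not stated in this PR.

```python
def logical_threads(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Total hardware threads exposed by the machine."""
    return sockets * cores_per_socket * threads_per_core


# Values from the hardware configuration; threads_per_core is inferred.
hw_threads = logical_threads(sockets=2, cores_per_socket=10, threads_per_core=2)
omp_threads = 48

# A ratio above 1.0 means more software threads than hardware threads.
oversubscription = omp_threads / hw_threads
print(f"{omp_threads} OpenMP threads on {hw_threads} logical threads "
      f"(ratio {oversubscription:.2f})")
```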

Background

To allow a fair comparison, the TFLA kernel uses the same configuration as the Grouped Query Attention (GQA) kernel:

Batch size: 1
Query heads: 12
KV groups: 2 (each group serves 6 heads via GQA)
Sequence length: 1 (single-token inference)
Hidden dimension per head: 128
KV cache sequence length: 1024 per group
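The configuration above can be written down as a small sketch that derives the head-to-group mapping and the tensor shapes. The class and method names are hypothetical, and two details are assumptions rather than facts from this PR: the (batch, heads, sequence, dim) layout, and the contiguous mapping in which heads 0 to 5 share group 0 and heads 6 to 11 share group 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    batch: int = 1
    query_heads: int = 12
    kv_groups: int = 2
    seq_len: int = 1          # single-token inference
    head_dim: int = 128
    kv_cache_len: int = 1024  # per group

    @property
    def heads_per_group(self) -> int:
        return self.query_heads // self.kv_groups

    def group_of(self, head: int) -> int:
        # Assumed contiguous mapping: consecutive heads share a KV group.
        return head // self.heads_per_group

    def query_shape(self) -> tuple:
        # Assumed layout: (batch, heads, sequence, dim).
        return (self.batch, self.query_heads, self.seq_len, self.head_dim)

    def kv_cache_shape(self) -> tuple:
        # One K (or V) cache per group, shared by all heads in that group.
        return (self.batch, self.kv_groups, self.kv_cache_len, self.head_dim)


cfg = AttnConfig()
head_to_group = [cfg.group_of(h) for h in range(cfg.query_heads)]
print(cfg.heads_per_group, head_to_group)
print(cfg.query_shape(), cfg.kv_cache_shape())
```

With these values each KV group serves 6 query heads, matching the "each group serves 6 heads" note in the list.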

Expected Performance

To verify and test this change, run the baseline and optimized examples:
make next-tfla-run
make next-tfla-optimized-run
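For reviewers who want to script the two make targets named above, a minimal sketch follows. The helper name and the `build_dir` parameter are hypothetical; the PR does not say which directory defines these targets, so the default of the current directory is an assumption. The measured wall-clock time includes any build steps the target triggers, so it is not comparable to the kernel speedups reported in the description.

```python
import subprocess
import time

# Target names as given in the PR description.
TARGETS = ["next-tfla-run", "next-tfla-optimized-run"]


def time_target(target: str, build_dir: str = ".", run=subprocess.run) -> float:
    """Run `make <target>` in build_dir and return elapsed wall-clock seconds.

    `run` is injectable so the helper can be exercised without invoking make.
    """
    start = time.perf_counter()
    run(["make", target], cwd=build_dir, check=True)
    return time.perf_counter() - start


# Example use (requires a configured build tree):
#   for t in TARGETS:
#       print(f"{t}: {time_target(t):.2f} s")
```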

Checklist

The code builds successfully
Existing tests pass
New tests are added where appropriate
Code follows the project coding style
Documentation is updated if needed

[Examples] add tfla op and tfla-optimized

3087b7f

Old-cpu requested a review from zhanghb97 as a code owner March 17, 2026 11:49

fix test failure issues

8f53077

Old-cpu wants to merge 2 commits into buddy-compiler:main from Old-cpu:tiled-flash-linea-attention-new
