PaddlePaddle / flash-attention

forked from Dao-AILab/flash-attention

Pull requests: PaddlePaddle/flash-attention

25 open, 128 closed

Pull requests list

support csrc/flash_attn_with_bias_and_mask/src/fmha/smem_tile.h cuda132 build

#153 opened May 25, 2026 by gouzil Member

[Sm100] FlashMask V4 fwd support head dim 256 via Split-D (q_stage=1, d==dv)

#152 opened May 22, 2026 by wangxudong10

[WIP] Fa4 d256 varlen zero seq

#149 opened May 19, 2026 by umiswing Member • Draft

[Sm100] FlashMask V4 fwd support headdim 256

#148 opened May 19, 2026 by baoqiwen • Draft

[WIP] support cp case use_varlen

#147 opened May 18, 2026 by umiswing Member • Draft

Fix CUDA 13.2 flash attention build compatibility

#141 opened Apr 28, 2026 by gouzil Member

feat: implement max_logits support for flashmask

#139 opened Apr 22, 2026 by xxyux

[Feat] CP-balance formal incorporation as flash_mask sub-module

#127 opened Apr 9, 2026 by Enigmatisms

Add rrattn estimate func and interface

#117 opened Mar 13, 2026 by LLSGYN

Support Global Sliding Window (num_vec == 4) on FM4 BWD

#111 opened Mar 3, 2026 by umiswing Member

adapt to torch version flashmaskv4

#103 opened Jan 28, 2026 by clouds1238

add_flashmask_cpbalance

#99 opened Dec 30, 2025 by starcrown001

add flashmask v2 torch flash_api.cpp flashmask_interface.py setup.py

#98 opened Dec 23, 2025 by clouds1238

fine-tuned tile size & register for fwd_hdim64

#92 opened Nov 14, 2025 by xxyux

Removed redundant templates and related compile-time/runtime code

#91 opened Nov 14, 2025 by Enigmatisms

fix fa2 flashmask oob read

#67 opened Jun 26, 2025 by umiswing Member • Draft

[WIP] fa3 varlen fix int32 overflow

#65 opened Jun 19, 2025 by umiswing Member

scan from right to left and skip masked block for each row at kernel begin

#55 opened Sep 23, 2024 by GuoxiaWang Collaborator

optimize skip block calculate in bwd

#49 opened Aug 28, 2024 by GuoxiaWang Collaborator

[BugFix] fix_mask error using unpadding api

#41 opened Apr 23, 2024 by wwbitejotunn

Fix unpadding input with padding mask compute error

#38 opened Apr 15, 2024 by wwbitejotunn

Fa cmake extends op

#31 opened Dec 14, 2023 by AnnaTrainingG

Fa cmake

#29 opened Dec 6, 2023 by AnnaTrainingG

[WIP] Sparse seqparallel

#9 opened Jun 8, 2023 by zkh2016

add block sparse api

#7 opened May 27, 2023 by kuizhiqing Member