Merged
34 commits
ca26dcf
add round multiple
micmelesse Jun 5, 2025
3c3774a
fix fwd
micmelesse Jun 5, 2025
6a1f7c1
backward fix
micmelesse Jun 5, 2025
91f1d54
use rounded lse flag
micmelesse Jun 5, 2025
b60bcc3
passing ROUNDED_LSE
micmelesse Jun 5, 2025
af017fa
default is new rounded mode
micmelesse Jun 20, 2025
b863404
rename to fused_atomics and fused_no_atomics
micmelesse Jun 20, 2025
728fa12
add test for torch_compile
micmelesse Jun 20, 2025
89280ea
add varlen torch compile test
micmelesse Jun 20, 2025
476d3c2
add old one kernel for ref
micmelesse Jun 20, 2025
3c9021e
fix varlen mismatch bug
micmelesse Jun 20, 2025
6db5170
fix shape issue in varlen but mismatch
micmelesse Jun 20, 2025
1489e49
sync torch compile kernel launch
micmelesse Jun 23, 2025
f713ea6
simple varlen test
micmelesse Jun 23, 2025
551a1ab
add debug code
micmelesse Jun 23, 2025
a82dce3
rm old
micmelesse Jun 23, 2025
7856d1b
ignore old impls
micmelesse Jun 23, 2025
7eda935
DEBUG flag works in interface only
micmelesse Jun 23, 2025
e372a41
ref uses the right shape for lse
micmelesse Jun 23, 2025
7c8488a
rm oldest bwd kernel
micmelesse Jun 23, 2025
d8be62c
fix typo
micmelesse Jun 23, 2025
abf3efc
fix varlen bug
micmelesse Jun 23, 2025
c6b9cb4
fix bug. Get info from q for now
micmelesse Jun 24, 2025
60bc0db
simple shape and stride checkout
micmelesse Jun 24, 2025
3a09a00
add more tests
micmelesse Jun 24, 2025
0c056da
test kvcache
micmelesse Jun 24, 2025
e4327e2
kvcache safe
micmelesse Jun 24, 2025
39fd514
match case
micmelesse Jun 25, 2025
1fcf81e
fix segfault due to bad return_softmax
micmelesse Jun 25, 2025
bfffe91
run bench
micmelesse Jun 26, 2025
b772ef9
run separate for the main functions
micmelesse Jun 26, 2025
2745528
just output benchmark
micmelesse Jun 26, 2025
e2f8775
default csv format and time stamp files
micmelesse Jun 26, 2025
d8e5ac4
non verbose bench
micmelesse Jun 26, 2025
105 changes: 0 additions & 105 deletions .github/workflows/amd_nightly.yml

This file was deleted.

4 changes: 3 additions & 1 deletion .github/workflows/amd_tests.yml
@@ -60,4 +60,6 @@ jobs:

       - name: AMD Bench
         run: |
-          python flash_attn/flash_attn_triton_amd/bench.py -benchmark_fn flash_attn_func flash_attn_varlen_func flash_attn_with_kvcache
+          python flash_attn/flash_attn_triton_amd/bench.py -benchmark_fn flash_attn_func
+          python flash_attn/flash_attn_triton_amd/bench.py -benchmark_fn flash_attn_varlen_func
+          python flash_attn/flash_attn_triton_amd/bench.py -benchmark_fn flash_attn_with_kvcache
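The AMD Bench step now calls bench.py once per function instead of passing all three names to a single call. A minimal sketch of that split, assuming only the script path and the `-benchmark_fn` flag that appear in the diff. The helper name `build_bench_commands` is hypothetical and not part of the repository. It only builds the argument lists and does not execute anything.

```python
import sys

# Script path and function names are copied from the workflow diff.
BENCH_SCRIPT = "flash_attn/flash_attn_triton_amd/bench.py"
BENCH_FUNCTIONS = [
    "flash_attn_func",
    "flash_attn_varlen_func",
    "flash_attn_with_kvcache",
]


def build_bench_commands(functions, python=sys.executable):
    """Return one argv list per benchmark function (hypothetical helper)."""
    return [[python, BENCH_SCRIPT, "-benchmark_fn", fn] for fn in functions]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for cmd in build_bench_commands(BENCH_FUNCTIONS):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` so that a failure in one benchmark is reported on its own, which is presumably the point of running the functions separately.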
2 changes: 2 additions & 0 deletions flash_attn/flash_attn_triton_amd/.gitignore
@@ -0,0 +1,2 @@
+bwd_prefill_fused.py
+bwd_prefill_onekernel.py