Description of errors
There's an FP8 variant of attention that lives under /flash_attn/flash_attn_triton_amd. The README there says to import it like this:
from flash_attn import flash_attn_qkvpacked_fp8_func
But that import fails with an error. However, if we do this:
from flash_attn.flash_attn_triton_amd.fp8 import flash_attn_qkvpacked_fp8_func
it imports successfully and works without issue, with no impact on loss and a decent speedup over the normal variant.
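Until the README or the package exports are fixed, a fallback import along these lines might work as a stopgap. This is only a sketch: `load_fp8_qkvpacked` and `CANDIDATES` are hypothetical names made up for illustration, not part of flash_attn, and the only module paths it assumes are the two mentioned above.

```python
import importlib

# Module paths to try, in order. The first is what the AMD Triton README
# documents; the second is the path reported to work in this issue.
CANDIDATES = (
    "flash_attn",
    "flash_attn.flash_attn_triton_amd.fp8",
)


def load_fp8_qkvpacked(name="flash_attn_qkvpacked_fp8_func"):
    """Return the first importable attribute called `name`, or None.

    Hypothetical helper for illustration; it only wraps the two import
    paths from this issue and does not call into flash_attn itself.
    """
    for module_path in CANDIDATES:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Package or submodule not available in this environment.
            continue
        func = getattr(module, name, None)
        if func is not None:
            return func
    return None


flash_attn_qkvpacked_fp8_func = load_fp8_qkvpacked()
if flash_attn_qkvpacked_fp8_func is None:
    print("FP8 qkv-packed attention is not importable from any known path")
```

Returning `None` instead of raising lets the caller decide whether to fall back to the regular attention function or stop.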
Also, the main README doesn't mention that the FP8 variant exists or can be used.