[Documentation]: FP8 attention functional but documentation wrong/incomplete

### Description of errors

There's an FP8 variant of attention that lives under `/flash_attn/flash_attn_triton_amd`. The README in that directory says to import it like this:
`from flash_attn import flash_attn_qkvpacked_fp8_func`
But that import fails with an error. However, this import succeeds:
`from flash_attn.flash_attn_triton_amd.fp8 import flash_attn_qkvpacked_fp8_func`
Imported this way, the function works without issue, with no impact on loss and a decent speedup over the normal variant.
Also, the main README doesn't mention that the FP8 variant exists or can be used.
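
Until the docs are fixed, a workaround could look something like the sketch below: try the documented import path first and fall back to the one that actually works. The helper name `resolve_fp8_attention` and the `CANDIDATE_MODULES` tuple are made up for illustration and are not part of flash_attn; only the two module paths and the function name come from this report.

```python
import importlib

# Module paths to try, in order. Both come from this issue:
# the first is what the AMD Triton README documents (reported to fail),
# the second is the path that was observed to import correctly.
CANDIDATE_MODULES = (
    "flash_attn",
    "flash_attn.flash_attn_triton_amd.fp8",
)


def resolve_fp8_attention(name="flash_attn_qkvpacked_fp8_func",
                          modules=CANDIDATE_MODULES):
    """Return (callable, module_path) for the first module exposing `name`.

    Hypothetical helper, not a flash_attn API. Returns (None, None) if no
    candidate module can be imported or none of them exposes the name.
    """
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or one of its parents) is missing; try the next path.
            continue
        func = getattr(module, name, None)
        if func is not None:
            return func, module_name
    return None, None


if __name__ == "__main__":
    fp8_func, source = resolve_fp8_attention()
    if fp8_func is None:
        print("FP8 attention not found under any known import path")
    else:
        print(f"Using {fp8_func.__name__} from {source}")
```

This only catches `ImportError`, so any other failure raised while importing the package would still surface. It also says nothing about how to call the function; the arguments would need to be taken from the `fp8` module itself.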

### Attach any links, screenshots, or additional evidence you think will be helpful.

_No response_

[Documentation]: FP8 attention functional but documentation wrong/incomplete #153
