[Documentation]: FP8 attention functional but documentation wrong/incomplete #153

@ZDisket

Description of errors

An FP8 variant of attention lives under /flash_attn/flash_attn_triton_amd. The README in that directory says to import it like this:
from flash_attn import flash_attn_qkvpacked_fp8_func
That import fails with an error. Importing from the submodule instead succeeds:
from flash_attn.flash_attn_triton_amd.fp8 import flash_attn_qkvpacked_fp8_func
With this import the function works without issue, with no impact on loss and a decent speedup over the normal variant.
The main README also doesn't mention that the FP8 variant is available.
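
Until the README is corrected, the workaround above can be wrapped in a small fallback so that calling code keeps working whichever import path is valid. This is only a sketch: `resolve_symbol` and `FP8_MODULE_CANDIDATES` are hypothetical names introduced here for illustration and are not part of flash_attn. The two module paths are the ones quoted in this issue.

```python
import importlib

# Candidate module paths, in preference order. The first is the path the
# README documents; the second is the path reported to work in this issue.
FP8_MODULE_CANDIDATES = (
    "flash_attn",
    "flash_attn.flash_attn_triton_amd.fp8",
)


def resolve_symbol(module_paths, name):
    """Return (module_path, obj) for the first module that exposes `name`.

    Hypothetical helper for illustration; not part of flash_attn.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except Exception as exc:
            # Importing can fail for reasons other than ImportError
            # (for example a backend that is unavailable), so record
            # the failure and move on to the next candidate.
            errors.append(f"{path}: {exc!r}")
            continue
        if hasattr(module, name):
            return path, getattr(module, name)
        errors.append(f"{path}: no attribute {name!r}")
    raise ImportError(f"could not resolve {name!r}; tried: " + "; ".join(errors))
```

With flash_attn installed, `path, fp8_func = resolve_symbol(FP8_MODULE_CANDIDATES, "flash_attn_qkvpacked_fp8_func")` would return the function together with the module path it was found under, which also makes it easy to report which of the two paths is valid on a given install.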

