Evaluate SageAttention3 as FlexAttention alternative for Blackwell #8

@2imi9

Description

Context

SageAttention3 has an explicit Blackwell variant. The project claims a 2-5x speedup over FlashAttention by using quantized attention (FP8/INT8).

Action Items

  • Check whether it supports sliding-window attention and GQA
  • Install and benchmark on an RTX 5090
  • Compare val_bpb, tok/sec, and VRAM usage against FlexAttention
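
For the tok/sec comparison, a backend-agnostic timing harness might look like the sketch below. It is only an illustration: `measure_throughput`, `compare`, and the backend names in the usage note are hypothetical, not part of SageAttention3 or FlexAttention, and the `step` callable is assumed to run one forward (or forward plus backward) pass with whichever attention backend is under test.

```python
import time
from typing import Callable, Dict, Optional


def measure_throughput(
    step: Callable[[], None],
    tokens_per_step: int,
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return tokens/sec for `step`, averaged over `iters` timed calls.

    `sync` is an optional barrier (e.g. a GPU synchronize function) called
    before the timer starts and before it stops, so that asynchronous
    kernel launches are fully counted.
    """
    # Warmup runs are untimed; they absorb compilation and cache effects.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    # Guard against a zero reading from a coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)

    return tokens_per_step * iters / elapsed


def compare(results: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express each backend's tok/sec as a ratio to the baseline backend."""
    base = results[baseline]
    return {name: tps / base for name, tps in results.items()}
```

On a CUDA device, `sync` would typically be `torch.cuda.synchronize`, and peak VRAM could be read with `torch.cuda.max_memory_allocated()` after a call to `torch.cuda.reset_peak_memory_stats()`. Running the same `step` once per backend and passing the results to `compare(..., baseline="flex")` (a hypothetical key) would give a speedup ratio to check against the claimed figure.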
