## Summary

Replaced SDPA fallback (no sliding window) with FlexAttention (PyTorch 2.5+). FlexAttention supports sliding window + GQA natively.

## Results

| Backend | Sliding Window | val_bpb | tok/sec |
|---|---|---|---|
| SDPA (old) | No | 1.739 | ~70k |
| **FlexAttention** | **Yes** | **1.680** | **~83k** |

Commit: 8c7bed4
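
For reviewers unfamiliar with FlexAttention, here is a minimal sketch of how a causal sliding window and GQA can be combined through `torch.nn.attention.flex_attention`. It is not the code from the commit: `WINDOW`, `sliding_window_causal`, `attend`, and the tensor shapes are hypothetical names and values chosen for illustration.

```python
# Sketch only: WINDOW, the shapes, and all function names here are illustrative
# assumptions, not values taken from the commit.
WINDOW = 128  # hypothetical sliding-window size in tokens


def sliding_window_causal(b, h, q_idx, kv_idx):
    """mask_mod: True where a query position may attend to a key position.

    Works on plain ints and on index tensors alike, since it only uses
    comparisons, subtraction, and `&`.
    """
    causal = q_idx >= kv_idx               # never attend to future tokens
    in_window = (q_idx - kv_idx) < WINDOW  # only the most recent WINDOW tokens
    return causal & in_window


def attend(q, k, v):
    """Sliding-window attention with grouped-query heads.

    q: (B, Hq, S, D); k, v: (B, Hkv, S, D), where Hq is a multiple of Hkv.
    """
    # Imported lazily so the mask predicate above can be checked without PyTorch.
    from torch.nn.attention.flex_attention import create_block_mask, flex_attention

    seq_len = q.shape[-2]
    # B=None and H=None broadcast a single mask over batch and heads.
    block_mask = create_block_mask(
        sliding_window_causal, B=None, H=None,
        Q_LEN=seq_len, KV_LEN=seq_len, device=q.device,
    )
    # enable_gqa=True lets several query heads share each key/value head,
    # so k and v do not need to be repeated to match q.
    return flex_attention(q, k, v, block_mask=block_mask, enable_gqa=True)
```

In a training loop, `flex_attention` would normally be wrapped in `torch.compile` to get the fused kernel, and the block mask would be built once per sequence length and reused across layers rather than rebuilt on every call.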