Add FLASH_ATTENTION_FWD_TRITON_AMD_CONFIG_JSON env var support #2000
Open
alexheretic wants to merge 1 commit into ROCm:main from
Conversation
Merged in flash-attention repo: Dao-AILab/flash-attention#2239
Allows users to override triton attn_fwd config when not autotuning.
Motivation
With autotuning disabled, the current default configs are sub-optimal, at least for my gfx1100 and probably for other cards too.
I get roughly 2x faster wan2.2 workflow performance using the config:

FLASH_ATTENTION_FWD_TRITON_AMD_CONFIG_JSON='{"BLOCK_M":128,"BLOCK_N":64,"waves_per_eu":1,"PRE_LOAD_V":false,"num_stages":1,"num_warps":8}'

Autotuning itself is very slow to run (e.g. it takes 82 minutes) and needs to re-run on any parameter change (it also doesn't seem to persist). So it is helpful to be able to override the non-autotune config.
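For reference, a minimal sketch of how such a JSON override could be read and applied when autotuning is off. The default values, the function name, and the way the config is split into kernel constexprs vs. launch parameters are illustrative assumptions here, not the actual flash-attention code:

```python
import json
import os

import triton

# Hypothetical defaults standing in for the backend's non-autotune attn_fwd config;
# the real defaults live in the flash-attention Triton AMD backend.
_DEFAULT_FWD_CONFIG = {
    "BLOCK_M": 128,
    "BLOCK_N": 64,
    "waves_per_eu": 2,
    "PRE_LOAD_V": False,
    "num_stages": 1,
    "num_warps": 4,
}


def get_fwd_config() -> triton.Config:
    """Build the attn_fwd launch config, honouring the JSON env var override."""
    cfg = dict(_DEFAULT_FWD_CONFIG)
    override = os.environ.get("FLASH_ATTENTION_FWD_TRITON_AMD_CONFIG_JSON")
    if override:
        # Keys present in the JSON replace the corresponding defaults.
        cfg.update(json.loads(override))
    # num_stages / num_warps are launch parameters; the remaining keys are
    # passed through as kernel constexpr meta-parameters.
    num_stages = cfg.pop("num_stages")
    num_warps = cfg.pop("num_warps")
    return triton.Config(cfg, num_stages=num_stages, num_warps=num_warps)
```

With this shape of lookup, the env var only affects the non-autotune path; when autotuning is enabled the tuner's own search results would still apply.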
Technical Details
Test Plan
Manually tested (see Dao-AILab/flash-attention#2239).
Test Result
✅