Add FLASH_ATTENTION_FWD_TRITON_AMD_CONFIG_JSON env var support#2000

Open
alexheretic wants to merge 1 commit into ROCm:main from alexheretic:fwd-config-json

Conversation

alexheretic commented Feb 7, 2026

Merged in the upstream flash-attention repo: Dao-AILab/flash-attention#2239

Allows users to override the triton attn_fwd config when not autotuning.

Motivation

With autotuning disabled, the current default configs are sub-optimal, at least on my gfx1100 and probably on other cards too.

I get roughly 2x faster wan2.2 workflow performance using the following config:
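
```
FLASH_ATTENTION_FWD_TRITON_AMD_CONFIG_JSON='{"BLOCK_M":128,"BLOCK_N":64,"waves_per_eu":1,"PRE_LOAD_V":false,"num_stages":1,"num_warps":8}'
```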

Autotuning itself is very slow to run (it takes 82 minutes on my setup) and needs to re-run on any parameter change (it also doesn't appear to persist between runs?). So it is helpful to be able to override the non-autotune config directly.

Technical Details
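
The override is read from the environment as JSON and used in place of the default attn_fwd config whenever autotuning is disabled. A minimal sketch of that pattern, assuming illustrative default values and a hypothetical `get_fwd_config` helper (not the actual flash-attention code):

```python
import json
import os

# Illustrative defaults only; the real values live in the triton AMD backend.
_DEFAULT_FWD_CONFIG = {
    "BLOCK_M": 128,
    "BLOCK_N": 128,
    "waves_per_eu": 2,
    "PRE_LOAD_V": False,
    "num_stages": 1,
    "num_warps": 4,
}


def get_fwd_config():
    """Return the attn_fwd kernel config, honouring the env var override.

    Hypothetical helper sketching how a JSON env var can replace the
    hard-coded config used when autotuning is disabled.
    """
    raw = os.environ.get("FLASH_ATTENTION_FWD_TRITON_AMD_CONFIG_JSON")
    if raw is None:
        return dict(_DEFAULT_FWD_CONFIG)
    override = json.loads(raw)
    unknown = set(override) - set(_DEFAULT_FWD_CONFIG)
    if unknown:
        raise ValueError(f"unknown attn_fwd config keys: {sorted(unknown)}")
    return {**_DEFAULT_FWD_CONFIG, **override}
```

Merging the override over the defaults would let a user set only the keys they care about; whether the actual change merges partial configs or expects a complete JSON object is a detail of the PR itself.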

Test Plan

Manually tested; see Dao-AILab/flash-attention#2239.

Test Result

alexheretic requested a review from a team on February 7, 2026 17:36