@fbarchard
Collaborator

Detect AMD Family 26 with any model as cpuinfo_uarch_zen5
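For readers unfamiliar with how a family number maps to a uarch, here is a minimal sketch of the idea, not the code from this PR. It assumes the standard CPUID leaf 1 EAX layout (base family in bits 11:8, extended family in bits 27:20, added together when the base family is 0xF). The `sketch_*` names are hypothetical and exist only for this illustration; cpuinfo's real enum and decode function are larger and structured differently.

```c
#include <stdint.h>

/* Hypothetical enum for this sketch only; cpuinfo's real uarch enum has many more entries. */
enum sketch_uarch {
    sketch_uarch_unknown = 0,
    sketch_uarch_zen5,
};

/* Decode the display family from CPUID leaf 1 EAX.
 * Base family lives in bits 11:8; the extended family (bits 27:20)
 * is added only when the base family is 0xF. */
static uint32_t sketch_decode_family(uint32_t eax) {
    const uint32_t base_family = (eax >> 8) & 0xF;
    const uint32_t ext_family = (eax >> 20) & 0xFF;
    return base_family == 0xF ? base_family + ext_family : base_family;
}

/* Map an AMD CPUID signature to a uarch. Family 26 (0x1A) is reported as
 * zen5 regardless of the model bits, mirroring the PR title. */
static enum sketch_uarch sketch_decode_amd_uarch(uint32_t eax) {
    switch (sketch_decode_family(eax)) {
        case 0x1A:
            return sketch_uarch_zen5;
        default:
            return sketch_uarch_unknown;
    }
}
```

Because the model bits are never inspected for family 0x1A, any future model in that family would be classified the same way under this sketch.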

@digantdesai
Contributor

From @fbarchard

The reason for wanting zen5 is that AVX512 behaves differently on AMD and Intel:
AMD prefers short/wide tile sizes such as 6x64.
Intel prefers tall/narrow tile sizes such as 13x32.
GFNI is present on both, but it is slow on AMD Zen4.
int32 multiply is very slow on Intel but fast on AMD.
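To illustrate why a distinct uarch value matters to a kernel library, here is a rough sketch of how a caller might pick an AVX512 tile shape once AMD and Intel can be told apart. The enum, struct, and function names are hypothetical, and reading "6x64" as rows x columns is an assumption; only the two tile sizes come from the comment above.

```c
#include <stddef.h>

/* Hypothetical classification for this sketch only. */
enum sketch_avx512_target {
    sketch_avx512_amd_zen5,
    sketch_avx512_intel,
};

/* Tile shape, assuming the "RxC" notation above means rows x columns. */
struct sketch_tile {
    size_t rows;
    size_t cols;
};

/* Pick a tile shape per target: short/wide for AMD, tall/narrow for Intel. */
static struct sketch_tile sketch_pick_avx512_tile(enum sketch_avx512_target target) {
    switch (target) {
        case sketch_avx512_amd_zen5:
            return (struct sketch_tile){6, 64};
        case sketch_avx512_intel:
        default:
            return (struct sketch_tile){13, 32};
    }
}
```

A real dispatcher would likely also gate GFNI and int32-multiply code paths on the same distinction, but those choices are not spelled out in the comment and are left out here.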

@digantdesai
Contributor

The riscv failure seems unrelated.

@digantdesai digantdesai merged commit cebb093 into pytorch:main Nov 14, 2024
11 of 12 checks passed