Description
Found while working on #648
The shader outputs every thread's values for several compute semantics: DispatchThreadID, GroupID, GroupThreadID and GroupIndex.
The values are written to a large buffer, indexed by a unique key computed from the GroupID and the GroupThreadID. Reading https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sv-groupid, this should give me a stable index no matter the order of scheduling: the index depends on the XYZ position, not the order.
The result is stable across DX and VK, but for MSL, the results are slightly off:
https://github.com/llvm/offload-test-suite/actions/runs/21940854135/job/63365919110?pr=648
In this run, the value 0x210008 is reported by Metal, while the expected value is 0x210002.
0x21 is the unique key computed in the shader.
The key is computed as follows:
FID = thread index in the flat layout
KEY = FID * 12 + N, with N being a value in [0, 12)
(each thread outputs 12 values).
So we can compute back the thread index from this key:
- raw key: 0x21
0x21 == 33 == 2 * 12 + 9, so this is thread 2 in the flat representation.
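For reference, the decoding above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the split of the reported value into key (upper bits) and payload (lower 16 bits) is an assumption inferred from 0x210008 / 0x210002, not taken from the shader source.

```python
VALUES_PER_THREAD = 12  # from the description: each thread outputs 12 values


def split_reported(value):
    """Split a reported buffer value into (key, payload).

    Assumption: the key sits above bit 16 and the payload in the low 16 bits,
    as suggested by 0x210008 -> key 0x21, payload 8.
    """
    return value >> 16, value & 0xFFFF


def decode_key(key):
    """Invert KEY = FID * 12 + N, returning (FID, N)."""
    return divmod(key, VALUES_PER_THREAD)


key, payload = split_reported(0x210008)
fid, n = decode_key(key)
print(f"key={key:#x} payload={payload} fid={fid} n={n}")
```

Running the same helpers on 0x210002 gives the same key, FID and N, with payload 2 instead of 8.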
dispatch: 2x2x2
numthreads: 2x2x2.
With 2 groups of 2 threads along each axis, the global grid is 4x4x4: 4 threads per line, 16 threads per plane.
Flat thread index 2 is therefore at global thread coordinate x=2, y=0, z=0.
Which should be: GroupID: x=1, y=0, z=0, GroupThreadID: x=0, y=0, z=0, GroupIndex: 2, DispatchThreadID: x=2, y=0, z=0
The expected value in the test file is 0x210002, which means we expect DispatchThreadID.x to be 2. But on Metal, this value is 8, which I think is impossible: with dispatch.x = 2 and numthreads.x = 2, DispatchThreadID.x can only be in the range 0 to 3.
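A minimal Python sketch of the coordinate reasoning, assuming the flat layout is row-major over the global grid with x varying fastest (which matches "4 threads per line, 16 threads per plane"). The function names are hypothetical and this is not the test's actual code:

```python
DISPATCH = (2, 2, 2)    # thread groups per axis
NUMTHREADS = (2, 2, 2)  # threads per group, per axis


def flat_to_global(fid, dispatch=DISPATCH, numthreads=NUMTHREADS):
    """Map a flat thread index to a global (x, y, z) coordinate.

    Assumption: row-major layout, x fastest, then y, then z.
    """
    width = dispatch[0] * numthreads[0]   # threads per line
    height = dispatch[1] * numthreads[1]  # lines per plane
    return (fid % width, (fid // width) % height, fid // (width * height))


def split_global(coord, numthreads=NUMTHREADS):
    """Split a global coordinate into (GroupID, GroupThreadID)."""
    group_id = tuple(c // n for c, n in zip(coord, numthreads))
    group_thread_id = tuple(c % n for c, n in zip(coord, numthreads))
    return group_id, group_thread_id


def max_dispatch_thread_id(dispatch=DISPATCH, numthreads=NUMTHREADS):
    """Largest valid DispatchThreadID on each axis."""
    return tuple(d * n - 1 for d, n in zip(dispatch, numthreads))


coord = flat_to_global(2)
group_id, group_thread_id = split_global(coord)
print("DispatchThreadID:", coord)
print("GroupID:", group_id, "GroupThreadID:", group_thread_id)
print("max DispatchThreadID:", max_dispatch_thread_id())
print("is 8 a valid x?", 8 <= max_dispatch_thread_id()[0])
```

Under these assumptions the largest possible DispatchThreadID.x is 3, so a reported value of 8 cannot come from a valid thread position on that axis.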