Problem Description
Enabling BIAS in the persistent and StreamK kernel calls in matmul.py causes Triton compilation errors. The cause appears to be the slice ordering applied to the bias, which is usually a 1-D torch tensor.
For the persistent kernel, the fix is in the add_vector() call in binary.py: change bias_vector[:, None] to bias_vector[None, :].
The StreamK kernel's fix will likely be more complex.
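To illustrate the suspected slice-ordering problem, here is a minimal pure-Python sketch of the broadcasting shape check. It is not the library code: broadcast_shape() is a hypothetical helper, the tile sizes are made up, and it assumes the bias has one entry per output column of the accumulator tile.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes (NumPy-style rules).

    Hypothetical helper for illustration only; raises ValueError when the
    shapes are incompatible, which is the kind of mismatch that would
    surface as a compile-time error in a Triton kernel.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(out)


# Assumed tile sizes, chosen only so that BLOCK_M != BLOCK_N.
BLOCK_M, BLOCK_N = 128, 64
acc_shape = (BLOCK_M, BLOCK_N)  # accumulator tile
bias_len = BLOCK_N              # assumption: one bias entry per column

column_shape = (bias_len, 1)    # shape of bias_vector[:, None]
row_shape = (1, bias_len)       # shape of bias_vector[None, :]

# Row form broadcasts across the rows of the accumulator tile.
print(broadcast_shape(acc_shape, row_shape))

# Column form only matches when BLOCK_M == BLOCK_N; otherwise it fails.
try:
    broadcast_shape(acc_shape, column_shape)
except ValueError as err:
    print("mismatch:", err)
```

Under these assumptions, the [:, None] form only lines up when the tile is square, which would be consistent with the error appearing once BIAS is enabled.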
Operating System
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
CPU
AMD EPYC 9575F 64-Core Processor
GPU
Multi-GPU AMD Instinct MI355X
ROCm Version
ROCm 7.0.0
ROCm Component
No response
Steps to Reproduce
To be provided.
The code does not currently expose a way to enable BIAS, so it has to be enabled with a small workaround. I will provide a branch pointer soon.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response