Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
linear-algebra mpi cuda scalapack matrix-multiplication gpu-acceleration rocm matmul communication-optimal pdgemm
Updated Feb 4, 2026 - C++