We should give this optimized einsum for cuda a shot: https://pypi.org/project/opt-einsum-torch/
Users are likely to work with large matrices, so this could yield a noticeable performance improvement; without something like this we might even run into memory issues.
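As a rough illustration of why contraction-path optimization helps (this sketch uses NumPy's built-in path optimizer as a stand-in, not the opt-einsum-torch API itself): for chained contractions, choosing a good pairwise order can shrink the intermediate tensors and the FLOP count dramatically, which is the same idea the linked package applies to CUDA tensors.

```python
import numpy as np

# Hypothetical shapes for a three-operand chained contraction.
a = np.random.rand(32, 64)
b = np.random.rand(64, 128)
c = np.random.rand(128, 16)

# Naive evaluation contracts left to right; an optimized path reorders
# the pairwise contractions to minimize intermediate sizes and FLOPs.
naive = np.einsum('ij,jk,kl->il', a, b, c)

# Precompute an optimal contraction path, then reuse it.
path, info = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='optimal')
fast = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)

# Both orderings produce the same result; only cost/memory differs.
assert np.allclose(naive, fast)
```

The same principle is what we would hope to get on GPU: a planned contraction order that avoids materializing large intermediates.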