We should give this optimized einsum for cuda a shot: https://pypi.org/project/opt-einsum-torch/
Users are likely to work with large matrices, so this could yield a noticeable performance improvement; without something like this we might even run into memory issues.
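As a rough illustration of why contraction-path optimization helps (this sketch uses NumPy's built-in path optimizer as a stand-in, not the opt-einsum-torch API itself): for chained contractions, choosing a good pairwise order can shrink the intermediate tensors and the FLOP count dramatically, which is the same idea the linked package applies to CUDA tensors.

```python
import numpy as np

# Hypothetical shapes for a three-operand chained contraction.
a = np.random.rand(32, 64)
b = np.random.rand(64, 128)
c = np.random.rand(128, 16)

# Naive evaluation contracts left to right; an optimized path reorders
# the pairwise contractions to minimize intermediate sizes and FLOPs.
naive = np.einsum('ij,jk,kl->il', a, b, c)

# Precompute an optimal contraction path, then reuse it.
path, info = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='optimal')
fast = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)

# Both orderings produce the same result; only cost/memory differs.
assert np.allclose(naive, fast)
```

The same principle is what we would hope to get on GPU: a planned contraction order that avoids materializing large intermediates.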