Hi, may I know whether I can use SimA instead of multi-head attention in the decoder to reduce complexity? Thanks!