I am following your framework and extend your work by adding "attention". However, when I review your code, I am confuse that whether you calculated the first principle component with training data or not? If you calculate the first principle component with current dataset (test data), it seems that you introduce some information to create the sentence embeddings for test datasets. In that case, your result may not be accepted.