The codebase uses batching to process multiple sentences at the same time. Each sentence can be broken down into multiple phrases, each represented by a non-terminal one-hot vector. Because not all sentences in a batch decompose into the same number of phrases, some sentences have empty phrases. Ideally, lil_logits should mask out all such empty representations so that they do not contribute to the total loss. Otherwise, lil_logits_mean will be dominated by terms of the form 0 - pooled_seq_rep.
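To make the concern concrete, here is a minimal sketch of a masked mean in plain Python, independent of any tensor library. It is not the codebase's implementation: the helper names `phrase_mask` and `masked_mean`, the toy values, and the assumption that an empty phrase is stored as an all-zero one-hot vector are all hypothetical and used only for illustration.

```python
def phrase_mask(onehots):
    """Return 1.0 for real phrases and 0.0 for empty phrase slots.

    Assumption: an empty phrase is stored as an all-zero one-hot vector.
    """
    return [
        [1.0 if any(v != 0 for v in phrase) else 0.0 for phrase in sentence]
        for sentence in onehots
    ]


def masked_mean(values, mask):
    """Average `values` over the positions where `mask` is 1.0."""
    total = sum(
        v * m
        for row_values, row_mask in zip(values, mask)
        for v, m in zip(row_values, row_mask)
    )
    count = sum(m for row in mask for m in row)
    # Guard against a batch that contains no real phrases at all.
    return total / count if count > 0 else 0.0


# Toy batch: 2 sentences, 3 phrase slots each, 2 non-terminal types.
# The second sentence has only one real phrase; the other two slots are empty.
onehots = [
    [[1, 0], [0, 1], [1, 0]],
    [[0, 1], [0, 0], [0, 0]],
]

# Hypothetical per-phrase values. The -9.0 entries stand in for the spurious
# "0 - pooled_seq_rep" terms produced by the empty slots.
per_phrase_values = [
    [0.5, 1.5, 1.0],
    [2.0, -9.0, -9.0],
]

mask = phrase_mask(onehots)
num_slots = sum(len(row) for row in per_phrase_values)
naive_mean = sum(sum(row) for row in per_phrase_values) / num_slots
masked = masked_mean(per_phrase_values, mask)

print("naive mean: ", naive_mean)   # pulled down by the empty slots
print("masked mean:", masked)       # averages the 4 real phrases only
```

In this toy batch the unmasked mean is dragged negative by the two empty slots, while the masked mean only reflects the four real phrases. In a tensor library the same idea would typically be written as an element-wise multiply by the mask followed by a division by the mask sum.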