
Noise on LIL layer due to batching #9

@akshaylive


The codebase uses batching to process multiple sentences at once. Each sentence can be decomposed into multiple phrases, represented by non-terminal one-hot vectors. Because the sentences in a batch do NOT all have the same number of phrases, shorter sentences are padded with empty phrases. Ideally, lil_logits should MASK out these empty representations so that they do not contribute to the total loss. Otherwise, each empty phrase contributes a 0 - pooled_seq_rep term, and lil_logits_mean ends up dominated by those padding terms rather than by the real phrases.
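A minimal sketch of the masking described above, written in plain Python so it runs without a tensor library. It assumes (hypothetically) that each phrase's LIL term is the phrase representation minus pooled_seq_rep, that padded phrases are all-zero vectors, and that a 0/1 phrase mask is available. The helper names lil_differences, naive_mean and masked_mean are illustrative only and are not taken from the codebase.

```python
from typing import List, Sequence

Vector = List[float]


def lil_differences(phrase_reps: Sequence[Vector], pooled_seq_rep: Vector) -> List[Vector]:
    """Hypothetical helper: phrase representation minus pooled sentence representation.

    For a padded (all-zero) phrase this reduces to 0 - pooled_seq_rep.
    """
    return [[p - s for p, s in zip(phrase, pooled_seq_rep)] for phrase in phrase_reps]


def naive_mean(rows: Sequence[Vector]) -> Vector:
    """Mean over every phrase slot, padding included (the behaviour reported here)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def masked_mean(rows: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Mean over real phrases only; mask[i] is 1 for a real phrase, 0 for padding."""
    kept = [row for row, m in zip(rows, mask) if m]
    if not kept:
        # No real phrases: return zeros instead of dividing by zero.
        dim = len(rows[0]) if rows else 0
        return [0.0] * dim
    return [sum(col) / len(kept) for col in zip(*kept)]


pooled_seq_rep = [1.0, 1.0]
# One real phrase followed by two padding slots.
phrase_reps = [[3.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 0, 0]

diffs = lil_differences(phrase_reps, pooled_seq_rep)
print(naive_mean(diffs))         # padding drags the mean toward -pooled_seq_rep
print(masked_mean(diffs, mask))  # → [2.0, 0.0]
```

In a batched tensor implementation the same idea usually becomes multiplying the per-phrase terms (or per-phrase losses) by the mask and dividing by the mask's sum, clamped to at least 1, rather than by the padded phrase count.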
