This repository was archived by the owner on Apr 19, 2026. It is now read-only.
Here is a set of tokens that should not be masked during dynamic masking.
electra/pretrain/pretrain_helpers.py
Line 121 in 7911132
But should we also avoid masking all those `['PAD']` at the end of a sentence (if the sentence is shorter than `max_seq_length` and if there is no second sentence segment)?
I understand `['PAD']` itself has token_id = 0, but I do not see this being used to prevent masking in downstream steps. If we do not ignore it, this will affect the probability calculation here:
electra/pretrain/pretrain_helpers.py
Lines 167 to 170 in 7911132
Also, we will be trying to predict 'PAD' that is outside a sequence, which is a bit unintuitive.
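To make the concern concrete, here is a minimal NumPy sketch of what I mean. It is not the repository's TensorFlow code. The function names and the ids 101/102/103 are assumptions for illustration, and only PAD = 0 comes from the discussion above. It compares the per-position sampling probability when padding is left in the candidate set with the case where it is excluded.

```python
import numpy as np

# Hypothetical special-token ids for illustration; only PAD_ID = 0 is from the issue text.
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 101, 102, 103


def candidates_mask(input_ids, ignore_ids):
    """Return True at positions that are allowed to be masked."""
    input_ids = np.asarray(input_ids)
    return ~np.isin(input_ids, ignore_ids)


def sample_probs(input_ids, ignore_ids):
    """Uniform sampling probability over the maskable positions of each row."""
    mask = candidates_mask(input_ids, ignore_ids).astype(np.float64)
    totals = mask.sum(axis=-1, keepdims=True)
    # Guard against rows with no maskable position.
    return mask / np.maximum(totals, 1.0)


# One short sequence padded to length 8: [CLS] a b c [SEP] PAD PAD PAD
ids = np.array([[CLS_ID, 7, 8, 9, SEP_ID, PAD_ID, PAD_ID, PAD_ID]])

# Padding left in the candidate set: real tokens share probability mass with PAD.
with_pad = sample_probs(ids, ignore_ids=(CLS_ID, SEP_ID, MASK_ID))

# Padding excluded: all probability mass stays on the three real tokens.
without_pad = sample_probs(ids, ignore_ids=(CLS_ID, SEP_ID, MASK_ID, PAD_ID))
```

In this toy case each real token gets probability 1/6 when padding is a candidate and 1/3 when it is excluded, so the shorter the sentence, the more masking budget would be spent on padding.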
Maybe I am missing something here. Thanks again for putting out such great work!