Should dynamic masking also ignore `[PAD]`? #59

Here is a set of tokens that should not be masked during dynamic masking.
https://github.com/google-research/electra/blob/79111328070e491b287c307906701ebc61091eb2/pretrain/pretrain_helpers.py#L121

But should we also avoid masking the `[PAD]` tokens at the end of a sequence (when the sequence is shorter than `max_seq_length` and there is no second sentence segment)?

I understand that `[PAD]` itself has token_id = 0, but I do not see this being used to prevent masking in downstream steps. If we do not ignore it, padding positions stay in the candidate set and affect the probability calculation here:
https://github.com/google-research/electra/blob/79111328070e491b287c307906701ebc61091eb2/pretrain/pretrain_helpers.py#L167-L170

Also, we would be trying to predict `[PAD]` tokens that lie outside the actual sequence, which is a bit unintuitive.

Maybe I am missing something here. Thanks again for putting out such great work!


	# Get a probability of masking each position in the sequence
	candidate_mask_float = tf.cast(candidates_mask, tf.float32)
	sample_prob = (proposal_distribution * candidate_mask_float)
	sample_prob /= tf.reduce_sum(sample_prob, axis=-1, keepdims=True)
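
To make the concern concrete, here is a minimal pure-Python sketch of the same normalization for a single sequence. It is not the ELECTRA code: the helper `sample_prob`, the toy token IDs other than `[PAD]` = 0, and the uniform proposal distribution are all assumptions made for illustration.

```python
# Hypothetical toy IDs; only [PAD] = 0 comes from the issue text above.
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 101, 102, 103


def sample_prob(input_ids, ignore_ids):
    """Uniform masking probability over positions whose ID is not ignored."""
    # 1.0 where the position may be masked, 0.0 otherwise (the candidates mask).
    candidates = [0.0 if tok in ignore_ids else 1.0 for tok in input_ids]
    total = sum(candidates)  # assumes at least one candidate position exists
    # Normalize so the probabilities over the sequence sum to 1.
    return [c / total for c in candidates]


# [CLS] a b c [SEP] followed by three padding positions.
input_ids = [CLS_ID, 7, 8, 9, SEP_ID, PAD_ID, PAD_ID, PAD_ID]

without_pad_ignored = sample_prob(input_ids, {CLS_ID, SEP_ID, MASK_ID})
with_pad_ignored = sample_prob(input_ids, {CLS_ID, SEP_ID, MASK_ID, PAD_ID})

print(without_pad_ignored)  # padding positions get non-zero probability
print(with_pad_ignored)     # all probability mass stays on the real tokens
```

In this toy case, leaving `[PAD]` out of the ignore set spreads the probability over six positions instead of three, so each real token is half as likely to be picked and the padding positions can be selected for masking.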
