Skip to content
This repository was archived by the owner on Apr 19, 2026. It is now read-only.
This repository was archived by the owner on Apr 19, 2026. It is now read-only.

Should dynamic masking also ignore ['PAD'] #59

@ccchang0111

Description

@ccchang0111

Here is a set of tokens that should not be masked during dynamic masking.

ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]

But should we also avoid masking all those ['PAD'] at the end of a sentence (if the sentence is shorter than max_seq_length and if there is no second sentence segment)?

I understand ['PAD'] itself has token_id = 0, but I do not see this being used to prevent masking in downstream steps. If we do not ignore it, this will affect the probability calculation here

# Get a probability of masking each position in the sequence
candidate_mask_float = tf.cast(candidates_mask, tf.float32)
sample_prob = (proposal_distribution * candidate_mask_float)
sample_prob /= tf.reduce_sum(sample_prob, axis=-1, keepdims=True)

Also, we will be trying to predict 'PAD' that is outside a sequence, which is a bit unintuitive.

Maybe I am missing something here. Thanks again for putting up such a great work!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions