Rationale Behind Removing CLS Token #12

@jaeyoung-l

Hello,

I have a question about how the embeddings are modified inside the MultiModal2 module, right after the output of CLIPModel is obtained.

It seems that when image evidence exists, the CLS token of the image embedding is removed
(https://github.com/VT-NLP/Mocheg/blob/main/verification/model.py#L160)

whereas when no text evidence is given, the CLS token of the text embedding is removed.
(https://github.com/VT-NLP/Mocheg/blob/main/verification/model.py#L167)
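
To make the operation in question concrete, here is a minimal sketch of what "removing the CLS token" usually means. It is not copied from verification/model.py. It assumes the embedding is shaped (batch, seq_len, hidden) and that the class token sits at index 0 of the sequence dimension, as it does in the CLIP vision encoder output. The function name and the toy values are hypothetical.

```python
# Hypothetical helper and toy data; not taken from verification/model.py.
def drop_first_token(batch):
    """Drop position 0 from every sequence in a (batch, seq_len, hidden) nested list.

    With a torch tensor `x` of the same shape, the equivalent slice is `x[:, 1:, :]`.
    """
    return [sequence[1:] for sequence in batch]


# Toy "image embedding": 1 sample, 3 tokens (class token + 2 patches), hidden size 2.
image_embeds = [[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]]

# After the slice only the two patch positions remain.
patch_embeds = drop_first_token(image_embeds)
print(len(patch_embeds[0]))
```

After this slice the sequence is one position shorter, so any mask or length bookkeeping for that modality has to shrink by one as well.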

My questions are,

  1. What is the reason for removing the CLS token?
  2. Why is only one of the CLS tokens removed when both image and text evidence exist?
