Rationale Behind Removing CLS Token #12

Hello, 

I have a question about how the embeddings are modified inside the `MultiModal2` module, right after obtaining the output of `CLIPModel`.

It seems that when image evidence exists, the CLS token of the image embedding is removed
(https://github.com/VT-NLP/Mocheg/blob/main/verification/model.py#L160),
whereas when no text evidence is given, the CLS token of the text embedding is removed
(https://github.com/VT-NLP/Mocheg/blob/main/verification/model.py#L167).
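To make the question concrete, here is a minimal, framework-free sketch of what "removing the CLS token" amounts to. It assumes the CLS token sits at index 0 of each token sequence (as in CLIP's ViT image encoder) and, for illustration only, that the text and image token sequences are later concatenated along the token dimension; I have not confirmed that `model.py` does either. `drop_cls`, `concat_tokens`, and the toy embeddings are hypothetical and not part of Mocheg. On a PyTorch tensor the same slice would be `hidden_states[:, 1:, :]`.

```python
from typing import List

# Toy layout: batch -> sequence of tokens -> embedding vector,
# i.e. shape [batch, seq_len, hidden] written as nested lists.
Batch = List[List[List[float]]]


def drop_cls(batch: Batch) -> Batch:
    """Drop the first token of every sequence.

    Hypothetical helper; assumes the CLS token is at index 0.
    Equivalent to `hidden_states[:, 1:, :]` on a PyTorch tensor.
    """
    return [seq[1:] for seq in batch]


def concat_tokens(a: Batch, b: Batch) -> Batch:
    """Concatenate two batches along the token (sequence) dimension.

    Hypothetical helper, comparable to `torch.cat([a, b], dim=1)`.
    """
    if len(a) != len(b):
        raise ValueError("batch sizes differ")
    return [seq_a + seq_b for seq_a, seq_b in zip(a, b)]


CLS = [9.0, 9.0]  # marker vector standing in for a CLS embedding
text = [[CLS, [0.1, 0.2], [0.3, 0.4]]]               # 1 example: CLS + 2 tokens
image = [[CLS, [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]]  # 1 example: CLS + 3 patches

both_kept = concat_tokens(text, image)
one_dropped = concat_tokens(text, drop_cls(image))

# Sequence length and number of CLS tokens in the joint sequence.
print(len(both_kept[0]), sum(tok == CLS for tok in both_kept[0]))      # 7 2
print(len(one_dropped[0]), sum(tok == CLS for tok in one_dropped[0]))  # 6 1
```

Under these assumptions, keeping both CLS tokens leaves two summary-style tokens in the joint sequence, while dropping one leaves exactly one; that asymmetry is what my second question below is about.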

My questions are:
1. What is the reason for removing the CLS token?
2. When both image and text evidence exist, why is only one of the two CLS tokens removed?

