Hello,
I have a question about how the embeddings are modified inside the MultiModal2 module, right after the output of the CLIPModel is obtained.
It seems that when image evidence exists, the CLS token of the image embedding is removed
(https://github.com/VT-NLP/Mocheg/blob/main/verification/model.py#L160),
whereas when no text evidence is given, the CLS token of the text embedding is removed
(https://github.com/VT-NLP/Mocheg/blob/main/verification/model.py#L167).
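To be clear about what I mean by "removing the CLS token", here is a minimal sketch. It is not the repository's code: it uses plain Python lists in place of tensors, the helper name `drop_first_token` and the toy values are hypothetical, and it assumes the CLS token sits at position 0 of each sequence. On a PyTorch tensor of shape (batch, tokens, hidden), the equivalent slice would be `embeds[:, 1:, :]`.

```python
def drop_first_token(batch):
    """Return a copy of the batch with position 0 removed from every sequence.

    Position 0 is assumed (not confirmed from the repository) to hold the CLS token.
    """
    return [sequence[1:] for sequence in batch]


# Hypothetical toy batch: 1 sample, 3 tokens (CLS + 2 patches), hidden size 2.
image_embeds = [[[0.0, 0.0], [1.0, 1.5], [2.0, 2.5]]]

patch_embeds = drop_first_token(image_embeds)

# Sequence length before and after dropping the first token.
print(len(image_embeds[0]), len(patch_embeds[0]))
```

After this step the sequence is one token shorter, and only the per-patch (or per-word) embeddings remain.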
My questions are:
- What is the reason for removing the CLS token?
- Why is only one of the two CLS tokens removed when both image and text evidence exist?