This repository was archived by the owner on May 1, 2025. It is now read-only.
Hello, I would like to run some experiments with the ALBEF model. I have reviewed your paper as well, but I am unable to understand why the first six layers of BERT-base are used as the text encoder and the last six layers as the multimodal encoder. Why wasn't the entire 12-layer BERT-base used for both the text encoder and the multimodal encoder? Your help in this regard would be greatly appreciated. @LiJunnan1992 @svc-scm @chenxwh
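To make the question concrete, here is a minimal PyTorch sketch of the split I am asking about. This is my own illustration, not ALBEF's actual code — the layer type and dimensions (generic `TransformerEncoderLayer`, hidden size 768, 12 heads) are assumptions standing in for BERT-base layers:

```python
import torch.nn as nn

# Hypothetical sketch: a 12-layer stack split the way ALBEF splits BERT-base.
# Generic encoder layers stand in for BERT layers; in ALBEF, cross-attention
# to image features is added in the fusion half.
NUM_LAYERS, HIDDEN, HEADS = 12, 768, 12

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=HEADS, batch_first=True)
    for _ in range(NUM_LAYERS)
)

text_encoder = layers[:6]    # layers 0-5: unimodal text encoding
fusion_encoder = layers[6:]  # layers 6-11: multimodal fusion

print(len(text_encoder), len(fusion_encoder))
```

My understanding is that this split keeps the total depth equal to a single BERT-base, so I am asking why both the text and multimodal encoders were not made full 12-layer stacks instead.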