Hi, thanks for sharing! I have a question about the adapter location in W2V2.
The W2V2 transformer encoder applies LayerNorm after attention. After adding the adapter, shouldn't the adapter computation be performed after the LN layer rather than before it?
```python
hidden_states = self.dropout(hidden_states)
hidden_states = attn_residual + hidden_states

# adapter
if args.adapter:
    adapt_h = self.adapter(hidden_states)

hidden_states = self.layer_norm(hidden_states)
hidden_states = hidden_states + self.feed_forward(hidden_states)
if args.adapter:
    hidden_states = hidden_states + adapt_h

hidden_states = self.final_layer_norm(hidden_states)
```
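For concreteness, here is a minimal NumPy sketch of the alternative ordering I have in mind, where the adapter branches off the *normalized* hidden states instead of the raw residual-stream output. The function names (`bottleneck_adapter`, `ffn_sublayer_post_ln`) and the down/ReLU/up adapter shape are my own illustration, not taken from the repository:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm over the last dimension, without learned scale/shift
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def bottleneck_adapter(x, w_down, w_up):
    # down-project, ReLU, up-project; the caller adds the result
    # back onto the residual stream, as in the snippet above
    return np.maximum(x @ w_down, 0.0) @ w_up

def ffn_sublayer_post_ln(hidden_states, adapter, feed_forward):
    # hypothetical ordering: normalize first, then compute the adapter
    # branch from the normalized hidden states
    hidden_states = layer_norm(hidden_states)
    adapt_h = adapter(hidden_states)
    hidden_states = hidden_states + feed_forward(hidden_states)
    hidden_states = hidden_states + adapt_h
    return layer_norm(hidden_states)  # stands in for final_layer_norm
```

The only difference from the snippet above is whether `adapt_h` is computed before or after `self.layer_norm`; the add-back point after the feed-forward is unchanged.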
IPET/VoxCeleb1/W2V2/models/W2V2.py
Lines 547 to 557 in 2e4b0e3