Hi, thanks for sharing! I have a question about the adapter location in W2V2.
The W2V2 transformer encoder applies LayerNorm after attention. After adding the adapter, shouldn't the adapter computation be performed after the LN layer rather than before it?
```python
hidden_states = self.dropout(hidden_states)
hidden_states = attn_residual + hidden_states

# adapter
if args.adapter:
    adapt_h = self.adapter(hidden_states)

hidden_states = self.layer_norm(hidden_states)
hidden_states = hidden_states + self.feed_forward(hidden_states)
if args.adapter:
    hidden_states = hidden_states + adapt_h

hidden_states = self.final_layer_norm(hidden_states)
```
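For concreteness, here is a minimal NumPy sketch of the alternative ordering I have in mind, where the adapter branches off the *normalized* hidden states instead of the raw residual-stream output. The function names (`bottleneck_adapter`, `ffn_sublayer_post_ln`) and the down/ReLU/up adapter shape are my own illustration, not taken from the repository:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm over the last dimension, without learned scale/shift
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def bottleneck_adapter(x, w_down, w_up):
    # down-project, ReLU, up-project; the caller adds the result
    # back onto the residual stream, as in the snippet above
    return np.maximum(x @ w_down, 0.0) @ w_up

def ffn_sublayer_post_ln(hidden_states, adapter, feed_forward):
    # hypothetical ordering: normalize first, then compute the adapter
    # branch from the normalized hidden states
    hidden_states = layer_norm(hidden_states)
    adapt_h = adapter(hidden_states)
    hidden_states = hidden_states + feed_forward(hidden_states)
    hidden_states = hidden_states + adapt_h
    return layer_norm(hidden_states)  # stands in for final_layer_norm
```

The only difference from the snippet above is whether `adapt_h` is computed before or after `self.layer_norm`; the add-back point after the feed-forward is unchanged.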
IPET/VoxCeleb1/W2V2/models/W2V2.py
Lines 547 to 557 in 2e4b0e3