Description
When training on the AVSS dataset, the audio_fea extracted by VGGish has size bs * 10 in its first dimension, which does not match the subsequent feature matrix, whose first dimension is bs. The problem appears at "out2 = self.cross_attn(query, src, src, key_padding_mask=padding_mask)[0]", which raises this error:
  File "/home/ptr/hzw/AVSegFormer-master/model/AVSegFormer.py", line 75, in forward
    pred, mask_feature = self.head(img_feat, audio_feat)
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/head/AVSegHead.py", line 223, in forward
    memory, outputs = self.transformer(query, src_flatten, spatial_shapes,
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 160, in forward
    outputs = self.decoder(query, memory, reference_points,
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 139, in forward
    out = layer(out, src, reference_points, spatial_shapes,
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 117, in forward
    out2 = self.cross_attn(
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/functional.py", line 5044, in multi_head_attention_forward
    k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120
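
To make the size of the mismatch concrete, the numbers in the RuntimeError can be unpacked with plain arithmetic. The sketch below is only an illustration: decode_view_error is a hypothetical helper, not part of AVSegFormer or PyTorch, and it assumes the three requested dimensions mean (key length, bsz * num_heads, head_dim), as the failing view call in the last traceback frame suggests.

```python
import re


def decode_view_error(message):
    """Unpack "shape '[a, b, c]' is invalid for input of size n".

    Hypothetical helper for illustration only. It assumes the requested
    shape is (key_len, bsz * num_heads, head_dim), matching the failing
    k.view(...) call in the traceback.
    """
    match = re.search(
        r"shape '\[(\d+), (\d+), (\d+)\]' is invalid for input of size (\d+)",
        message,
    )
    if match is None:
        raise ValueError("message does not look like a view() shape error")
    key_len, bsz_x_heads, head_dim, actual_numel = map(int, match.groups())
    expected_numel = key_len * bsz_x_heads * head_dim
    # How many (batch * head) slices the key tensor really holds per position.
    actual_bsz_x_heads, remainder = divmod(actual_numel, key_len * head_dim)
    return {
        "key_len": key_len,
        "expected_bsz_x_heads": bsz_x_heads,
        "actual_bsz_x_heads": actual_bsz_x_heads,
        "remainder": remainder,
        "head_dim": head_dim,
        "expected_numel": expected_numel,
        "actual_numel": actual_numel,
        "ratio": expected_numel / actual_numel,
    }


msg = "RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120"
info = decode_view_error(msg)
for name, value in info.items():
    print(f"{name}: {value}")
```

Since bsz in that view call comes from the query, this reading says the query side asks for 320 batch-times-head slices per key position while the key tensor only holds 40. If query and key share the same embedding width (an assumption, not something the traceback shows), that means the query's batch dimension is larger than the key's by the printed ratio, which fits a query built from the per-frame audio features and a key built from features with a smaller first dimension.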