Problem shape '[1029, 320, 32]' is invalid for input of size 1317120 #7

@KOLE-LE

When training on the AVSS dataset, the audio feature (audio_fea) extracted by VGGish has bs * 10 as its first dimension, which does not match the other feature tensors, whose first dimension is bs. The failure occurs at "out2 = self.cross_attn(query, src, src, key_padding_mask=padding_mask)[0]", which raises this error:
File "/home/ptr/hzw/AVSegFormer-master/model/AVSegFormer.py", line 75, in forward
pred, mask_feature = self.head(img_feat, audio_feat)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/head/AVSegHead.py", line 223, in forward
memory, outputs = self.transformer(query, src_flatten, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 160, in forward
outputs = self.decoder(query, memory, reference_points,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 139, in forward
out = layer(out, src, reference_points, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 117, in forward
out2 = self.cross_attn(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/functional.py", line 5044, in multi_head_attention_forward
k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120
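
The numbers in the final RuntimeError can be unpacked with plain arithmetic, which may help confirm the batch mismatch. The sketch below is not part of the AVSegFormer code; `decode_view_error` is a hypothetical helper written only for this illustration. It relies on the fact that, in the failing line, `bsz` comes from the query tensor, so `bsz * num_heads * head_dim` is the query batch size times the embedding size.

```python
from math import prod


def decode_view_error(target_shape, input_size):
    """Compare what k.view(...) expects with what the key tensor holds.

    target_shape is (src_len, bsz * num_heads, head_dim) from the error
    message; input_size is the reported number of elements in the key.
    Hypothetical helper, for illustration only.
    """
    src_len, bsz_times_heads, head_dim = target_shape
    # Elements the view needs per source position: query_batch * embed_dim.
    per_position_expected = bsz_times_heads * head_dim
    # Elements the key actually has per source position: key_batch * embed_dim.
    per_position_actual, remainder = divmod(input_size, src_len)
    return {
        "expected_elements": prod(target_shape),
        "actual_elements": input_size,
        "per_position_expected": per_position_expected,
        "per_position_actual": per_position_actual,
        "divides_evenly": remainder == 0,
        # How many times larger the query batch is than the key batch.
        "batch_ratio": per_position_expected / per_position_actual,
    }


report = decode_view_error((1029, 320, 32), 1317120)
for name, value in report.items():
    print(f"{name}: {value}")
```

For these values the key holds 1280 elements per source position while the view needs 10240, so the query-side batch is 8 times larger than the batch of `src`. If the embedding size is 256 (an assumption, i.e. 8 heads of 32), that corresponds to a batch of 40 on the query side against 5 on the `src` side.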
