Problem shape '[1029, 320, 32]' is invalid for input of size 1317120

When training on the AVSS dataset, the audio_fea extracted by VGGish has bs * 10 as its first dimension, which does not match the subsequent feature matrices, whose first dimension is bs. The problem appears at "out2 = self.cross_attn(query, src, src, key_padding_mask=padding_mask)[0]", which raises this error:
File "/home/ptr/hzw/AVSegFormer-master/model/AVSegFormer.py", line 75, in forward
    pred, mask_feature = self.head(img_feat, audio_feat)
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/head/AVSegHead.py", line 223, in forward
    memory, outputs = self.transformer(query, src_flatten, spatial_shapes,
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 160, in forward
    outputs = self.decoder(query, memory, reference_points,
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 139, in forward
    out = layer(out, src, reference_points, spatial_shapes,
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 117, in forward
    out2 = self.cross_attn(
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/functional.py", line 5044, in multi_head_attention_forward
    k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120
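
For anyone trying to read the numbers in that message, here is a rough sketch of the arithmetic behind it. The sequence length, `bsz * num_heads`, `head_dim`, and the element count are taken from the traceback; the number of attention heads is not stated anywhere in this issue, so `num_heads=8` below is only an assumption, and `implied_batches` is a hypothetical helper, not part of AVSegFormer or PyTorch.

```python
# Numbers copied from the RuntimeError above.
SRC_LEN = 1029           # k.shape[0], the key sequence length
BSZ_TIMES_HEADS = 320    # bsz * num_heads, where bsz is taken from the query
HEAD_DIM = 32
ACTUAL_NUMEL = 1317120   # elements actually present in the key tensor


def implied_batches(num_heads):
    """Return (query_batch, key_batch) implied by the error for a given head count.

    num_heads is an ASSUMPTION supplied by the caller; the issue does not state it.
    """
    embed_dim = num_heads * HEAD_DIM
    query_batch, rem_q = divmod(BSZ_TIMES_HEADS, num_heads)
    key_batch, rem_k = divmod(ACTUAL_NUMEL, SRC_LEN * embed_dim)
    if rem_q or rem_k:
        raise ValueError(f"num_heads={num_heads} is inconsistent with the error message")
    return query_batch, key_batch


# Element count that view() needs for the shape [1029, 320, 32].
expected_numel = SRC_LEN * BSZ_TIMES_HEADS * HEAD_DIM
print(expected_numel)        # 10536960, versus 1317120 actually available

# Assuming 8 heads (embed_dim = 256): batch size seen on the query vs. on the key.
print(implied_batches(8))    # (40, 5)
```

Under that assumption the query reaches cross_attn with a larger batch dimension than src, so the key tensor cannot be viewed as `[1029, bsz * num_heads, head_dim]`. Printing `query.shape` and `src.shape` just before the `self.cross_attn(...)` call should confirm which tensor carries the extra frames.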
