Hi,
Thanks for sharing your code. Did you resample every video in the Charades dataset to 1024 frames before loading? That procedure may take a lot of memory. Isn't it possible instead to run the provided pretrained I3D on the videos at their original 25 fps sampling and upsample the resulting feature maps to 128,7,7,1024 instead of e.g. 45,7,7,1024? Would that affect the performance of Timeception afterwards?
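To make the question concrete, here is a minimal sketch of the kind of upsampling I mean. It assumes the I3D output for one video is a NumPy array of shape (T, 7, 7, 1024) and uses plain linear interpolation along the time axis. The function name `upsample_temporal` and the random input are only illustrative, not part of the released code, and linear interpolation is my assumption rather than anything the repository does.

```python
import numpy as np


def upsample_temporal(features, target_steps=128):
    """Linearly interpolate a (T, H, W, C) feature map along the time axis.

    Hypothetical helper for illustration; not part of the Timeception code.
    """
    t_in = features.shape[0]
    if t_in == target_steps:
        return features.copy()
    if t_in == 1:
        # A single timestep can only be repeated.
        return np.repeat(features, target_steps, axis=0)

    # Fractional source position of every target timestep (endpoints aligned).
    pos = np.linspace(0.0, t_in - 1, num=target_steps)
    lo = np.floor(pos).astype(np.int64)
    hi = np.minimum(lo + 1, t_in - 1)

    # Interpolation weight of the later neighbour, broadcast over H, W, C.
    w = (pos - lo).astype(features.dtype).reshape(-1, 1, 1, 1)
    return (1.0 - w) * features[lo] + w * features[hi]


# Random stand-in for the I3D features of one video (45 timesteps).
feats = np.random.rand(45, 7, 7, 1024).astype(np.float32)
upsampled = upsample_temporal(feats)
print(upsampled.shape)
```

This only changes the number of timesteps. It adds no new temporal information, which is why I am asking whether Timeception would behave differently on such features.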