I have a question about frame alignment for camera motion annotation.
To reduce annotation costs, we usually use sparse labels: for a 100-frame video, only 25 frames have camera motion annotations.
During training, do you
- downsample the video to the annotated frames, or
- interpolate the sparse annotations to every frame (which seems more reasonable for preserving temporal resolution)?
If you interpolate, which interpolation method do you use?
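
To make the two options concrete, here is a minimal sketch of what I have in mind. It is only my assumption of how this could work, not your actual pipeline: the function names, the evenly spaced labels (every 4th frame), and the 2-D continuous motion values are all hypothetical, and linear interpolation via `np.interp` is just one possible choice. It would not be appropriate for rotations or for categorical motion labels.

```python
import numpy as np

def downsample_to_labeled(video, labeled_idx):
    """Option 1: keep only the frames that have an annotation."""
    return video[np.asarray(labeled_idx)]

def interpolate_labels(num_frames, labeled_idx, labels):
    """Option 2: linearly interpolate sparse labels to every frame.

    Frames after the last labeled frame get the last label's value,
    because np.interp clamps outside the labeled range.
    """
    labeled_idx = np.asarray(labeled_idx, dtype=float)
    labels = np.asarray(labels, dtype=float)
    frames = np.arange(num_frames, dtype=float)
    # np.interp is 1-D, so interpolate each label dimension separately.
    columns = [np.interp(frames, labeled_idx, labels[:, d])
               for d in range(labels.shape[1])]
    return np.stack(columns, axis=1)

# Hypothetical setup: 100 frames, every 4th frame labeled (25 labels).
num_frames = 100
labeled_idx = np.arange(0, num_frames, 4)
labels = np.stack([labeled_idx * 0.1, np.zeros(len(labeled_idx))], axis=1)
video = np.zeros((num_frames, 8, 8, 3))  # dummy frames, for shape only

sparse_video = downsample_to_labeled(video, labeled_idx)           # 25 frames
dense_labels = interpolate_labels(num_frames, labeled_idx, labels)  # 100 labels
```

Option 1 trains on 25 frame/label pairs, while option 2 trains on all 100 frames with partly synthetic labels.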
Thanks!