Hi, I have a quick question about Phase 1 (Student Initialization).
Instead of using 1,000 ODE pairs generated by the bidirectional teacher, would it work to initialize the student directly on real video data, i.e. by adding noise to ground-truth (GT) videos and regressing to $x_0$?
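
To make the question concrete, here is a minimal sketch of the alternative I have in mind. The linear noise schedule, the function names (`add_noise`, `x0_regression_loss`), and the toy identity "student" are my own placeholders for illustration, not anything taken from the paper or the codebase. Real training would use tensors and the actual student network.

```python
import random

def add_noise(x0, t, rng):
    """Return (x_t, eps) for a clean sample x0 at noise level t in [0, 1].

    Assumed schedule (hypothetical, not confirmed by the paper):
    x_t = (1 - t) * x0 + t * eps, with eps ~ N(0, 1).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]
    return x_t, eps

def x0_regression_loss(student, x0, t, rng):
    """MSE between the student's prediction from x_t and the clean x0."""
    x_t, _ = add_noise(x0, t, rng)
    pred = student(x_t, t)
    return sum((p - a) ** 2 for p, a in zip(pred, x0)) / len(x0)

# Toy stand-ins: a flattened "GT video" and a student that returns its input.
rng = random.Random(0)
gt_video = [0.5, -0.25, 1.0, 0.0]
identity_student = lambda x_t, t: x_t

loss_clean = x0_regression_loss(identity_student, gt_video, 0.0, rng)  # no noise
loss_noisy = x0_regression_loss(identity_student, gt_video, 1.0, rng)  # pure noise
```

The only difference from the ODE-pair setup, as I understand it, is where the regression target comes from: here it is the real clip itself rather than the teacher's ODE output.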