Hi, thank you for releasing the EgoLife_EyeTracking_EyeGaze dataset.
I am currently trying to align the provided EyeGaze CSV files with the EgoLife egocentric videos and project the gaze signal onto the RGB video frames.
From the released files, I can see that the dataset contains:
- EyeGaze CSV files with fields such as tracking_timestamp_us, left_yaw_rads_cpf, right_yaw_rads_cpf, pitch_rads_cpf, depth_m, etc.
- EyeTracking MP4 files.
However, for strict Project Aria gaze projection, it seems that the following data are also needed:
- Original VRS files, or at least the corresponding Aria device calibration
- RGB camera intrinsics / extrinsics
- The transform between the Central Pupil Frame (CPF) and the RGB camera frame
- Possibly the original MPS output directory or calibration files
Without these files, I can only do an approximate projection by matching clips by filename, aligning timestamps within each clip, and mapping yaw / pitch to image coordinates using an assumed FOV. This is useful for visualization, but it is not a geometrically accurate Aria projection.
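For concreteness, the approximation I mean looks roughly like the sketch below. It is only a sketch under stated assumptions: an ideal pinhole camera with no fisheye distortion, an assumed horizontal FOV, gaze expressed directly in the camera frame (no CPF-to-RGB transform), the mean of left/right yaw as a single yaw, and a clip whose first frame coincides with a reference timestamp. The function names, the default FOV value, and the sign conventions are my own placeholders, not anything taken from the dataset or from Project Aria tooling.

```python
import math


def gaze_to_pixel(left_yaw_rad, right_yaw_rad, pitch_rad,
                  width, height, hfov_deg=110.0):
    """Approximate gaze-to-pixel mapping with an assumed pinhole camera.

    Assumptions (not from the dataset): the principal point is the image
    center, positive yaw moves the point right, positive pitch moves it up,
    and hfov_deg is a guessed horizontal field of view.
    """
    # Crude single yaw: mean of the two per-eye yaw angles.
    yaw_rad = 0.5 * (left_yaw_rad + right_yaw_rad)
    # Focal length in pixels implied by the assumed horizontal FOV.
    focal_px = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    u = width / 2.0 + focal_px * math.tan(yaw_rad)
    v = height / 2.0 - focal_px * math.tan(pitch_rad)
    return u, v


def timestamp_to_frame(t_us, t0_us, fps):
    """Map a tracking_timestamp_us value to the nearest frame index.

    Assumes frame 0 of the clip corresponds to t0_us and a constant fps.
    """
    return round((t_us - t0_us) * fps / 1_000_000)


if __name__ == "__main__":
    # Straight-ahead gaze lands at the image center under these assumptions.
    print(gaze_to_pixel(0.0, 0.0, 0.0, width=1408, height=1408))
    print(timestamp_to_frame(1_500_000, 1_000_000, fps=20))
```

Any error in the assumed FOV, the sign conventions, or the clip start time shifts the projected point directly, which is why I would prefer to use the real calibration.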
Could you please clarify:
- Are the original VRS files available for this dataset?
- Are the Aria camera calibration files available?
- Is there any official mapping from the provided EyeGaze CSV files to the EgoLife RGB video frames?
- Are the EyeTracking MP4 files already gaze-overlay videos, or are they separate eye-tracking recordings?
- If VRS / calibration files are not released, what projection method do you recommend for using the gaze CSVs with the EgoLife videos?
Thank you very much!