53 changes: 53 additions & 0 deletions Log_en.md
@@ -0,0 +1,53 @@
# HDTF Download Script Update Log

## 2025-06-22

### Migration from youtube-dl to yt-dlp

#### Background
- Multiple errors occurred in the `download.py` script that was using `youtube-dl`
- Due to changes in YouTube's specifications, `youtube-dl` was no longer functioning correctly

#### Modifications

1. **Updated Download Function** (`download_video` function)
   - Changed from `youtube-dl` to `yt-dlp`
   - Improved format selection:
     - Old: `bestvideo[ext=mp4]` (video only)
     - New: `bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]` (video+audio)
   - Additional options:
     - `--no-check-certificate`: Avoid SSL certificate errors
     - `--retries 3`: Retry on download failure
     - `--fragment-retries 3`: Retry on fragment download failure
     - `--no-part`: Do not use .part files
     - `--merge-output-format`: Ensure specified output format

2. **Improved Error Handling** (`construct_download_queue` function)
   - Subsets whose metadata files don't exist are now skipped
   - Test runs no longer process subsets that are not needed

3. **Bug Fixes**
   - Fixed logic error in resolution comparison:
     - Old: `if not video_resolution != video_data['resolution']:` (double negative)
     - New: `if video_resolution != int(video_data['resolution']):`
   - Also fixed string vs integer comparison error

4. **Documentation Updates**
   - Updated help text and comments from `youtube-dl` to `yt-dlp`
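
The resolution bug fix in item 3 can be illustrated with a small sketch. It assumes, as the fix suggests, that the metadata value arrives as a string while the probed video height is an integer; the function name `resolution_mismatch` is hypothetical and not taken from `download.py`.

```python
def resolution_mismatch(video_resolution: int, metadata_resolution: str) -> bool:
    """Return True when the downloaded height differs from the metadata value."""
    # Metadata read from a text file is a string, so convert it before comparing.
    return video_resolution != int(metadata_resolution)


# Old condition: `not a != b` is equivalent to `a == b`. An int never equals
# a str in Python, so the old check evaluated to False for every video.
old_check_same = not 720 != "720"
old_check_different = not 720 != "1080"

print(old_check_same, old_check_different)
print(resolution_mismatch(720, "720"))
print(resolution_mismatch(720, "1080"))
```

With the corrected comparison, only the 720 vs 1080 case is reported as a mismatch.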

#### Test Results
- Successfully tested with a single video
- Confirmed successful video download, cropping, and resizing

#### Usage
```bash
# Verify yt-dlp is installed
which yt-dlp

# Download HDTF dataset
python download.py --output_dir /data/nishida/HDTF --num_workers 8
```

#### Notes
- Some videos may not be available for download as they might have been removed from YouTube
- The `_videos_raw` directory can be deleted after processing (to save disk space)
157 changes: 151 additions & 6 deletions README.md
@@ -4,12 +4,23 @@
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset
<a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Flow-Guided_One-Shot_Talking_Face_Generation_With_a_High-Resolution_Audio-Visual_Dataset_CVPR_2021_paper.pdf" target="_blank">paper</a> <a href="https://github.com/MRzzm/HDTF/blob/main/Supplementary%20Materials.pdf" target="_blank">supplementary</a> [demo video](https://www.youtube.com/watch?v=uJdBgWYBTww)

## Overview

The HDTF (High-resolution Talking Face) dataset provides high-quality talking face videos for research in audio-visual speech processing, lip synchronization, and talking face generation. This repository contains an improved version of the dataset downloader that automates the process of downloading, cutting, and cropping videos from YouTube.

## Dataset Structure

The HDTF dataset is organized into three subsets:
- **RD (Radio)**: Radio/podcast style videos
- **WDA**: Videos featuring various speakers (including political figures)
- **WRA**: Additional speaker videos

## Details of HDTF dataset
**./HDTF_dataset** consists of the *youtube video url*, the *video resolution* (as used in our method, which may not be the best resolution), the *time stamps of talking face*, the *facial region* (as used in our method) and *the zoom scale* of the cropped window.

### Metadata Files

**xx_video_url.txt:**
```
format: video name | video youtube url
```
@@ -31,18 +42,17 @@ format: video name+clip index | min_width | width | min_height | height (in
```
format: video name+clip index | window zoom scale
```


## Processing of HDTF dataset
When using the HDTF dataset:

- The video names and URLs are listed in **xx_video_url.txt** (the highest available definition is 1080P or 720P). Convert each video to **.mp4** format, and convert interlaced video to progressive video as well.

- We split each long original video into talking-head clips using the time stamps in **xx_annotion_time.txt**. Name each split clip **video name_clip index.mp4**. For example, split the video *Radio11.mp4 00:30-01:00 01:30-02:30* into *Radio11_0.mp4* and *Radio11_1.mp4*.

- Our work does not always download videos at the best resolution, so we provide two cropping methods. Thanks @universome and @Feii Yin for pointing out this problem!

1. Download the video at the reference resolution in **xx_resolution.txt** and crop the facial region with the fixed window size in **xx_crop_wh.txt**. (This is the same method we used, but the downloaded video may not be at the best resolution.)
2. First, download the video at the best resolution. Then detect the facial landmarks in the split talking-head clips and compute the square window of the face: compute the facial region in each frame and merge all regions into one square range. Next, enlarge the window using the zoom scale in **xx_crop_ratio.txt**. Finally, crop the facial region.

- We resize all cropped videos to **512 x 512** resolution.
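
The second cropping method can be sketched as follows. The exact merge rule is not spelled out here, so this sketch assumes the merged window is a square centred on the union of all per-frame face boxes, with its side equal to the longer edge of that union and then scaled by the zoom factor. The name `merged_square_window` and the `(x_min, y_min, x_max, y_max)` box layout are illustrative assumptions, and clamping to the frame borders is left out.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def merged_square_window(boxes: Iterable[Box], zoom: float) -> Box:
    """Merge per-frame face boxes into one square window and enlarge it."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one face box")

    # Union of all per-frame regions.
    x_min = min(b[0] for b in boxes)
    y_min = min(b[1] for b in boxes)
    x_max = max(b[2] for b in boxes)
    y_max = max(b[3] for b in boxes)

    # Square centred on the union; the side is the longer edge times the zoom scale.
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    half = max(x_max - x_min, y_max - y_min) * zoom / 2
    return (cx - half, cy - half, cx + half, cy + half)


window = merged_square_window([(100, 100, 200, 220), (110, 90, 210, 200)], zoom=1.5)
print(window)
```

Here the union spans 110 x 130 pixels, so the square side is 130 x 1.5 = 195 pixels around the centre (155, 155).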

@@ -62,6 +72,141 @@ The code is in **./code_animation2video**, pls visit [here](https://github.com/M
#### code of reproducing other works
coming soon......

## Installation

```bash
# Install required dependencies
pip install tqdm yt-dlp

# Ensure ffmpeg is installed
# Ubuntu/Debian:
sudo apt-get install ffmpeg
# macOS:
brew install ffmpeg
```

## Downloading
For convenience, we provide the `download.py` script which downloads, crops and resizes the dataset. You can use it via the following command:
```bash
python download.py --output_dir /path/to/output/dir --num_workers 8
```

### Command Line Arguments
- `--source_dir` or `-s`: Path to metadata directory (default: `HDTF_dataset`)
- `--output_dir` or `-o`: Where to save processed videos (required)
- `--num_workers` or `-w`: Number of parallel download workers (default: 8)

Note: some videos may become unavailable if their authors remove them or make them private.

## Output Structure

After running `download.py`, the output directory will contain:

```
output_dir/
├── _videos_raw/ # Temporary directory (can be deleted after processing)
│ ├── {subset}_{videoname}.mp4 # Raw downloaded videos
│ └── {subset}_{videoname}_download_log.txt # Download logs
├── {subset}_{videoname}_000.mp4 # Processed clip 1
├── {subset}_{videoname}_001.mp4 # Processed clip 2
└── ... # More clips
```

### Output Characteristics
- **Video only**: No audio track included
- **Square format**: All clips are square (width = height)
- **High resolution**: Maintains original resolution (720p or 1080p)
- **Face-centered**: Cropped to center on the speaker's face
- **Systematic naming**: `{subset}_{videoname}_{clipindex:03d}.mp4`

## Processing Pipeline

The `download.py` script performs the following steps:

1. **Download**: Fetches videos from YouTube at specified resolution using yt-dlp
2. **Cut**: Extracts time intervals containing talking faces based on annotation files
3. **Crop**: Applies facial region cropping to create square clips using FFmpeg
4. **Save**: Outputs individual clips with systematic naming
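
A minimal sketch of how the Cut and Crop steps might be expressed as a single FFmpeg call. It assumes the crop window is given as a width, a height, and a top-left offset. The helpers `clip_name` and `build_ffmpeg_command` are hypothetical and not taken from `download.py`; the output name follows the `{subset}_{videoname}_{clipindex:03d}.mp4` pattern, and `-an` drops the audio track to match the video-only output.

```python
from typing import List


def clip_name(subset: str, video_name: str, clip_index: int) -> str:
    """Build the output file name for one processed clip."""
    return f"{subset}_{video_name}_{clip_index:03d}.mp4"


def build_ffmpeg_command(src: str, dst: str, start: str, end: str,
                         crop_w: int, crop_h: int, crop_x: int, crop_y: int) -> List[str]:
    """Assemble an ffmpeg argument list that cuts one interval and crops it."""
    return [
        "ffmpeg", "-y",                  # overwrite the output if it exists
        "-i", src,
        "-ss", start,                    # start of the talking-face interval
        "-to", end,                      # end of the interval
        "-vf", f"crop={crop_w}:{crop_h}:{crop_x}:{crop_y}",  # crop=w:h:x:y
        "-an",                           # no audio in the output clip
        dst,
    ]


name = clip_name("RD", "Radio11", 0)
command = build_ffmpeg_command("RD_Radio11.mp4", name, "00:30", "01:00", 512, 512, 100, 50)
print(" ".join(command))
```

Returning an argument list rather than a shell string lets the command be passed to `subprocess.run` without shell quoting issues.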

## Improvements Over Original Code

### 1. **Migration to yt-dlp**
- **Original**: Used `youtube-dl`, which no longer worked correctly after changes on YouTube's side
- **Improved**: Updated to the actively maintained `yt-dlp`
- **Benefits**:
  - Better handling of YouTube API changes
  - Improved download success rates
  - More robust error handling

### 2. **Enhanced Download Robustness**
Added several flags to improve download reliability:
```python
"--retries", "3", # Retry failed downloads
"--fragment-retries", "3", # Retry failed fragments
"--no-part", # Avoid partial file issues
"--no-check-certificate", # Handle SSL certificate issues
"--merge-output-format", "mp4",  # Ensure consistent output format
```

### 3. **Improved Format Selection**
- **Original**: Basic format string that could fail
- **Improved**: Sophisticated format selection with fallbacks
```python
# Example for 720p:
"bestvideo[height=720][ext=mp4]+bestaudio[ext=m4a]/best[height=720][ext=mp4]"
```
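
The format string and the robustness flags can be combined into one yt-dlp invocation. The sketch below is an assumption about how that might look, not the actual `download_video` implementation; `build_format_selector` and `build_ytdlp_command` are hypothetical names, and the URL is a placeholder.

```python
from typing import List


def build_format_selector(height: int) -> str:
    """Prefer separate mp4 video + m4a audio at the given height, else a muxed mp4."""
    return (
        f"bestvideo[height={height}][ext=mp4]+bestaudio[ext=m4a]"
        f"/best[height={height}][ext=mp4]"
    )


def build_ytdlp_command(url: str, height: int, output_path: str) -> List[str]:
    """Assemble the yt-dlp argument list with the retry and merge flags."""
    return [
        "yt-dlp",
        "-f", build_format_selector(height),
        "--retries", "3",
        "--fragment-retries", "3",
        "--no-part",
        "--no-check-certificate",
        "--merge-output-format", "mp4",
        "-o", output_path,
        url,
    ]


# Placeholder URL for illustration only.
command = build_ytdlp_command("https://www.youtube.com/watch?v=VIDEO_ID", 720, "RD_Radio11.mp4")
print(" ".join(command))
```

The list can be handed to `subprocess.run(command)`; the part after `/` in the selector is the fallback used when no separate video and audio streams match.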

### 4. **Better Error Handling**
- Added graceful handling of missing metadata files
- Skip subsets if files are not found instead of crashing
- More informative error messages for debugging
- Individual download logs for each video
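
Skipping subsets with missing metadata could look roughly like the sketch below. It assumes each subset ships a `{subset}_video_url.txt` file (inferred from the `xx_video_url.txt` naming) inside the source directory; `find_available_subsets` is a hypothetical helper, not the actual `construct_download_queue` code.

```python
import os
from typing import List, Sequence

SUBSETS = ("RD", "WDA", "WRA")


def find_available_subsets(source_dir: str, subsets: Sequence[str] = SUBSETS) -> List[str]:
    """Return the subsets whose URL metadata file exists, reporting the rest."""
    available = []
    for subset in subsets:
        url_file = os.path.join(source_dir, f"{subset}_video_url.txt")
        if not os.path.isfile(url_file):
            # Report and continue instead of raising, so other subsets still run.
            print(f"Skipping subset {subset}: {url_file} not found")
            continue
        available.append(subset)
    return available


print(find_available_subsets("HDTF_dataset"))
```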

### 5. **Code Quality Improvements**
- Added comprehensive documentation and docstrings
- Better variable naming for clarity
- Type hints in function signatures
- Improved code organization and readability

### 6. **Robustness Features**
- File existence checks before processing
- Validation of downloaded video resolution
- Proper handling of videos with multiple clips
- Clear progress indication with tqdm

## Troubleshooting

### Common Issues

1. **Download Failures**
   - Check internet connection
   - Verify YouTube URLs are still valid
   - Review individual download logs in `_videos_raw/`
   - Some videos may have been removed or made private

2. **Resolution Mismatches**
   - The requested resolution may not be available
   - Script will skip videos if downloaded resolution doesn't match metadata
   - Consider updating metadata files if needed

3. **Missing Videos**
   - Videos without proper cropping information are skipped
   - Check console output for specific reasons
   - Some videos may be discarded due to quality issues

### Manual Recovery

For failed downloads:
1. Check the download log: `_videos_raw/{video_name}_download_log.txt`
2. Try downloading manually with yt-dlp
3. Update metadata files if video formats have changed
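
To find candidates for manual recovery, one option is to look for download logs that have no matching video next to them. This sketch assumes that a `{name}_download_log.txt` file without a `{name}.mp4` in `_videos_raw/` indicates a failed download, which may not hold in every case; `find_failed_downloads` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

LOG_SUFFIX = "_download_log.txt"


def find_failed_downloads(raw_dir: str) -> List[str]:
    """List video names that have a download log but no .mp4 in raw_dir."""
    raw = Path(raw_dir)
    if not raw.is_dir():
        return []
    failed = []
    for log_path in sorted(raw.glob(f"*{LOG_SUFFIX}")):
        # Strip the log suffix to recover the "{subset}_{videoname}" part.
        name = log_path.name[: -len(LOG_SUFFIX)]
        if not (raw / f"{name}.mp4").exists():
            failed.append(name)
    return failed


for name in find_failed_downloads("output_dir/_videos_raw"):
    print(f"No video found for {name}; see {name}{LOG_SUFFIX}")
```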

## Storage Notes

- The `_videos_raw/` directory contains full-length downloaded videos
- This directory can be safely deleted after processing to save space
- Final processed clips are much smaller than raw videos

## Reference
If you use HDTF, please cite:
