Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions DCVC-RT/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# Introduction

Official Pytorch implementation for DCVC-RT: [Towards Practical **R**eal-**T**ime Neural Video Compression](https://arxiv.org/abs/2502.20762), in CVPR 2025.

# Prerequisites
* Python 3.12 and conda, get [Conda](https://www.anaconda.com/)
* CUDA 12.6 (other versions may also work. Make sure the CUDA version matches with pytorch.)
* pytorch (We have tested that pytorch-2.6 works. Other versions may also work.)
* Environment
```
conda create -n $YOUR_PY_ENV_NAME python=3.12
conda activate $YOUR_PY_ENV_NAME

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

# Test dataset

We support arbitrary original resolution. The input video resolution will be padded automatically. The reconstructed video will be cropped back to the original size. The distortion (PSNR) is calculated at original resolution.

## YUV 420 content

Put *.yuv in the folder structure similar to the following structure.

/media/data/HEVC_B/
- BQTerrace_1920x1080_60.yuv
- BasketballDrive_1920x1080_50.yuv
- ...
/media/data/HEVC_D/
/media/data/HEVC_C/
...

The dataset structure can be seen in dataset_config_example_yuv420.json.

## RGB content

We highly suggest testing YUV420 content. To test RGB content, please refer to the [DCVC-FM](../DCVC-FM) folder.

# Build the project
Please build the C++ code to support bitstream writing and customized CUDA kernels to fuse operations.

```bash
sudo apt-get install cmake g++ ninja-build
conda activate $YOUR_PY_ENV_NAME
cd ./src/cpp/
pip install .
cd ../layers/extensions/inference/
pip install .
```

# CPU performance scaling

Note that the arithmetic coding runs on the CPU, please make sure your CPU runs at high performance while writing the actual bitstream. Otherwise, the arithmetic coding may take a long time.

Check the CPU frequency by
```
grep -E '^model name|^cpu MHz' /proc/cpuinfo
```

Run the following command to maximum CPU frequency
```
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Run the following command to recover the default frequency
```
echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

# Pretrained models

* Download [our pretrained models](https://1drv.ms/f/c/2866592d5c55df8c/Esu0KJ-I2kxCjEP565ARx_YB88i0UnR6XnODqFcvZs4LcA?e=by8CO8) and put them into ./checkpoints folder.
* There are 2 models, one for image coding and the other for video coding.

# Test the models

Example to test pretrained model with four rate points:
```bash
python test_video.py --model_path_i ./checkpoints/cvpr2025_image.pth.tar --model_path_p ./checkpoints/cvpr2025_video.pth.tar --rate_num 4 --test_config ./dataset_config_example_yuv420.json --cuda 1 -w 1 --write_stream 1 --force_zero_thres 0.12 --output_path output.json --force_intra_period -1 --reset_interval 64 --force_frame_num -1 --check_existing 0
```

It is recommended that the ```-w``` number is equal to your GPU number.

You can also specify different ```--rate_num``` values (2~64) to test finer bitrate adjustment.

# Comparing with other method
Bit saving over VTM-17.0 (UVG all frames with single intra-frame setting (i.e. intra-period = –1) and YUV420 colorspace.)

<img src="assets/RD-Curve.png" width="750">

The BD-Rate and encoding/decoding speed on Nvidia A100 GPU

<img src="assets/bd_rate_speed.png" width="750">

# Acknowledgement
The implementation is based on [CompressAI](https://github.com/InterDigitalInc/CompressAI).

# Citation
If you find this work useful for your research, please cite:

```
@inproceedings{jia2025towards,
title={Towards Practical Real-Time Neural Video Compression},
author={Jia, Zhaoyang and Li, Bin and Li, Jiahao and Xie, Wenxuan and Qi, Linfeng and Li, Houqiang and Lu, Yan},
booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
{CVPR} 2025, Nashville, TN, USA, June 11-25, 2024},
year={2025}
}
```

# Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
Binary file added DCVC-RT/assets/RD-Curve.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added DCVC-RT/assets/bd_rate_speed.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
100 changes: 100 additions & 0 deletions DCVC-RT/dataset_config_example_yuv420.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
{
"root_path": "/media/data/",
"test_classes": {
"UVG": {
"test": 1,
"base_path": "UVG",
"src_type": "yuv420",
"sequences": {
"Beauty_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"Bosphorus_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"HoneyBee_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"Jockey_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"ReadySteadyGo_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"ShakeNDry_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 300, "intra_period": -1},
"YachtRide_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1}
}
},
"MCL-JCV": {
"test": 1,
"base_path": "MCL-JCV",
"src_type": "yuv420",
"sequences": {
"videoSRC01_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC02_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC03_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC04_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC05_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC06_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC07_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC08_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC09_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC10_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC11_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC12_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC13_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC14_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC15_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC16_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC17_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC18_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC19_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC20_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC21_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC22_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC23_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC24_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC25_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC26_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC27_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC28_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC29_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC30_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1}
}
},
"HEVC_B": {
"test": 1,
"base_path": "HEVC_B",
"src_type": "yuv420",
"sequences": {
"BQTerrace_1920x1080_60.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"BasketballDrive_1920x1080_50.yuv": {"width": 1920, "height": 1080, "frames": 500, "intra_period": -1},
"Cactus_1920x1080_50.yuv": {"width": 1920, "height": 1080, "frames": 500, "intra_period": -1},
"Kimono1_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 240, "intra_period": -1},
"ParkScene_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 240, "intra_period": -1}
}
},
"HEVC_E": {
"test": 1,
"base_path": "HEVC_E",
"src_type": "yuv420",
"sequences": {
"FourPeople_1280x720_60.yuv": {"width": 1280, "height": 720, "frames": 600, "intra_period": -1},
"Johnny_1280x720_60.yuv": {"width": 1280, "height": 720, "frames": 600, "intra_period": -1},
"KristenAndSara_1280x720_60.yuv": {"width": 1280, "height": 720, "frames": 600, "intra_period": -1}
}
},
"HEVC_C": {
"test": 1,
"base_path": "HEVC_C",
"src_type": "yuv420",
"sequences": {
"BQMall_832x480_60.yuv": {"width": 832, "height": 480, "frames": 600, "intra_period": -1},
"BasketballDrill_832x480_50.yuv": {"width": 832, "height": 480, "frames": 500, "intra_period": -1},
"PartyScene_832x480_50.yuv": {"width": 832, "height": 480, "frames": 500, "intra_period": -1},
"RaceHorses_832x480_30.yuv": {"width": 832, "height": 480, "frames": 300, "intra_period": -1}
}
},
"HEVC_D": {
"test": 1,
"base_path": "HEVC_D",
"src_type": "yuv420",
"sequences": {
"BasketballPass_416x240_50.yuv": {"width": 416, "height": 240, "frames": 500, "intra_period": -1},
"BlowingBubbles_416x240_50.yuv": {"width": 416, "height": 240, "frames": 500, "intra_period": -1},
"BQSquare_416x240_60.yuv": {"width": 416, "height": 240, "frames": 600, "intra_period": -1},
"RaceHorses_416x240_30.yuv": {"width": 416, "height": 240, "frames": 300, "intra_period": -1}
}
}
}
}
7 changes: 7 additions & 0 deletions DCVC-RT/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
numpy>=1.20.0
scipy
matplotlib
tqdm
bd-metric
pillow
pybind11
Loading
Loading