microsoft · yaohualibin · Mar 3, 2025 · Feb 28, 2025
diff --git a/DCVC-RT/README.md b/DCVC-RT/README.md
@@ -0,0 +1,113 @@
+# Introduction
+
+Official PyTorch implementation of DCVC-RT: [Towards Practical **R**eal-**T**ime Neural Video Compression](https://arxiv.org/abs/2502.20762), in CVPR 2025.
+
+# Prerequisites
+* Python 3.12 and conda (get [Conda](https://www.anaconda.com/))
+* CUDA 12.6 (other versions may also work; make sure the CUDA version matches your PyTorch build)
+* PyTorch (tested with PyTorch 2.6; other versions may also work)
+* Environment
+    ```
+    conda create -n $YOUR_PY_ENV_NAME python=3.12
+    conda activate $YOUR_PY_ENV_NAME
+
+    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
+    pip install -r requirements.txt
+    ```
+
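A quick way to confirm that the installed PyTorch build matches your CUDA setup is to print the versions PyTorch itself reports. The helper below is a hypothetical sanity check, not part of the repository; it only reads the public `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` attributes and returns empty fields if PyTorch is not installed.

```python
import importlib.util


def torch_env_report() -> dict:
    """Summarize the PyTorch/CUDA setup (hypothetical sanity check, not part of DCVC-RT)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    import torch

    return {
        "torch": str(torch.__version__),               # installed PyTorch version string
        "cuda_build": torch.version.cuda,              # CUDA version PyTorch was built with (None for CPU-only builds)
        "cuda_available": torch.cuda.is_available(),   # whether a usable GPU/driver was found
    }


if __name__ == "__main__":
    print(torch_env_report())
```

With the `cu126` wheels installed by the command above, `cuda_build` would be expected to read `12.6`.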
+# Test dataset
+
+Arbitrary input resolutions are supported. The input video is padded automatically, and the reconstructed video is cropped back to the original size. The distortion (PSNR) is calculated at the original resolution.
+
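The pad, crop and measure flow described above can be illustrated with NumPy. This is only a sketch: the padding multiple (16 here) and the replicate (`edge`) padding mode are assumptions chosen for illustration, not values taken from the DCVC-RT code.

```python
import numpy as np


def pad_to_multiple(plane: np.ndarray, multiple: int = 16):
    """Replicate-pad a 2-D plane on the bottom/right so both sides are multiples of `multiple`.

    `multiple=16` is a hypothetical alignment used only for illustration.
    Returns the padded plane and the original (height, width).
    """
    h, w = plane.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    padded = np.pad(plane, ((0, pad_h), (0, pad_w)), mode="edge")
    return padded, (h, w)


def crop_to_original(plane: np.ndarray, size) -> np.ndarray:
    """Crop a (possibly padded) plane back to the original (height, width)."""
    h, w = size
    return plane[:h, :w]


def psnr(ref: np.ndarray, rec: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between two planes of identical shape (8-bit peak value by default)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    luma = (np.arange(1080 * 1920) % 256).astype(np.uint8).reshape(1080, 1920)
    padded, size = pad_to_multiple(luma)
    print(padded.shape)  # (1088, 1920): 1080 is padded up to the next multiple of 16
    restored = crop_to_original(padded, size)
    print(psnr(luma, restored))  # inf, since cropping recovers the original exactly
```

Because PSNR is taken after cropping, the padded border never contributes to the distortion. For 4:2:0 content the chroma planes are half size in each dimension, so aligning luma to 16 would align chroma to 8.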
+## YUV 420 content
+
+Put the `*.yuv` files in a folder structure similar to the following:
+
+    /media/data/HEVC_B/
+        - BQTerrace_1920x1080_60.yuv
+        - BasketballDrive_1920x1080_50.yuv
+        - ...
+    /media/data/HEVC_D/
+    /media/data/HEVC_C/
+    ...
+
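The file names above encode width, height and frame rate, and each 8-bit YUV 4:2:0 frame occupies width * height * 1.5 bytes. The helpers below are a hypothetical sketch (not the repository's loader) that parse such names and count how many complete frames a raw file holds, which is a cheap way to check a download before testing.

```python
import os
import re

# Matches "_<width>x<height>_<fps>" as in "BQTerrace_1920x1080_60.yuv" or
# "Beauty_1920x1080_120fps_420_8bit_YUV.yuv".
_NAME_PATTERN = re.compile(r"_(\d+)x(\d+)_(\d+)")


def parse_yuv_name(filename: str):
    """Return (width, height, fps) parsed from a raw YUV file name."""
    match = _NAME_PATTERN.search(os.path.basename(filename))
    if match is None:
        raise ValueError(f"cannot parse resolution from {filename!r}")
    width, height, fps = (int(g) for g in match.groups())
    return width, height, fps


def yuv420_frame_bytes(width: int, height: int, bytes_per_sample: int = 1) -> int:
    """Bytes in one planar 4:2:0 frame: a full-size Y plane plus quarter-size U and V planes."""
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return (width * height + 2 * chroma) * bytes_per_sample


def count_yuv420_frames(path: str) -> int:
    """Number of complete 8-bit 4:2:0 frames stored in `path`."""
    width, height, _ = parse_yuv_name(path)
    return os.path.getsize(path) // yuv420_frame_bytes(width, height)
```

For example, one 1920x1080 frame is 3,110,400 bytes, so a 600-frame sequence such as `BQTerrace_1920x1080_60.yuv` should be 1,866,240,000 bytes.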
+The full dataset structure is described in `dataset_config_example_yuv420.json`.
+
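As a reading aid for the JSON file, the sketch below walks its `root_path`, `test_classes`, `test`, `base_path` and `sequences` fields and yields the full path of every enabled sequence. The function name is hypothetical, and `test_video.py` may consume the config differently.

```python
import json
import os


def iter_test_sequences(config_path: str):
    """Yield (class_name, yuv_path, sequence_info) for every class with a truthy "test" flag."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    root = config["root_path"]
    for class_name, test_class in config["test_classes"].items():
        if not test_class.get("test", 0):
            continue  # class disabled in the config
        base = os.path.join(root, test_class["base_path"])
        for seq_name, seq_info in test_class["sequences"].items():
            # seq_info carries "width", "height", "frames" and "intra_period".
            yield class_name, os.path.join(base, seq_name), seq_info
```

Called on the example config, this would yield paths such as `/media/data/HEVC_B/BQTerrace_1920x1080_60.yuv`; in this sketch, setting `"test": 0` on a class skips it.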
+## RGB content
+
+We strongly recommend testing YUV420 content. To test RGB content, please refer to the [DCVC-FM](../DCVC-FM) folder.
+
+# Build the project
+Build the C++ code (which supports bitstream writing) and the customized CUDA kernels (which fuse operations):
+
+```bash
+sudo apt-get install cmake g++ ninja-build
+conda activate $YOUR_PY_ENV_NAME
+cd ./src/cpp/
+pip install .
+cd ../layers/extensions/inference/
+pip install .
+```
+
+# CPU performance scaling
+
+Note that arithmetic coding runs on the CPU. Make sure your CPU runs at high performance while writing the actual bitstream; otherwise, arithmetic coding may take a long time.
+
+Check the CPU frequency with:
+```
+grep -E '^model name|^cpu MHz' /proc/cpuinfo
+```
+
+Run the following command to switch the CPU frequency governor to `performance` (maximum frequency):
+```
+echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+```
+
+Run the following command to restore the `ondemand` governor (the default governor varies between systems, so check the original value before changing it):
+```
+echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+```
+
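Because the restore command hard-codes `ondemand`, it can help to record each CPU's current governor before switching. The Python sketch below reads the same `scaling_governor` files the shell commands write to and lists CPUs not yet in `performance` mode; the helper names are hypothetical, and the files only exist on Linux systems with cpufreq support.

```python
import glob
import os


def read_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict:
    """Map CPU name (e.g. "cpu0") to its current cpufreq scaling governor."""
    pattern = os.path.join(cpu_root, "cpu*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu_name = path.split(os.sep)[-3]  # .../<cpu_name>/cpufreq/scaling_governor
        with open(path, "r", encoding="utf-8") as f:
            governors[cpu_name] = f.read().strip()
    return governors


def cpus_not_in_performance(governors: dict) -> list:
    """CPUs whose governor is not "performance" (arithmetic coding may run slower on them)."""
    return [cpu for cpu, gov in governors.items() if gov != "performance"]


if __name__ == "__main__":
    current = read_governors()
    print("current governors:", current)  # save these to restore them later
    print("not in performance mode:", cpus_not_in_performance(current))
```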
+# Pretrained models
+
+* Download [our pretrained models](https://1drv.ms/f/c/2866592d5c55df8c/Esu0KJ-I2kxCjEP565ARx_YB88i0UnR6XnODqFcvZs4LcA?e=by8CO8) and put them into the `./checkpoints` folder.
+* There are two models: one for image coding and one for video coding.
+
+# Test the models
+
+Example of testing the pretrained models with four rate points:
+```bash
+python test_video.py --model_path_i ./checkpoints/cvpr2025_image.pth.tar --model_path_p ./checkpoints/cvpr2025_video.pth.tar --rate_num 4 --test_config ./dataset_config_example_yuv420.json --cuda 1 -w 1 --write_stream 1 --force_zero_thres 0.12 --output_path output.json --force_intra_period -1 --reset_interval 64 --force_frame_num -1 --check_existing 0
+```
+
+We recommend setting the `-w` value equal to the number of GPUs you have.
+
+You can also set `--rate_num` to other values (2 to 64) to test finer bitrate adjustment.
+
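To follow the `-w` recommendation without editing the command by hand, a small launcher can count GPUs and assemble the arguments. This is a hypothetical convenience wrapper: it reuses only the flags from the example command above, counts GPUs via `nvidia-smi -L` (which prints one line per GPU), and checks `--rate_num` against the 2 to 64 range stated above.

```python
import shutil
import subprocess


def count_gpus() -> int:
    """Count NVIDIA GPUs listed by `nvidia-smi -L`; return 0 if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=False)
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def build_test_command(num_workers: int, rate_num: int = 4) -> list:
    """Assemble the test_video.py invocation from the README example (hypothetical helper)."""
    if not 2 <= rate_num <= 64:
        raise ValueError("rate_num must be between 2 and 64")
    return [
        "python", "test_video.py",
        "--model_path_i", "./checkpoints/cvpr2025_image.pth.tar",
        "--model_path_p", "./checkpoints/cvpr2025_video.pth.tar",
        "--rate_num", str(rate_num),
        "--test_config", "./dataset_config_example_yuv420.json",
        "--cuda", "1",
        "-w", str(num_workers),
        "--write_stream", "1",
        "--force_zero_thres", "0.12",
        "--output_path", "output.json",
        "--force_intra_period", "-1",
        "--reset_interval", "64",
        "--force_frame_num", "-1",
        "--check_existing", "0",
    ]


if __name__ == "__main__":
    cmd = build_test_command(num_workers=max(1, count_gpus()))
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the test
```

Passing the argument list to `subprocess.run` avoids shell quoting issues; `max(1, ...)` keeps at least one worker on machines where no GPU is detected.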
+# Comparison with other methods
+Bit savings over VTM-17.0 on UVG (all frames, single intra frame, i.e., intra-period = -1, YUV420 color space):
+
+<img src="assets/RD-Curve.png" width="750">
+
+BD-Rate and encoding/decoding speed on an NVIDIA A100 GPU:
+
+<img src="assets/bd_rate_speed.png" width="750">
+
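The BD-Rate figures compare rate-distortion curves with the Bjøntegaard metric. `requirements.txt` lists the `bd-metric` package for this; the NumPy sketch below instead spells out the classic computation (cubic fit of log-rate against PSNR, averaged over the overlapping PSNR range) so the arithmetic is visible. It is not necessarily identical to the evaluation code behind the reported numbers. Negative values mean bit savings relative to the anchor.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Classic Bjontegaard delta rate in percent (negative = test codec needs fewer bits).

    Each curve needs at least 4 (rate, PSNR) points for the cubic fit.
    """
    log_rate_anchor = np.log10(np.asarray(rate_anchor, dtype=np.float64))
    log_rate_test = np.log10(np.asarray(rate_test, dtype=np.float64))
    # Fit log10(rate) as a cubic polynomial of PSNR for each codec.
    fit_anchor = np.polyfit(psnr_anchor, log_rate_anchor, 3)
    fit_test = np.polyfit(psnr_test, log_rate_test, 3)
    # Integrate only over the PSNR interval covered by both curves.
    low = max(min(psnr_anchor), min(psnr_test))
    high = min(max(psnr_anchor), max(psnr_test))
    int_anchor = np.polyint(fit_anchor)
    int_test = np.polyint(fit_test)
    area_anchor = np.polyval(int_anchor, high) - np.polyval(int_anchor, low)
    area_test = np.polyval(int_test, high) - np.polyval(int_test, low)
    # Average log-rate difference, converted back to a percentage rate change.
    avg_log_diff = (area_test - area_anchor) / (high - low)
    return float((10 ** avg_log_diff - 1) * 100)
```

For example, a test codec that needs exactly half the anchor's bitrate at every PSNR point gives a BD-Rate of -50%.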
+# Acknowledgement
+The implementation is based on [CompressAI](https://github.com/InterDigitalInc/CompressAI).
+
+# Citation
+If you find this work useful for your research, please cite:
+
+```
+@inproceedings{jia2025towards,
+  title={Towards Practical Real-Time Neural Video Compression},
+  author={Jia, Zhaoyang and Li, Bin and Li, Jiahao and Xie, Wenxuan and Qi, Linfeng and Li, Houqiang and Lu, Yan},
+  booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
+             {CVPR} 2025, Nashville, TN, USA, June 11-15, 2025},
+  year={2025}
+}
+```
+
+# Trademarks
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
diff --git a/DCVC-RT/assets/RD-Curve.png b/DCVC-RT/assets/RD-Curve.png
diff --git a/DCVC-RT/assets/bd_rate_speed.png b/DCVC-RT/assets/bd_rate_speed.png
diff --git a/DCVC-RT/dataset_config_example_yuv420.json b/DCVC-RT/dataset_config_example_yuv420.json
@@ -0,0 +1,100 @@
+{
+    "root_path": "/media/data/",
+    "test_classes": {
+        "UVG": {
+            "test": 1,
+            "base_path": "UVG",
+            "src_type": "yuv420",
+            "sequences": {
+                "Beauty_1920x1080_120fps_420_8bit_YUV.yuv":        {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
+                "Bosphorus_1920x1080_120fps_420_8bit_YUV.yuv":     {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
+                "HoneyBee_1920x1080_120fps_420_8bit_YUV.yuv":      {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
+                "Jockey_1920x1080_120fps_420_8bit_YUV.yuv":        {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
+                "ReadySteadyGo_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
+                "ShakeNDry_1920x1080_120fps_420_8bit_YUV.yuv":     {"width": 1920, "height": 1080, "frames": 300, "intra_period": -1},
+                "YachtRide_1920x1080_120fps_420_8bit_YUV.yuv":     {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1}
+            }
+        },
+        "MCL-JCV": {
+            "test": 1,
+            "base_path": "MCL-JCV",
+            "src_type": "yuv420",
+            "sequences": {
+                "videoSRC01_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC02_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC03_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC04_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC05_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC06_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC07_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC08_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC09_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC10_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC11_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC12_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC13_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC14_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC15_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC16_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC17_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC18_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC19_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC20_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
+                "videoSRC21_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC22_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC23_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC24_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC25_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC26_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC27_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC28_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
+                "videoSRC29_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
+                "videoSRC30_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1}
+            }
+        },
+        "HEVC_B": {
+            "test": 1,
+            "base_path": "HEVC_B",
+            "src_type": "yuv420",
+            "sequences": {
+                "BQTerrace_1920x1080_60.yuv":       {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
+                "BasketballDrive_1920x1080_50.yuv": {"width": 1920, "height": 1080, "frames": 500, "intra_period": -1},
+                "Cactus_1920x1080_50.yuv":          {"width": 1920, "height": 1080, "frames": 500, "intra_period": -1},
+                "Kimono1_1920x1080_24.yuv":         {"width": 1920, "height": 1080, "frames": 240, "intra_period": -1},
+                "ParkScene_1920x1080_24.yuv":       {"width": 1920, "height": 1080, "frames": 240, "intra_period": -1}
+            }
+        },
+        "HEVC_E": {
+            "test": 1,
+            "base_path": "HEVC_E",
+            "src_type": "yuv420",
+            "sequences": {
+                "FourPeople_1280x720_60.yuv":       {"width": 1280, "height": 720, "frames": 600, "intra_period": -1},
+                "Johnny_1280x720_60.yuv":           {"width": 1280, "height": 720, "frames": 600, "intra_period": -1},
+                "KristenAndSara_1280x720_60.yuv":   {"width": 1280, "height": 720, "frames": 600, "intra_period": -1}
+            }
+        },
+        "HEVC_C": {
+            "test": 1,
+            "base_path": "HEVC_C",
+            "src_type": "yuv420",
+            "sequences": {
+                "BQMall_832x480_60.yuv":            {"width": 832, "height": 480, "frames": 600, "intra_period": -1},
+                "BasketballDrill_832x480_50.yuv":   {"width": 832, "height": 480, "frames": 500, "intra_period": -1},
+                "PartyScene_832x480_50.yuv":        {"width": 832, "height": 480, "frames": 500, "intra_period": -1},
+                "RaceHorses_832x480_30.yuv":        {"width": 832, "height": 480, "frames": 300, "intra_period": -1}
+            }
+        },
+        "HEVC_D": {
+            "test": 1,
+            "base_path": "HEVC_D",
+            "src_type": "yuv420",
+            "sequences": {
+                "BasketballPass_416x240_50.yuv":    {"width": 416, "height": 240, "frames": 500, "intra_period": -1},
+                "BlowingBubbles_416x240_50.yuv":    {"width": 416, "height": 240, "frames": 500, "intra_period": -1},
+                "BQSquare_416x240_60.yuv":          {"width": 416, "height": 240, "frames": 600, "intra_period": -1},
+                "RaceHorses_416x240_30.yuv":        {"width": 416, "height": 240, "frames": 300, "intra_period": -1}
+            }
+        }
+    }
+}
diff --git a/DCVC-RT/requirements.txt b/DCVC-RT/requirements.txt
@@ -0,0 +1,7 @@
+numpy>=1.20.0
+scipy
+matplotlib
+tqdm
+bd-metric
+pillow
+pybind11