*Equal contribution. † Corresponding authors.
This repo contains the official code of our paper: Relit-LiVE: Relight Video by Jointly Learning Environment Video.
We present Relit-LiVE, a novel video relighting framework that produces physically consistent and temporally stable results without needing prior knowledge of camera pose. This is achieved by jointly generating relighting videos and environment videos. Additionally, by integrating real-world lighting effects with intrinsic constraints, the relighting videos demonstrate remarkable physical plausibility, showcasing realistic reflections and shadows.
- May 8, 2026: Released the project page and inference pipeline.
- Release the arXiv paper and project page.
- Release inference code and model checkpoints.
- Release gradio code and full inference pipeline (inverse-forward).
- Release training code and data pipeline.
- Release training dataset.
- Python 3.10
- NVIDIA GPU, with at least 24 GB VRAM recommended
- CUDA 12.4 or a compatible version
- Model weights prepared under `checkpoints/` and `models/Wan-AI/Wan2.1-T2V-1.3B/`
Recommended environment:
- Ubuntu 20.04 or newer
- Single-GPU CUDA inference setup

```bash
conda create -n diffsynth python=3.10
conda activate diffsynth
pip install -e .
pip install -U deepspeed
pip install transformers==4.50.0
pip install gradio==6.14.0
```

The cosmos-transfer1-diffusion-renderer repository is required for full pipeline inference. Install the conda environment named cosmos-predict1 following the instructions in its README.md.

```bash
cd third_party
git clone https://github.com/nv-tlabs/cosmos-transfer1-diffusion-renderer.git
...
```

Download the Relit-LiVE checkpoints from HuggingFace and place them under `checkpoints/`.

| Checkpoint | Resolution | Frames | Download |
|---|---|---|---|
| `model_frame25_480_832.ckpt` | 480 × 832 | 8n+1, n∈{0,1,2,3} → 1/9/17/25 | 🤗 Download |
| `model_frame57_480_832.ckpt` | 480 × 832 | 8n+1, n∈{0,…,7} → 1/9/…/57 | 🤗 Download |
| `model_frame1_1024_1472.ckpt` | 1024 × 1472 | 1 (image) | 🤗 Download |

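
The frame-count column of the checkpoint table above follows a simple rule: each checkpoint accepts `8n+1` frames up to its maximum. A minimal sketch of that rule is shown here; the helper names `valid_frame_counts` and `is_valid_num_frames` are hypothetical and not part of the repository.

```python
def valid_frame_counts(max_frames: int) -> list[int]:
    """List every frame count of the form 8n+1 up to max_frames (inclusive)."""
    return [8 * n + 1 for n in range((max_frames - 1) // 8 + 1)]


def is_valid_num_frames(num_frames: int, max_frames: int) -> bool:
    """Return True if num_frames is of the form 8n+1 and does not exceed max_frames."""
    return 1 <= num_frames <= max_frames and (num_frames - 1) % 8 == 0


# Maximum frame counts taken from the checkpoint table.
print(valid_frame_counts(25))  # [1, 9, 17, 25]
print(valid_frame_counts(57))
print(is_valid_num_frames(24, 25))
```

Checking `--num_frames` this way before launching a run avoids pairing a checkpoint with a frame count it is not listed for.
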
In addition, inference loads the Wan2.1 base model from models/Wan-AI/Wan2.1-T2V-1.3B/. Make sure all weights are in place before running inference.
To reproduce the MIT metrics reported in the paper, load model_frame57_480_832.ckpt and run single-frame inference directly on the test set.
(Optional, for the full inference pipeline) Download the cosmos-transfer1-diffusion-renderer checkpoints from HuggingFace and place them under third_party/cosmos-transfer1-diffusion-renderer/checkpoints/, following the instructions in its README.md.
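
Before launching inference, it can help to confirm that the weights are where the script expects them. The following is a minimal sketch and not part of the repository: the function name `missing_weights` is hypothetical, and it only checks that the locations named in this README exist, not that the files are complete or uncorrupted.

```python
from pathlib import Path


def missing_weights(root: str = ".") -> list[str]:
    """Return the expected weight locations (from this README) that are absent."""
    base = Path(root)
    missing = []
    # At least one Relit-LiVE checkpoint should sit under checkpoints/.
    ckpt_dir = base / "checkpoints"
    if not ckpt_dir.is_dir() or not any(ckpt_dir.glob("*.ckpt")):
        missing.append("checkpoints/*.ckpt")
    # The Wan2.1 base model directory must exist and be non-empty.
    wan_dir = base / "models" / "Wan-AI" / "Wan2.1-T2V-1.3B"
    if not wan_dir.is_dir() or not any(wan_dir.iterdir()):
        missing.append("models/Wan-AI/Wan2.1-T2V-1.3B/")
    return missing


if __name__ == "__main__":
    for item in missing_weights():
        print(f"missing: {item}")
```

Run it from the repository root; no output means both locations were found.
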
By default, generated results are written to inference_output/.

```bash
# Basic video relighting with the 25-frame checkpoint
python relit_inference.py \
--dataset_path datasets/demos \
--ckpt_path checkpoints/model_frame25_480_832.ckpt \
--output_dir inference_output \
--cfg_scale 1.0 \
--height 480 \
--width 832 \
--num_frames 25 \
--padding_resolution \
--use_ref_image \
--env_map_path datasets/envs/Pink_Sunrise \
--frame_interval 1 \
--num_inference_steps 50 \
--quality 10
```

```bash
# Dynamic light rotation (--use_rotate_light)
python relit_inference.py \
--dataset_path datasets/demos \
--ckpt_path checkpoints/model_frame25_480_832.ckpt \
--output_dir inference_output \
--cfg_scale 1.0 \
--height 480 \
--width 832 \
--num_frames 25 \
--padding_resolution \
--use_ref_image \
--env_map_path datasets/envs/Pink_Sunrise \
--frame_interval 1 \
--num_inference_steps 50 \
--use_rotate_light \
--quality 10
```

```bash
# First frame fixed, lighting rotated along the environment-map width axis
python relit_inference.py \
--dataset_path datasets/demos \
--ckpt_path checkpoints/model_frame25_480_832.ckpt \
--output_dir inference_output \
--cfg_scale 1.0 \
--height 480 \
--width 832 \
--num_frames 25 \
--padding_resolution \
--use_ref_image \
--env_map_path datasets/envs/Pink_Sunrise \
--frame_interval 1 \
--num_inference_steps 50 \
--use_fixed_frame_and_w_rotate_light \
--quality 10
```

```bash
# First frame fixed, lighting rotated along the environment-map height axis
python relit_inference.py \
--dataset_path datasets/demos \
--ckpt_path checkpoints/model_frame25_480_832.ckpt \
--output_dir inference_output \
--cfg_scale 1.0 \
--height 480 \
--width 832 \
--num_frames 25 \
--padding_resolution \
--use_ref_image \
--env_map_path datasets/envs/Pink_Sunrise \
--frame_interval 1 \
--num_inference_steps 50 \
--use_fixed_frame_and_h_rotate_light \
--quality 10
```

```bash
# Longer video with the 57-frame checkpoint
python relit_inference.py \
--dataset_path datasets/demos \
--ckpt_path checkpoints/model_frame57_480_832.ckpt \
--output_dir inference_output \
--cfg_scale 1.0 \
--height 480 \
--width 832 \
--num_frames 57 \
--padding_resolution \
--use_ref_image \
--env_map_path datasets/envs/Pink_Sunrise \
--frame_interval 1 \
--num_inference_steps 50 \
--quality 10
```

```bash
# Single-image relighting at 1024 x 1472
python relit_inference.py \
--dataset_path datasets/demos \
--ckpt_path checkpoints/model_frame1_1024_1472.ckpt \
--output_dir inference_output \
--cfg_scale 1.0 \
--height 1024 \
--width 1472 \
--num_frames 1 \
--padding_resolution \
--use_ref_image \
--env_map_path datasets/envs/Pink_Sunrise \
--frame_interval 1 \
--num_inference_steps 50 \
--quality 10
```

The following arguments are defined in `parse_args()` inside `relit_inference.py`.

| Argument | Type | Default | Description |
|---|---|---|---|
| `--dataset_path` | str | `./example_test_data` | Input dataset directory. The examples above use `datasets/demos`. |
| `--env_map_path` | str | `None` | External environment map directory. If not provided, the script reads lighting data from each sample. |
| `--use_ref_image` | flag | `False` | Enable the reference-image branch. |
| `--use_muti_ref_image` | flag | `False` | Enable multi-reference-image mode. The argument name follows the current code spelling. |
| `--ref_image_path_with_idddx` | str | `None` | Template path for external reference images. The script replaces `idddx` with the sample index. |
| `--full_resolution` | flag | `False` | Use the full-resolution input pipeline. |
| `--padding_resolution` | flag | `False` | Use a padding-based resize strategy to reduce aggressive cropping. |
| `--dataset_type` | str | `relit-live` | Dataset format. The default matches the Relit-LiVE directory structure in this repository. |
| `--drop_mr` | flag | `False` | Ignore metallic and roughness conditioning. |
| `--use_rotate_light` | flag | `False` | Enable dynamic light rotation mode. |
| `--use_fixed_frame_and_w_rotate_light` | flag | `False` | Keep the first frame fixed and rotate lighting along the environment-map width axis. |
| `--use_fixed_frame_and_h_rotate_light` | flag | `False` | Keep the first frame fixed and rotate lighting along the environment-map height axis. |
| `--h_rotate_light` | int | `0` | Apply vertical environment-map rotation to each frame, in degrees. |
| `--w_rotate_light` | int | `0` | Apply horizontal environment-map rotation to each frame, in pixels. |
| `--num_frames` | int | `81` | Number of output frames. When set to 1, the script saves a png; otherwise it saves an mp4. |
| `--num_inference_steps` | int | `50` | Number of denoising inference steps. |
| `--frame_interval` | int | `1` | Sampling interval when reading the input video or image sequence. |
| `--height` | int | `480` | Output height. |
| `--width` | int | `832` | Output width. |
| `--ckpt_path` | str | `None` | Path to the checkpoint to load. |
| `--output_dir` | str | `./results` | Default output directory. |
| `--output_path` | str | `None` | Explicit output file path. Only `.mp4` and `.png` are supported. |
| `--dataloader_num_workers` | int | `1` | Number of DataLoader workers. |
| `--cfg_scale` | float | `5.0` | Classifier-free guidance scale. |
| `--wo_ref_weight` | float | `0.0` | Weight for the branch without reference-image conditioning. |
| `--quality` | int | `5` | Video quality value passed to imageio when saving mp4 files. |

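
The table describes `--w_rotate_light` as a horizontal environment-map rotation measured in pixels. For an equirectangular map, one common way to realise such a rotation is a circular shift of the image columns. The sketch below illustrates that idea in plain Python on a row-major list of rows. It is an assumption about the general technique, not code taken from `relit_inference.py`, whose implementation and rotation direction may differ; the function name `roll_columns` is hypothetical.

```python
def roll_columns(env_map: list[list[float]], shift_px: int) -> list[list[float]]:
    """Circularly shift every row of a row-major map by shift_px columns."""
    width = len(env_map[0])
    k = shift_px % width  # wrap shifts larger than the width and negative shifts
    # Moving the last k columns to the front shifts the content right by k pixels.
    return [row[width - k:] + row[:width - k] for row in env_map]


# Tiny dummy "environment map": one row, eight columns.
env = [[0, 1, 2, 3, 4, 5, 6, 7]]
print(roll_columns(env, 2))
```

For array data, `numpy.roll(env_map, shift_px, axis=1)` performs the same circular shift along the width axis.
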
- Output filenames automatically include parts of the checkpoint name, sequence name, resolution, reference-image mode, environment lighting information, inference steps, frame count, and `cfg_scale`.
- When `--num_frames 1` is used, the script writes a png. When `--num_frames > 1`, it writes an mp4.

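
The output rules above (png for a single frame, mp4 otherwise, and only `.mp4` or `.png` accepted for `--output_path`) can be expressed as a short pre-flight check. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the case-insensitive suffix comparison is a choice made here, not something confirmed by the script.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_SUFFIXES = {".mp4", ".png"}


def expected_suffix(num_frames: int) -> str:
    """A single frame is written as a png; longer outputs are written as an mp4."""
    return ".png" if num_frames == 1 else ".mp4"


def is_supported_output_path(output_path: Optional[str]) -> bool:
    """None means falling back to --output_dir; otherwise the suffix must be supported."""
    if output_path is None:
        return True
    return Path(output_path).suffix.lower() in SUPPORTED_SUFFIXES


print(expected_suffix(1), expected_suffix(25))
print(is_supported_output_path("results/demo.gif"))
```
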
Please make sure you have the following items ready:
- conda environment named diffsynth.
- conda environment named cosmos-predict1.
- `./checkpoints/*.ckpt`.
- `./third_party/cosmos-transfer1-diffusion-renderer`.
- `./third_party/cosmos-transfer1-diffusion-renderer/checkpoints/Cosmos-Tokenize1-CV8x8x8-720p` and `./third_party/cosmos-transfer1-diffusion-renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B`.

Then,

```bash
conda activate diffsynth
python run_full_inference_gradio.py
```

This project will be continuously maintained. We welcome users to try it out and share their feedback (15770575681@163.com).
The current plan includes a model version specifically designed for portraits and another that is better suited for handling motion (including camera and scene dynamics).
If you find this repository helpful, please consider citing our paper:

```bibtex
@article{xiao2026relit,
  title={Relit-LiVE: Relight Video by Jointly Learning Environment Video},
  author={Xiao, Weiqing and Li, Hong and Yang, Xiuyu and Chen, Houyuan and Li, Wenyi and Liu, Tianqi and Xu, Shaocong and Ye, Chongjie and Zhao, Hao and Wang, Beibei},
  journal={arXiv preprint arXiv:2605.06658},
  year={2026}
}
```

The code is built on DiffSynth-Studio and diffusion-renderer. Thanks to all the authors for their excellent contributions!
