zhuxing0/Relit-LiVE

Relit-LiVE: Relight Video by Jointly Learning Environment Video

Weiqing Xiao1,* Hong Li2,3,* Xiuyu Yang4,* Houyuan Chen5 Wenyi Li6 Tianqi Liu7 Shaocong Xu2 Chongjie Ye8 Hao Zhao4,2,† Beibei Wang1,†
1Nanjing University  2BAAI  3Beihang University  4Tsinghua University  5HKUST  6UCAS  7HUST  8CUHK-Shenzhen
*Equal contribution.  †Corresponding authors.



This repo contains the official code of our paper: Relit-LiVE: Relight Video by Jointly Learning Environment Video.

📊 Overview

We present Relit-LiVE, a novel video relighting framework that produces physically consistent and temporally stable results without needing prior knowledge of camera pose. This is achieved by jointly generating relighting videos and environment videos. Additionally, by integrating real-world lighting effects with intrinsic constraints, the relighting videos demonstrate remarkable physical plausibility, showcasing realistic reflections and shadows.

✨ News

  • May 8, 2026: Released the project page and inference pipeline.

📝 Checklist

  • Release the arXiv paper and project page.
  • Release inference code and model checkpoints.
  • Release Gradio code and full inference pipeline (inverse-forward).
  • Release training code and data pipeline.
  • Release training dataset.

🛠️ Installation

Minimum requirements

  • Python 3.10
  • NVIDIA GPU, with at least 24 GB VRAM recommended
  • CUDA 12.4 or a compatible version
  • Model weights prepared under checkpoints/ and models/Wan-AI/Wan2.1-T2V-1.3B/

Recommended environment:

  • Ubuntu 20.04 or newer
  • Single-GPU CUDA inference setup

Conda environment

conda create -n diffsynth python=3.10
conda activate diffsynth
pip install -e .
pip install -U deepspeed
pip install transformers==4.50.0
pip install gradio==6.14.0

Optional for full inference pipeline

The cosmos-transfer1-diffusion-renderer repository is required only for the full inference pipeline. Clone it into third_party/ and create the conda environment named cosmos-predict1 by following the instructions in its README.md.

cd third_party
git clone https://github.com/nv-tlabs/cosmos-transfer1-diffusion-renderer.git
...

📦 Checkpoints

Download the Relit-LiVE checkpoints from HuggingFace and place them under checkpoints/.

| Checkpoint | Resolution | Frames | Download |
|---|---|---|---|
| model_frame25_480_832.ckpt | 480 × 832 | 8n+1, n∈{0,1,2,3} → 1/9/17/25 | 🤗 Download |
| model_frame57_480_832.ckpt | 480 × 832 | 8n+1, n∈{0,…,7} → 1/9/…/57 | 🤗 Download |
| model_frame1_1024_1472.ckpt | 1024 × 1472 | 1 (image) | 🤗 Download |
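The frame counts in the table follow the rule 8n+1. As a minimal sketch (the helper names and the MAX_N mapping are hypothetical and only mirror the table; they are not part of relit_inference.py), a requested --num_frames value could be checked against a checkpoint like this:

```python
# Largest n in "8n + 1" for each checkpoint, copied from the table above.
# This mapping and both helpers are illustrative, not part of the repository.
MAX_N = {
    "model_frame25_480_832.ckpt": 3,
    "model_frame57_480_832.ckpt": 7,
    "model_frame1_1024_1472.ckpt": 0,
}


def valid_frame_counts(ckpt_name: str) -> list[int]:
    """Return every frame count of the form 8n + 1 listed for the checkpoint."""
    return [8 * n + 1 for n in range(MAX_N[ckpt_name] + 1)]


def is_supported(ckpt_name: str, num_frames: int) -> bool:
    """True if num_frames is one of the counts listed for the checkpoint."""
    return num_frames in valid_frame_counts(ckpt_name)


if __name__ == "__main__":
    for name in MAX_N:
        print(name, valid_frame_counts(name))
```

Running such a check before launching inference would catch a mismatched combination such as 57 frames with the 25-frame checkpoint.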

In addition, inference loads the Wan2.1 base model from models/Wan-AI/Wan2.1-T2V-1.3B/. Make sure all weights are in place before running inference.
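One way to confirm the weights are in place is a small pre-flight check. The sketch below assumes it is run from the repository root and uses only the two locations named in this README; the function name and the injectable `exists` parameter are hypothetical conveniences, not repository code.

```python
from pathlib import Path
from typing import Callable


def missing_weights(
    repo_root: str,
    ckpt_name: str,
    exists: Callable[[Path], bool] = Path.exists,
) -> list[Path]:
    """Return the required weight locations that are not present.

    Only the two paths named in the README are checked: the Relit-LiVE
    checkpoint under checkpoints/ and the Wan2.1 base model directory.
    """
    root = Path(repo_root)
    required = [
        root / "checkpoints" / ckpt_name,
        root / "models" / "Wan-AI" / "Wan2.1-T2V-1.3B",
    ]
    return [path for path in required if not exists(path)]


if __name__ == "__main__":
    for path in missing_weights(".", "model_frame25_480_832.ckpt"):
        print(f"missing: {path}")
```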

To reproduce the MIT metrics reported in the paper, load model_frame57_480_832.ckpt and run single-frame inference directly on the test set.

(Optional for full inference pipeline) Download the cosmos-transfer1-diffusion-renderer checkpoints from HuggingFace and place them under third_party/cosmos-transfer1-diffusion-renderer/checkpoints/ following the instructions in its README.md.

🚀 Inference

The examples below write generated results to inference_output/ via --output_dir. If --output_dir is omitted, the script falls back to its own default, ./results.

Basic 25-frame relighting

python relit_inference.py \
    --dataset_path datasets/demos \
    --ckpt_path checkpoints/model_frame25_480_832.ckpt \
    --output_dir inference_output \
    --cfg_scale 1.0 \
    --height 480 \
    --width 832 \
    --num_frames 25 \
    --padding_resolution \
    --use_ref_image \
    --env_map_path datasets/envs/Pink_Sunrise \
    --frame_interval 1 \
    --num_inference_steps 50 \
    --quality 10

25-frame rotating-light relighting

python relit_inference.py \
    --dataset_path datasets/demos \
    --ckpt_path checkpoints/model_frame25_480_832.ckpt \
    --output_dir inference_output \
    --cfg_scale 1.0 \
    --height 480 \
    --width 832 \
    --num_frames 25 \
    --padding_resolution \
    --use_ref_image \
    --env_map_path datasets/envs/Pink_Sunrise \
    --frame_interval 1 \
    --num_inference_steps 50 \
    --use_rotate_light \
    --quality 10

Fixed-frame relighting with width-axis light rotation

python relit_inference.py \
    --dataset_path datasets/demos \
    --ckpt_path checkpoints/model_frame25_480_832.ckpt \
    --output_dir inference_output \
    --cfg_scale 1.0 \
    --height 480 \
    --width 832 \
    --num_frames 25 \
    --padding_resolution \
    --use_ref_image \
    --env_map_path datasets/envs/Pink_Sunrise \
    --frame_interval 1 \
    --num_inference_steps 50 \
    --use_fixed_frame_and_w_rotate_light \
    --quality 10
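To illustrate what rotating light along the width axis means, the sketch below cyclically shifts the columns of an environment map. It is not the repository's implementation: it assumes an equirectangular panorama whose full width spans 360 degrees, represents the map as nested lists for simplicity, and both function names are hypothetical.

```python
def roll_columns(env_map: list[list], shift: int) -> list[list]:
    """Cyclically shift every row of the map to the right by `shift` pixels."""
    width = len(env_map[0])
    k = shift % width
    # Pixels that fall off the right edge wrap around to the left edge.
    return [list(row[-k:]) + list(row[:-k]) if k else list(row) for row in env_map]


def degrees_to_pixels(degrees: float, width: int) -> int:
    """Convert a horizontal rotation in degrees to a column shift,
    assuming the full map width covers 360 degrees."""
    return round(degrees / 360.0 * width) % width


if __name__ == "__main__":
    tiny_map = [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(roll_columns(tiny_map, degrees_to_pixels(90, 4)))
```

Under these assumptions, a per-frame shift that grows over time yields a light source that sweeps around a fixed scene.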

Fixed-frame relighting with height-axis light rotation

python relit_inference.py \
    --dataset_path datasets/demos \
    --ckpt_path checkpoints/model_frame25_480_832.ckpt \
    --output_dir inference_output \
    --cfg_scale 1.0 \
    --height 480 \
    --width 832 \
    --num_frames 25 \
    --padding_resolution \
    --use_ref_image \
    --env_map_path datasets/envs/Pink_Sunrise \
    --frame_interval 1 \
    --num_inference_steps 50 \
    --use_fixed_frame_and_h_rotate_light \
    --quality 10

57-frame video relighting

python relit_inference.py \
    --dataset_path datasets/demos \
    --ckpt_path checkpoints/model_frame57_480_832.ckpt \
    --output_dir inference_output \
    --cfg_scale 1.0 \
    --height 480 \
    --width 832 \
    --num_frames 57 \
    --padding_resolution \
    --use_ref_image \
    --env_map_path datasets/envs/Pink_Sunrise \
    --frame_interval 1 \
    --num_inference_steps 50 \
    --quality 10

Single-frame high-resolution relighting

python relit_inference.py \
    --dataset_path datasets/demos \
    --ckpt_path checkpoints/model_frame1_1024_1472.ckpt \
    --output_dir inference_output \
    --cfg_scale 1.0 \
    --height 1024 \
    --width 1472 \
    --num_frames 1 \
    --padding_resolution \
    --use_ref_image \
    --env_map_path datasets/envs/Pink_Sunrise \
    --frame_interval 1 \
    --num_inference_steps 50 \
    --quality 10
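The commands above differ only in the checkpoint, the resolution, the frame count, and one optional mode flag. As a convenience sketch, they could be generated from a single helper. The helper name and structure are hypothetical; every flag and value is taken from the examples above.

```python
import shlex

# Options shared by every example command in this section.
COMMON_OPTIONS = {
    "--dataset_path": "datasets/demos",
    "--output_dir": "inference_output",
    "--cfg_scale": "1.0",
    "--env_map_path": "datasets/envs/Pink_Sunrise",
    "--frame_interval": "1",
    "--num_inference_steps": "50",
    "--quality": "10",
}
COMMON_FLAGS = ["--padding_resolution", "--use_ref_image"]


def build_command(ckpt_name, height, width, num_frames, extra_flags=()):
    """Assemble the argv list for one relit_inference.py run."""
    argv = [
        "python", "relit_inference.py",
        "--ckpt_path", f"checkpoints/{ckpt_name}",
        "--height", str(height),
        "--width", str(width),
        "--num_frames", str(num_frames),
    ]
    for option, value in COMMON_OPTIONS.items():
        argv += [option, value]
    return argv + COMMON_FLAGS + list(extra_flags)


if __name__ == "__main__":
    cmd = build_command("model_frame25_480_832.ckpt", 480, 832, 25,
                        extra_flags=["--use_rotate_light"])
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run, for example to loop over several environment-map directories.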

📋 Argument reference

The following arguments are defined in parse_args() inside relit_inference.py.

| Argument | Type | Default | Description |
|---|---|---|---|
| --dataset_path | str | ./example_test_data | Input dataset directory. The examples above use datasets/demos. |
| --env_map_path | str | None | External environment map directory. If not provided, the script reads lighting data from each sample. |
| --use_ref_image | flag | False | Enable the reference-image branch. |
| --use_muti_ref_image | flag | False | Enable multi-reference-image mode. The argument name follows the current code spelling. |
| --ref_image_path_with_idddx | str | None | Template path for external reference images. The script replaces idddx with the sample index. |
| --full_resolution | flag | False | Use the full-resolution input pipeline. |
| --padding_resolution | flag | False | Use a padding-based resize strategy to reduce aggressive cropping. |
| --dataset_type | str | relit-live | Dataset format. The default matches the Relit-LiVE directory structure in this repository. |
| --drop_mr | flag | False | Ignore metallic and roughness conditioning. |
| --use_rotate_light | flag | False | Enable dynamic light rotation mode. |
| --use_fixed_frame_and_w_rotate_light | flag | False | Keep the first frame fixed and rotate lighting along the environment-map width axis. |
| --use_fixed_frame_and_h_rotate_light | flag | False | Keep the first frame fixed and rotate lighting along the environment-map height axis. |
| --h_rotate_light | int | 0 | Apply vertical environment-map rotation to each frame, in degrees. |
| --w_rotate_light | int | 0 | Apply horizontal environment-map rotation to each frame, in pixels. |
| --num_frames | int | 81 | Number of output frames. When set to 1, the script saves a png; otherwise it saves an mp4. |
| --num_inference_steps | int | 50 | Number of denoising inference steps. |
| --frame_interval | int | 1 | Sampling interval when reading the input video or image sequence. |
| --height | int | 480 | Output height. |
| --width | int | 832 | Output width. |
| --ckpt_path | str | None | Path to the checkpoint to load. |
| --output_dir | str | ./results | Default output directory. |
| --output_path | str | None | Explicit output file path. Only .mp4 and .png are supported. |
| --dataloader_num_workers | int | 1 | Number of DataLoader workers. |
| --cfg_scale | float | 5.0 | Classifier-free guidance scale. |
| --wo_ref_weight | float | 0.0 | Weight for the branch without reference-image conditioning. |
| --quality | int | 5 | Video quality value passed to imageio when saving mp4 files. |
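For orientation, a subset of the arguments above can be expressed with argparse as follows. This is a reconstruction from the table, not the actual parse_args() in relit_inference.py, and resolve_ref_image is a hypothetical helper that assumes the sample index is substituted as a plain integer without zero padding.

```python
import argparse


def parse_args(argv=None):
    """Parse a subset of the documented arguments with their listed defaults."""
    parser = argparse.ArgumentParser(description="Relit-LiVE inference (sketch)")
    parser.add_argument("--dataset_path", type=str, default="./example_test_data")
    parser.add_argument("--env_map_path", type=str, default=None)
    parser.add_argument("--use_ref_image", action="store_true")
    parser.add_argument("--ref_image_path_with_idddx", type=str, default=None)
    parser.add_argument("--num_frames", type=int, default=81)
    parser.add_argument("--height", type=int, default=480)
    parser.add_argument("--width", type=int, default=832)
    parser.add_argument("--output_dir", type=str, default="./results")
    parser.add_argument("--cfg_scale", type=float, default=5.0)
    parser.add_argument("--quality", type=int, default=5)
    return parser.parse_args(argv)


def resolve_ref_image(template: str, sample_index: int) -> str:
    """Replace the idddx placeholder with the sample index (assumed unpadded)."""
    return template.replace("idddx", str(sample_index))


if __name__ == "__main__":
    args = parse_args(["--num_frames", "25", "--use_ref_image"])
    print(args.num_frames, args.use_ref_image, args.cfg_scale)
```

Note that several defaults differ from the values used in the example commands: --cfg_scale defaults to 5.0 (examples use 1.0), --quality to 5 (examples use 10), and --num_frames to 81, which is larger than any frame count listed for the released checkpoints. Passing these options explicitly, as the examples do, avoids surprises.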

Notes

  • Output filenames automatically include parts of the checkpoint name, sequence name, resolution, reference-image mode, environment lighting information, inference steps, frame count, and cfg_scale.
  • When --num_frames 1 is used, the script writes a png. When --num_frames > 1, it writes an mp4.
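The output-format rule can be summarized in a few lines. The sketch below only restates the behavior described in this README (png for a single frame, mp4 otherwise, and --output_path limited to those two extensions); the function names are hypothetical, and the case-sensitive extension check is an assumption.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".mp4", ".png"}


def default_extension(num_frames: int) -> str:
    """A single frame is saved as an image; anything longer as a video."""
    return ".png" if num_frames == 1 else ".mp4"


def check_output_path(output_path: str) -> str:
    """Return the suffix of an explicit output path, or raise if unsupported."""
    suffix = Path(output_path).suffix
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return suffix


if __name__ == "__main__":
    print(default_extension(1), default_extension(25))
    print(check_output_path("inference_output/result.mp4"))
```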

🚀 Full inference pipeline (Gradio)

Please make sure you have the following items ready:

  1. conda environment named diffsynth.
  2. conda environment named cosmos-predict1.
  3. ./checkpoints/*.ckpt.
  4. ./third_party/cosmos-transfer1-diffusion-renderer.
  5. ./third_party/cosmos-transfer1-diffusion-renderer/checkpoints/Cosmos-Tokenize1-CV8x8x8-720p and ./third_party/cosmos-transfer1-diffusion-renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B.

Then launch the Gradio app:

conda activate diffsynth
python run_full_inference_gradio.py
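Before launching, the two conda environments from the prerequisite list can be verified programmatically. The sketch below parses the JSON printed by `conda env list --json`, which contains an "envs" list of environment paths; the function name is hypothetical, and it assumes each environment's name is the last component of its path.

```python
import json
from pathlib import PurePath

# Environment names taken from the prerequisite list above.
REQUIRED_ENVS = {"diffsynth", "cosmos-predict1"}


def missing_envs(conda_env_json: str) -> set[str]:
    """Return required environment names absent from `conda env list --json` output."""
    env_paths = json.loads(conda_env_json)["envs"]
    # Assumption: an environment's name is the final component of its path.
    found = {PurePath(path).name for path in env_paths}
    return REQUIRED_ENVS - found


# Example usage (requires conda on PATH):
#   import subprocess
#   out = subprocess.run(["conda", "env", "list", "--json"],
#                        capture_output=True, text=True, check=True).stdout
#   print(missing_envs(out) or "all required environments found")
```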

📌 Future plans

This project will be continuously maintained. We welcome users to try it out and share their feedback (15770575681@163.com).

The current plan includes a model version specifically designed for portraits and another that is better suited for handling motion (including camera and scene dynamics).

🤝 Citation

If you find this repository helpful, please consider citing our paper:

@article{xiao2026relit,
  title={Relit-LiVE: Relight Video by Jointly Learning Environment Video},
  author={Xiao, Weiqing and Li, Hong and Yang, Xiuyu and Chen, Houyuan and Li, Wenyi and Liu, Tianqi and Xu, Shaocong and Ye, Chongjie and Zhao, Hao and Wang, Beibei},
  journal={arXiv preprint arXiv:2605.06658},
  year={2026}
}

📝 Acknowledgements

This code builds on DiffSynth-Studio and diffusion-renderer. Thanks to all the authors for their excellent contributions!

About

[SIGGRAPH 2026] Official code of the paper "Relit-LiVE: Relight Video by Jointly Learning Environment Video".
