Chenxi Xie<sup>1,2</sup> | Yuhui Wu<sup>1,2</sup> | Qiaosi Yi<sup>1,2</sup> | Lei Zhang<sup>1,2</sup>
<sup>1</sup>The Hong Kong Polytechnic University, <sup>2</sup>OPPO Research Institute
- 2026.6.13: The TVEdit project page and arXiv preprint are released.
- 2026.6.13: The inference code and TV-Edit model weights are released.
- To do: release the dataset.
- To do: release the training code.
```bash
# Clone this repository
git clone https://github.com/xiechenxi99/TVEdit.git
cd TVEdit
# Create an environment
conda create -n TVEdit python=3.10
conda activate TVEdit
pip install --upgrade pip
pip install torch==2.5.0+cu121 torchvision==0.20.0+cu121 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.52.4 pytorch-lightning==2.4.0 diffusers==0.35.1
```
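Because the commands above pin exact versions (including a CUDA 12.1 build of PyTorch), a quick check after installation can catch a mismatched environment before any model is loaded. The script below is a minimal sketch and is not part of the TVEdit repository. It uses only `importlib.metadata` from the standard library and compares base versions while ignoring local build tags such as `+cu121`. The helper names `base_version` and `check_pins` are hypothetical.

```python
"""Hypothetical environment check for the pinned TVEdit dependencies.

Not part of the TVEdit repository; it only reads installed package metadata.
"""
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install commands of this README.
PINNED = {
    "torch": "2.5.0",
    "torchvision": "0.20.0",
    "transformers": "4.52.4",
    "pytorch-lightning": "2.4.0",
    "diffusers": "0.35.1",
}


def base_version(v: str) -> str:
    """Drop a PEP 440 local version label, e.g. '2.5.0+cu121' -> '2.5.0'."""
    return v.split("+", 1)[0]


def check_pins(pins, lookup=version):
    """Return {package: problem} for every pin that is missing or mismatched.

    `lookup` defaults to importlib.metadata.version and can be swapped out
    for testing.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if base_version(found) != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    print("environment OK" if not issues else f"{len(issues)} issue(s) found")
```

Run it inside the activated `TVEdit` environment, for example after saving it as `check_env.py` (a hypothetical filename). It prints one line per missing or mismatched package. The check intentionally ignores the `+cu121` tag, so it does not confirm that a CUDA-enabled build is installed.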
- Download the base model checkpoint: Qwen-Image-Edit.
- Download the trained TV-Edit weights: TVEdit-Qwen-Image-Edit.
- [Optional] TV-Edit is compatible with an existing, separately trained acceleration LoRA for 4-step editing: Qwen-Image-Edit-4step.
- Launch the Gradio demo:
  ```bash
  python gradio_demo.py
  ```

After launching the Gradio demo, use the interface as follows:
- Specify the directory of the pretrained base editing model, e.g., Qwen-Image-Edit.
- Specify the path to the downloaded TV-Edit weights.
- [Optional] Specify the directory of the downloaded acceleration LoRA.
- Click the Load Model button to initialize the models.
- Upload the image to be edited.
- Draw the desired point trajectories on the canvas to specify the spatial control.
- Enter the intended semantic change as the textual editing instruction.
- Adjust the CFG scale and random seed. Without the acceleration LoRA, we recommend a CFG scale of 2.5 to 3.5 with 50 sampling steps. With the acceleration LoRA, use a CFG scale of 1 with 4 steps.
- Click the Run Editing button to generate the edited image.
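The CFG recommendations above follow from how classifier-free guidance combines the conditional and unconditional predictions at each sampling step. The sketch below is illustrative only. `recommended_settings` is a hypothetical helper that simply mirrors the numbers in this README, and `cfg_combine` is the standard classifier-free guidance formula applied to plain lists. It is not taken from the TVEdit or Qwen-Image-Edit code, and the actual pipeline may apply additional steps (such as rescaling) on top of it. At a scale of 1 the formula reduces to the conditional prediction, so a pipeline can skip the unconditional pass. This is why step-distilled acceleration LoRAs are typically run at CFG 1.

```python
"""Sketch of the recommended sampling settings and standard CFG.

Assumptions: `recommended_settings` is a hypothetical helper mirroring this
README; `cfg_combine` is the textbook classifier-free guidance formula, not
the exact Qwen-Image-Edit / TV-Edit implementation.
"""
from typing import List, Tuple


def recommended_settings(use_accel_lora: bool) -> Tuple[Tuple[float, float], int]:
    """Return ((min_cfg, max_cfg), num_steps) as recommended in this README."""
    if use_accel_lora:
        return (1.0, 1.0), 4  # 4-step LoRA: CFG 1, i.e. no guidance pass
    return (2.5, 3.5), 50  # full model: CFG 2.5 to 3.5 with 50 steps


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Standard CFG, element-wise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    print(recommended_settings(False))  # ((2.5, 3.5), 50)
    print(recommended_settings(True))   # ((1.0, 1.0), 4)
    cond, uncond = [1.0, 2.0], [0.0, 1.0]
    print(cfg_combine(cond, uncond, 1.0))  # [1.0, 2.0], identical to cond
    print(cfg_combine(cond, uncond, 3.0))  # [3.0, 4.0], pushed away from uncond
```

Larger scales push the prediction further from the unconditional branch, which typically strengthens adherence to the instruction at some cost in naturalness. This is one reason to tune the scale within the recommended range rather than raising it indefinitely.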
```bibtex
@article{xie2026text-vision,
  title={Text-Vision Co-Instructed Image Editing},
  author={Xie, Chenxi and Wu, Yuhui and Yi, Qiaosi and Zhang, Lei},
  journal={arXiv preprint arXiv:2606.16767},
  year={2026},
}
```
This project is released under the Apache 2.0 license.
If you have any questions, please contact xiechenxi99@gmail.com.
