
HKUST-LongGroup/LISA


LISA: Likelihood Score Alignment for Visual-condition Controllable Generation

Yanghao Wang · Hongxu Chen · Jiazhen Liu · Zhenqi He · Rui Liu · Zhen Wang · Long Chen

arXiv: https://arxiv.org/abs/2606.27192

Controllable Image Generation

Compositional-condition Generation

Controllable Video Generation



LISA accelerates training and improves controllable generation results in both perceptual quality and condition fidelity.



0. Environment preparation

pip install -r requirements.txt

1. Training

We use the pose-guided image generation task as the example. For other tasks, change the dataset name (DATASET_NAME) and the output directory accordingly.

export SPLIT="val"
export DATASET_NAME="Luka-Wang/realsinglehumanpose"
export CONTROLNET_DIR="model_out/realsinglehumanpose/"

accelerate launch --config_file "./config.yml" \
 --main_process_port=23156 ./train_controlnet_lisa.py \
 --pretrained_model_name_or_path="Manojb/stable-diffusion-2-1-base" \
 --output_dir=$CONTROLNET_DIR \
 --dataset_name=$DATASET_NAME \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "log_val/realsinglehumanpose/1.png" "log_val/realsinglehumanpose/2.png" \
 --validation_prompt "a photo of a woman in a purple tank top is rowing a boat" "a photo of a man in a boat holding a fishing rod" \
 --train_batch_size=8 \
 --gradient_accumulation_steps=4 \
 --max_train_steps=10000 \
 --gradient_checkpointing \
 --checkpointing_steps=500 \
 --validation_steps=500 \
 --dataloader_num_workers=32 \
 --weight_lambda=0.2 \
 --decoder_feature_source=down_5
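
As a rough check on the settings above, the sketch below works out how many samples contribute to each optimizer step and how many checkpoints the run would write. It assumes that --train_batch_size is a per-process batch size and that a checkpoint is saved every --checkpointing_steps steps, as in the standard diffusers ControlNet training scripts; neither is confirmed by this README. NUM_PROCESSES is a hypothetical variable standing in for the process count set in config.yml.

```shell
# Values copied from the training command above.
TRAIN_BATCH_SIZE=8
GRAD_ACCUM_STEPS=4
MAX_TRAIN_STEPS=10000
CHECKPOINTING_STEPS=500

# Assumption: set this to the number of processes configured in config.yml.
NUM_PROCESSES=1

# Samples that contribute to one optimizer step (assumes a per-process batch size).
EFFECTIVE_BATCH=$((TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_PROCESSES))

# Checkpoints written over the run (assumes one every CHECKPOINTING_STEPS steps).
NUM_CHECKPOINTS=$((MAX_TRAIN_STEPS / CHECKPOINTING_STEPS))

echo "samples per optimizer step: ${EFFECTIVE_BATCH}"
echo "checkpoints written: ${NUM_CHECKPOINTS}"
```

If GPU memory is tight, lowering --train_batch_size while raising --gradient_accumulation_steps by the same factor keeps this product unchanged.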

2. Inference

export CONTROLNET_DIR="model_out/realsinglehumanpose/checkpoint-10000/controlnet"
python inference.py \
 --dataset_split=$SPLIT \
 --pretrained_model_name_or_path="Manojb/stable-diffusion-2-1-base" \
 --controlnet_model_name_or_path=$CONTROLNET_DIR \
 --dataset_name=$DATASET_NAME \
 --resolution=512 \
 --output_dir="${CONTROLNET_DIR}/outputs/${SPLIT}/" \

3. Evaluation

export CONTROLNET_DIR="model_out/realsinglehumanpose/checkpoint-10000/controlnet"

python ./eval_scripts/metrics_realpose.py \
 --dataset_split=$SPLIT \
 --controlnet_model_name_or_path=$CONTROLNET_DIR \
 --dataset_name=$DATASET_NAME \

TODO 🛠️

  • Controllable Image Generation using SD2.1 run code
  • Controllable Image Generation using SD3 run code
  • Controllable Video Generation using SVD run code

BibTeX

@misc{wang2026lisalikelihoodscorealignment,
      title={LISA: Likelihood Score Alignment for Visual-condition Controllable Generation}, 
      author={Yanghao Wang and Hongxu Chen and Jiazhen Liu and Zhenqi He and Rui Liu and Zhen Wang and Long Chen},
      year={2026},
      eprint={2606.27192},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.27192}, 
}

About

[arXiv 2026] The PyTorch implementation of LISA
