Yanghao Wang · Hongxu Chen · Jiazhen Liu · Zhenqi He · Rui Liu · Zhen Wang · Long Chen†
LISA can accelerate training and improve controllable generation results in both perceptual quality and condition fidelity.
pip install -r requirements.txt
We use the pose-guided image generation task as the example; change the dataset name to run other tasks.
export SPLIT="val"
export DATASET_NAME="Luka-Wang/realsinglehumanpose"
export CONTROLNET_DIR="model_out/realsinglehumanpose/"
accelerate launch --config_file "./config.yml" \
--main_process_port=23156 ./train_controlnet_lisa.py \
--pretrained_model_name_or_path="Manojb/stable-diffusion-2-1-base" \
--output_dir=$CONTROLNET_DIR \
--dataset_name=$DATASET_NAME \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "log_val/realsinglehumanpose/1.png" "log_val/realsinglehumanpose/2.png" \
--validation_prompt "a photo of a woman in a purple tank top is rowing a boat" "a photo of a man in a boat holding a fishing rod" \
--train_batch_size=8 \
--gradient_accumulation_steps=4 \
--max_train_steps=10000 \
--gradient_checkpointing \
--checkpointing_steps=500 \
--validation_steps=500 \
--dataloader_num_workers=32 \
--weight_lambda=0.2 \
--decoder_feature_source=down_5
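
The training command combines a per-device batch size with gradient accumulation, so the number of samples per optimizer step is larger than `--train_batch_size` alone. The sketch below works out that number, assuming the script follows the usual Accelerate convention (per-device batch size × accumulation steps × number of processes). `NUM_PROCESSES` is a hypothetical placeholder: the real value comes from `config.yml`, which is not shown here.

```shell
# Values copied from the training command above.
TRAIN_BATCH_SIZE=8
GRADIENT_ACCUMULATION_STEPS=4
# Assumption: a single process; set this to the process count in config.yml.
NUM_PROCESSES=1

# Samples consumed per optimizer step under the assumed convention.
EFFECTIVE_BATCH_SIZE=$((TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_PROCESSES))
echo "effective batch size: ${EFFECTIVE_BATCH_SIZE}"
```

If you lower `--train_batch_size` to fit GPU memory, raising `--gradient_accumulation_steps` by the same factor keeps this product unchanged.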
export CONTROLNET_DIR="model_out/realsinglehumanpose/checkpoint-10000/controlnet"
python inference.py \
--dataset_split=$SPLIT \
--pretrained_model_name_or_path="Manojb/stable-diffusion-2-1-base" \
--controlnet_model_name_or_path=$CONTROLNET_DIR \
--dataset_name=$DATASET_NAME \
--resolution=512 \
--output_dir="${CONTROLNET_DIR}/outputs/${SPLIT}/"
export CONTROLNET_DIR="model_out/realsinglehumanpose/checkpoint-10000/controlnet"
python ./eval_scripts/metrics_realpose.py \
--dataset_split=$SPLIT \
--controlnet_model_name_or_path=$CONTROLNET_DIR \
--dataset_name=$DATASET_NAME
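
When switching to another task, the paths used in the three steps above all have to change together. A minimal sketch of deriving them from the dataset name is shown below. It assumes other tasks keep the same `model_out/<dataset>/checkpoint-<steps>/controlnet` layout as the pose example; `TASK_NAME`, `TRAIN_OUTPUT_DIR`, and `MAX_TRAIN_STEPS` are illustrative variable names, not ones the scripts read.

```shell
# Dataset used in this README; replace it with the dataset for another task.
DATASET_NAME="Luka-Wang/realsinglehumanpose"
# Matches --max_train_steps in the training command.
MAX_TRAIN_STEPS=10000

# Strip everything up to the last "/" to get the bare dataset name.
TASK_NAME="${DATASET_NAME##*/}"

# Assumption: output paths follow the model_out/<task>/ layout used above.
TRAIN_OUTPUT_DIR="model_out/${TASK_NAME}/"
CONTROLNET_DIR="model_out/${TASK_NAME}/checkpoint-${MAX_TRAIN_STEPS}/controlnet"

echo "$TRAIN_OUTPUT_DIR"
echo "$CONTROLNET_DIR"
```

For the pose dataset this reproduces the two `CONTROLNET_DIR` values used for training and for inference and evaluation.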
- Controllable Image Generation using SD2.1 run code
- Controllable Image Generation using SD3 run code
- Controllable Video Generation using SVD run code
@misc{wang2026lisalikelihoodscorealignment,
title={LISA: Likelihood Score Alignment for Visual-condition Controllable Generation},
author={Yanghao Wang and Hongxu Chen and Jiazhen Liu and Zhenqi He and Rui Liu and Zhen Wang and Long Chen},
year={2026},
eprint={2606.27192},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.27192},
}



