This repository contains the implementation of Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning.
Note: We used Stable Diffusion 1.5 in our experiments, but it has been removed from Hugging Face (as of September 2, 2024).
- The BMVC proceedings version is available (link)
- The camera-ready version has been released on arXiv
- Our paper has been accepted to BMVC 2024 (accepted papers list)
Our experimental environment is based on PyTorch 1.13.1 (docker image).
Pull the docker image:

```
docker pull pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime
```

Install the other packages:

```
pip install -r requirements.txt
```

Save the Stable Diffusion 1.5 pipeline (text encoder, tokenizer, scheduler, VAE, and U-Net):

```
python load_save.py --pipeline runwayml/stable-diffusion-v1-5 --save_dir models/sd-15
```

In the case of Stable Diffusion 1.4:
```
python load_save.py --pipeline CompVis/stable-diffusion-v1-4 --save_dir models/sd-14
```

For uv users, we provide pyproject.toml and uv.lock; note that these files were generated on macOS (we use uv 0.7.16).

```
uv sync
```

Store the prepared images in a directory. Supported formats are .png, .jpg, and .jpeg:
```
.
└── ds
    └── church
        ├── church-01.jpg
        ├── church-02.png
        ├── church-03.jpg
        └── church-04.jpeg
```
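As a sanity check for the dataset layout above, the few-shot images can be gathered with a small helper. This is our illustration of the supported-extension rule, not code from the repository:

```python
from pathlib import Path

# Extensions the README lists as supported.
SUPPORTED_EXTS = {".png", ".jpg", ".jpeg"}

def collect_images(root):
    """Recursively gather supported image files under `root`, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.suffix.lower() in SUPPORTED_EXTS
    )
```

For the tree above, `collect_images("ds")` would return the four church images and skip everything else.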
Run the following command for training (erasing):

```
python train.py --concept "Eiffel Tower" --concept_type object --save eiffel-tower --data ds/church --local --text_encoder_path models/sd-14/text_encoder --diffusion_path models/sd-14 --epochs 4
```

The erased models are stored as below.
```
.
└── eiffel-tower
    ├── epoch-0
    │   ├── pytorch_model.bin
    │   └── config.json
    ├── epoch-1
    │   ├── pytorch_model.bin
    │   └── config.json
    ├── epoch-2
    │   ├── pytorch_model.bin
    │   └── config.json
    ├── epoch-3
    │   ├── pytorch_model.bin
    │   └── config.json
    ├── loss.csv
    └── loss.png
```
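Each epoch-N directory holds only pytorch_model.bin and config.json, i.e. the layout of a Hugging Face transformers checkpoint of the fine-tuned text encoder; the other pipeline components are untouched. A minimal loading sketch (the helper name is ours, and we assume the checkpoint is a standard transformers save of the CLIP text encoder used by Stable Diffusion 1.x):

```python
def load_erased_text_encoder(checkpoint_dir: str):
    """Load the unlearned text encoder from a per-epoch checkpoint
    directory such as "eiffel-tower/epoch-3" (assumption: a standard
    transformers save of CLIPTextModel, which Stable Diffusion 1.x uses)."""
    # Deferred import so this sketch can be read/imported without transformers.
    from transformers import CLIPTextModel
    return CLIPTextModel.from_pretrained(checkpoint_dir)
```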
Inference (PNDM scheduler and 100 inference steps):

```
python infer.py "a photo of Eiffel Tower." eiffel-tower/epoch-3 --tokenizer_path models/sd-14/tokenizer --unet_path models/sd-14/unet --vae_path models/sd-14/vae
```

or

```
python infer.py "a photo of Eiffel Tower." eiffel-tower/epoch-3 --model_name CompVis/stable-diffusion-v1-4
```

This command uses Stable Diffusion 1.4 for every component except the text encoder.
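The second invocation can also be reproduced in plain diffusers code. This is a hedged sketch, not the repository's infer.py: the function name is ours, and we assume diffusers' standard component-override API:

```python
def build_erased_pipeline(erased_encoder_dir: str,
                          base_model: str = "CompVis/stable-diffusion-v1-4"):
    """Build a Stable Diffusion pipeline whose text encoder is replaced by
    the erased checkpoint; all other components come from `base_model`."""
    # Deferred imports so this sketch imports without diffusers installed.
    from diffusers import PNDMScheduler, StableDiffusionPipeline
    from transformers import CLIPTextModel

    text_encoder = CLIPTextModel.from_pretrained(erased_encoder_dir)
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model, text_encoder=text_encoder
    )
    # Match the README's inference setting: PNDM scheduler.
    pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
    return pipe

# Usage (downloads the base model weights):
# pipe = build_erased_pipeline("eiffel-tower/epoch-3")
# image = pipe("a photo of Eiffel Tower.", num_inference_steps=100).images[0]
```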
Our paper can be cited as follows:

```
@misc{fuchi2024erasing,
  title={Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning},
  author={Masane Fuchi and Tomohiro Takagi},
  year={2024},
  eprint={2405.07288},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

or

```
@inproceedings{Fuchi_2024_BMVC,
  author    = {Masane Fuchi and Tomohiro Takagi},
  title     = {Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning},
  booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
  publisher = {BMVA},
  year      = {2024},
  url       = {https://papers.bmvc2024.org/0216.pdf}
}
```
This implementation is based on Textual Inversion using diffusers.
Baselines are as follows: