Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models

[teaser figure]

Aakash Sen Sharma, Niladri Sarkar, Vikram Chundawat, Ankur A Mali, Murari Mandal

We expose a significant vulnerability in diffusion model unlearning methods: an attacker can reverse the supposed erasure of concepts at inference time. Our approach leverages a novel Partial Diffusion Attack that operates across all layers of the model, recovering forgotten concepts in an unsupervised, data-free manner (a hypothetical sketch of such an attack appears below). Our work currently focuses on unlearning methods applied to Stable Diffusion 1.4; generalizing these findings to other models and versions is left to future research.
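For intuition, here is a minimal, hypothetical sketch of a partial-diffusion-style handover in diffusers: a reference copy of Stable Diffusion 1.4 denoises the first half of the sampling trajectory, then the unlearned model finishes from the intermediate latent. The unlearned checkpoint path, the prompt, the handover step, and the use of recent diffusers APIs (`encode_prompt`, `image_processor`) are all illustrative assumptions, not the authors' exact settings; see the paper and this repository for the actual attack procedure.

```python
# Hypothetical sketch only: model paths, the handover step, and the
# handover direction are illustrative assumptions, not the paper's settings.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Reference SD 1.4 and an "unlearned" checkpoint (placeholder path).
ref = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=dtype).to(device)
unl = StableDiffusionPipeline.from_pretrained(
    "path/to/unlearned-sd-1-4", torch_dtype=dtype).to(device)

prompt = "a photo of the erased concept"   # illustrative prompt
num_steps, switch_step, guidance = 50, 25, 7.5

with torch.no_grad():
    # Classifier-free guidance embeddings (recent diffusers API;
    # the repo's bundled fork may expose this differently).
    cond, uncond = ref.encode_prompt(prompt, device, 1, True)
    emb = torch.cat([uncond, cond])

    gen = torch.Generator(device=device).manual_seed(0)
    latents = torch.randn((1, ref.unet.config.in_channels, 64, 64),
                          generator=gen, device=device, dtype=dtype)
    latents = latents * ref.scheduler.init_noise_sigma
    ref.scheduler.set_timesteps(num_steps, device=device)

    for i, t in enumerate(ref.scheduler.timesteps):
        # Early steps use the reference U-Net; the partially denoised
        # latent is then handed to the unlearned U-Net to finish.
        unet = ref.unet if i < switch_step else unl.unet
        inp = ref.scheduler.scale_model_input(torch.cat([latents] * 2), t)
        noise = unet(inp, t, encoder_hidden_states=emb).sample
        n_uncond, n_cond = noise.chunk(2)
        noise = n_uncond + guidance * (n_cond - n_uncond)
        latents = ref.scheduler.step(noise, t, latents).prev_sample

    # Decode the final latent and save the image.
    image = ref.vae.decode(latents / ref.vae.config.scaling_factor).sample
    ref.image_processor.postprocess(image, output_type="pil")[0].save("out.png")
```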

Setup

To set up your Python environment and install the bundled diffusers fork:

```bash
# Create and activate a virtual environment
python3 -m venv environ
source ./environ/bin/activate

# Install the bundled diffusers fork from source
cd diffusers
pip install .
```
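To confirm the install, a quick sanity check (this assumes the fork keeps the standard `diffusers` package name):

```python
# Verify the bundled fork is importable and report its version
import diffusers
print(diffusers.__version__)
```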

Purpose and Ethical Use

This code is shared for educational purposes only and is not intended for harmful or malicious generation, such as the creation of misleading information, harmful content, or the impersonation of others.

Acknowledgement

Our work is based on a diffusers fork by @bghira.

Citation

If you find this useful for your research, please cite the following:

```bibtex
@misc{sharma2024unlearningconcealmentcriticalanalysis,
  title={Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models},
  author={Aakash Sen Sharma and Niladri Sarkar and Vikram Chundawat and Ankur A Mali and Murari Mandal},
  year={2024},
  eprint={2409.05668},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2409.05668}
}
```
