
Auditing Language Model Unlearning via Information Decomposition

Code for the EACL 2026 paper: Auditing Language Model Unlearning via Information Decomposition.

We introduce a framework for auditing machine unlearning that decomposes the information in model representations into Unlearned Knowledge (information unique to the base model) and Residual Knowledge (information shared, i.e. redundant, between the base and unlearned models).

Features

  • RINE Implementation: Estimates Redundant Information using Lagrangian optimization over neural probes.
  • PID Metrics: Calculates Unlearned ($I^B_{uniq}$) and Residual ($I_{\cap}$) knowledge.
  • Risk Score: Implements the inference-time abstention mechanism described in Section 6.
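The two PID quantities above are related by a simple identity: the unique (unlearned) information is what remains of the base model's total information after subtracting the redundant part. A minimal illustrative sketch, assuming mutual-information estimates are already available (the function name and inputs here are hypothetical, not the repository's API):

```python
def decompose(i_base: float, i_redundant: float) -> tuple[float, float]:
    """Split I(B; Y) into unlearned and residual knowledge.

    i_base:      I(B; Y), total label information in the base model's representation
    i_redundant: I_cap, redundant information shared with the unlearned model
                 (in the paper this is estimated with RINE)
    """
    unlearned_knowledge = i_base - i_redundant  # I_uniq^B: unique to the base model
    residual_knowledge = i_redundant            # I_cap: still recoverable after unlearning
    return unlearned_knowledge, residual_knowledge
```

A large `unlearned_knowledge` and small `residual_knowledge` would indicate effective unlearning under this decomposition.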

File structure

eacl2026-auditing-unlearning/
├── README.md
├── requirements.txt
├── src/
│   ├── rine.py               # Core logic: RINE implementation (Section 5.1 & App. A.3)
│   └── probe_all_layers.py   # Linear probing script (Section 3)
└── run_audit.py              # Main entry point to run the audit

Setup and Installation

We use the open-unlearning framework to train and use the base and unlearned models.

conda create -n audit_unlearning python=3.10
conda activate audit_unlearning
pip install -r requirements.txt

Usage

Probing representations

The results from Section 3 can be reproduced by running python src/probe_all_layers.py.

This script extracts activations from the base and unlearned models, trains logistic probes on them, and saves the AUROC of the probes at each layer.
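The per-layer probing step can be sketched as follows. This is an illustrative version, not the repository's exact script: `acts` is assumed to be a dict mapping layer index to an `(n_examples, hidden_dim)` array of cached activations, and `labels` the binary forget/retain labels.

```python
# Sketch of layer-wise linear probing: fit a logistic-regression probe per
# layer and record its held-out AUROC. Names (`acts`, `labels`) are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def probe_layers(acts: dict, labels, seed: int = 0) -> dict:
    aurocs = {}
    for layer, X in acts.items():
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.3, random_state=seed, stratify=labels)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # AUROC of the probe's positive-class scores on held-out examples
        aurocs[layer] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    return aurocs
```

Comparing the resulting AUROC curves for the base and unlearned models shows at which layers label information survives unlearning.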

Run the audit

To audit a model (e.g., Llama-3-8b unlearned via RMU) against its base version:

python run_audit.py \
    --base_model "<path to your base model>" \
    --unlearned_model "<path to your unlearned model>" \
    --dataset_name "locuslab/TOFU" \
    --forget_subset "forget10" \
    --retain_subset "retain90" \
    --exp_name "<experiment_identifier>" \
    --cache_dir "./activations"

The same script both audits unlearned models and computes the risk scores described in our paper.
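The inference-time use of the risk score (the abstention mechanism of Section 6) amounts to a thresholded gate in front of generation. A hedged sketch of the idea, where `risk_score` and `generate` are hypothetical stand-ins and the scoring function is not the paper's exact formulation:

```python
# Sketch of inference-time abstention: refuse to answer queries whose
# residual-knowledge risk exceeds a threshold. All names here are assumptions.
from typing import Callable

def answer_or_abstain(query: str,
                      generate: Callable[[str], str],
                      risk_score: Callable[[str], float],
                      threshold: float = 0.5) -> str:
    score = risk_score(query)
    if score > threshold:
        # High risk that the unlearned model still encodes the answer: abstain.
        return f"[abstained: residual-knowledge risk {score:.2f}]"
    return generate(query)
```

The threshold would be calibrated on held-out forget/retain queries to trade off over-refusal against residual leakage.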

Cite

Please use the following citation:

@inproceedings{goel2025auditing,
  title={Auditing Language Model Unlearning via Information Decomposition},
  author={Goel, Anmol and Ritter, Alan and Gurevych, Iryna},
  booktitle={Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}

Disclaimer

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
