Official implementation of LossVal - Efficient Data Valuation for Neural Networks.
Data valuation is the process of assigning an importance score to each data point in a dataset. These importance scores can be used to improve the performance of a machine learning model by focusing training on the most important data points, or to better explain the model. LossVal is a novel data valuation method based on the idea of optimizing the importance scores as weights that are part of the loss function. LossVal is efficient, scalable, and can be used with any differentiable loss function.
In our experiments, we show that LossVal achieves state-of-the-art performance on a range of data valuation tasks, without requiring any additional training runs.
In general, loss functions used with LossVal are of the form:

$$\mathcal{L}_{\text{LossVal}} = \mathcal{L}_w(y, \hat{y}) \cdot \text{OT}_w(X_{\text{train}}, X_{\text{val}})^2$$

The model's prediction is denoted by $\hat{y}$ and the target by $y$; $w$ is the vector of instance-specific weights (one weight per training sample), $\mathcal{L}_w$ is a weighted target loss, and $\text{OT}_w$ is a weighted optimal transport distance between the training features $X_{\text{train}}$ and the validation features $X_{\text{val}}$.

Weighted cross-entropy loss:

$$\text{CE}_w = -\sum_{n=1}^{N} w_n \cdot \sum_{k=1}^{K} y_{n,k} \log(\hat{y}_{n,k})$$

Weighted mean-squared error loss:

$$\text{MSE}_w = \sum_{n=1}^{N} w_n \cdot (y_n - \hat{y}_n)^2$$

Weighted optimal transport distance:

$$\text{OT}_w(X_{\text{train}}, X_{\text{val}}) = \min_{\gamma \in \Pi(w,\, \mathbf{1})} \sum_{n=1}^{N} \sum_{j=1}^{J} c(x_n, x_j)\, \gamma_{n,j}$$
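To make these definitions concrete, below is a minimal PyTorch sketch of the weighted losses. It is an illustration only, not the reference implementation: the softmax normalization of the weights is an assumption, and the Sinkhorn iteration is an entropic-regularized stand-in for the exact weighted optimal transport distance.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, weights):
    """CE_w: per-sample cross-entropy scaled by the instance weights w_n."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")  # shape (N,)
    return (weights * per_sample).sum()

def weighted_mse(preds, targets, weights):
    """MSE_w: per-sample squared error scaled by the instance weights w_n."""
    per_sample = (targets - preds).pow(2)  # shape (N,)
    return (weights * per_sample).sum()

def sinkhorn_ot(x_train, x_val, weights, eps=0.1, n_iter=200):
    """Entropic-regularized stand-in for OT_w: weighted marginal on the
    training side, uniform marginal on the validation side."""
    cost = torch.cdist(x_train, x_val, p=2) ** 2       # c(x_n, x_j), shape (N, J)
    a = weights / weights.sum()                        # source marginal from w
    b = torch.full((x_val.size(0),), 1.0 / x_val.size(0),
                   device=x_val.device, dtype=x_val.dtype)  # uniform target marginal
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iter):                            # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    gamma = u[:, None] * K * v[None, :]                # approximate transport plan
    return (gamma * cost).sum()

def lossval_loss(logits, targets, weights, x_train, x_val):
    """L_LossVal = CE_w * OT_w^2 (classification variant)."""
    # Softmax keeps the weights positive and comparable; this particular
    # normalization is an assumption of this sketch.
    w = F.softmax(weights, dim=0) * weights.numel()
    return weighted_cross_entropy(logits, targets, w) * sinkhorn_ot(x_train, x_val, w) ** 2
```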
You can find a basic reference implementation in `src/lossval.py`. Feel free to use it as a starting point for your own experiments and modify it to your needs.
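For a rough idea of how such a loss is used, the following hypothetical training loop optimizes the sample weights jointly with the model parameters, using the sketch functions above. All names, sizes, and hyperparameters here are assumptions, not the API of `src/lossval.py`:

```python
import torch

# Hypothetical setup with synthetic data; sizes are placeholders.
num_features, num_classes = 10, 3
x_train, y_train = torch.randn(100, num_features), torch.randint(0, num_classes, (100,))
x_val = torch.randn(30, num_features)

model = torch.nn.Linear(num_features, num_classes)
sample_weights = torch.nn.Parameter(torch.zeros(len(x_train)))  # one learnable weight per training point

# Model parameters and sample weights are optimized jointly.
optimizer = torch.optim.Adam(list(model.parameters()) + [sample_weights], lr=1e-3)

for epoch in range(100):
    optimizer.zero_grad()
    loss = lossval_loss(model(x_train), y_train, sample_weights, x_train, x_val)
    loss.backward()
    optimizer.step()

# After training, the normalized weights serve as importance scores,
# e.g. for pruning the lowest-scored points or inspecting the highest-scored ones.
importance = (torch.softmax(sample_weights, dim=0) * len(x_train)).detach()
```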
All the data from the experiments can be found in the `results` folder.
The code was implemented using Python 3.11 and PyTorch 2.1. For better compatibility, we recommend using the same versions. We found that some of the libraries are not compatible with Python 3.12 or higher.
You can install all necessary packages using the `requirements.txt` file (`pip install -r requirements.txt`).
Alternatively, you can execute the following commands to install the packages for the main experiments manually:

```bash
pip3 install torch torchvision torchaudio  # Keep in mind to use the correct torch version for your setup!
pip3 install numpy matplotlib pandas jupyter tqdm shap opendataval
```

If you use LossVal in your research, please cite our paper:
```bibtex
@misc{wibiral2024lossvalefficientdatavaluation,
      title={{L}oss{V}al: {E}fficient Data Valuation for Neural Networks},
      author={Tim Wibiral and Mohamed Karim Belaid and Maximilian Rabus and Ansgar Scherp},
      year={2024},
      eprint={2412.04158},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.04158},
}
```
