Official implementation of LossVal - Efficient Data Valuation for Neural Networks.
Data valuation is the process of assigning an importance score to each data point in a dataset. These importance scores can be used to improve the performance of a machine learning model by focusing training on the most important data points, or to better explain the model. LossVal is a novel data valuation method based on the idea of optimizing the importance scores as weights that are part of the loss function. LossVal is efficient, scalable, and can be used with any differentiable loss function.
In our experiments, we show that LossVal achieves state-of-the-art performance on a range of data valuation tasks, without requiring any additional training runs.
In general, loss functions used with LossVal are of the form:

$$\mathcal{L}_{\text{LossVal}} = \mathcal{L}_w(y, \hat{y}) \cdot \text{OT}_w(X_{\text{train}}, X_{\text{val}})^2$$

The model's prediction is denoted by $\hat{y}$ and the target by $y$; $w$ is the vector of instance-specific weights (one weight per training sample), $\mathcal{L}_w$ is a weighted target loss, and $\text{OT}_w$ is a weighted optimal transport distance between the training features $X_{\text{train}}$ and the validation features $X_{\text{val}}$.

Weighted cross-entropy loss:

$$\text{CE}_w = -\sum_{n=1}^{N} w_n \cdot \sum_{k=1}^{K} y_{n,k} \log(\hat{y}_{n,k})$$

Weighted mean-squared error loss:

$$\text{MSE}_w = \sum_{n=1}^{N} w_n \cdot (y_n - \hat{y}_n)^2$$

Weighted optimal transport distance:

$$\text{OT}_w(X_{\text{train}}, X_{\text{val}}) = \min_{\gamma \in \Pi(w,\, \mathbf{1})} \sum_{n=1}^{N} \sum_{j=1}^{J} c(x_n, x_j)\, \gamma_{n,j}$$
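To make these definitions concrete, below is a minimal PyTorch sketch of the weighted losses. It is an illustration only, not the reference implementation: the softmax normalization of the weights is an assumption, and the Sinkhorn iteration is an entropic-regularized stand-in for the exact weighted optimal transport distance.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, weights):
    """CE_w: per-sample cross-entropy scaled by the instance weights w_n."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")  # shape (N,)
    return (weights * per_sample).sum()

def weighted_mse(preds, targets, weights):
    """MSE_w: per-sample squared error scaled by the instance weights w_n."""
    per_sample = (targets - preds).pow(2)  # shape (N,)
    return (weights * per_sample).sum()

def sinkhorn_ot(x_train, x_val, weights, eps=0.1, n_iter=200):
    """Entropic-regularized stand-in for OT_w: weighted marginal on the
    training side, uniform marginal on the validation side."""
    cost = torch.cdist(x_train, x_val, p=2) ** 2       # c(x_n, x_j), shape (N, J)
    a = weights / weights.sum()                        # source marginal from w
    b = torch.full((x_val.size(0),), 1.0 / x_val.size(0),
                   device=x_val.device, dtype=x_val.dtype)  # uniform target marginal
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iter):                            # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    gamma = u[:, None] * K * v[None, :]                # approximate transport plan
    return (gamma * cost).sum()

def lossval_loss(logits, targets, weights, x_train, x_val):
    """L_LossVal = CE_w * OT_w^2 (classification variant)."""
    # Softmax keeps the weights positive and comparable; this particular
    # normalization is an assumption of this sketch.
    w = F.softmax(weights, dim=0) * weights.numel()
    return weighted_cross_entropy(logits, targets, w) * sinkhorn_ot(x_train, x_val, w) ** 2
```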
You can find a basic reference implementation in `src/lossval.py`. Feel free to use it as a starting point for your own experiments and modify it to your needs.
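For a rough idea of how such a loss is used, the following hypothetical training loop optimizes the sample weights jointly with the model parameters, using the sketch functions above. All names, sizes, and hyperparameters here are assumptions, not the API of `src/lossval.py`:

```python
import torch

# Hypothetical setup with synthetic data; sizes are placeholders.
num_features, num_classes = 10, 3
x_train, y_train = torch.randn(100, num_features), torch.randint(0, num_classes, (100,))
x_val = torch.randn(30, num_features)

model = torch.nn.Linear(num_features, num_classes)
sample_weights = torch.nn.Parameter(torch.zeros(len(x_train)))  # one learnable weight per training point

# Model parameters and sample weights are optimized jointly.
optimizer = torch.optim.Adam(list(model.parameters()) + [sample_weights], lr=1e-3)

for epoch in range(100):
    optimizer.zero_grad()
    loss = lossval_loss(model(x_train), y_train, sample_weights, x_train, x_val)
    loss.backward()
    optimizer.step()

# After training, the normalized weights serve as importance scores,
# e.g. for pruning the lowest-scored points or inspecting the highest-scored ones.
importance = (torch.softmax(sample_weights, dim=0) * len(x_train)).detach()
```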
All the data from the experiments can be found in the `results` folder.
The code was implemented using Python 3.11 and PyTorch 2.1. For better compatibility, we recommend using the same versions. We found that some of the libraries are not compatible with Python 3.12 or higher.
You can install all necessary packages using the `requirements.txt` file (`pip install -r requirements.txt`).
Alternatively, you can execute the following commands to install the packages for the main experiments manually:

```bash
pip3 install torch torchvision torchaudio  # Keep in mind to use the correct torch version for your setup!
pip3 install numpy matplotlib pandas jupyter tqdm shap opendataval
```

If you use LossVal in your research, please cite our paper:
```bibtex
@misc{wibiral2024lossvalefficientdatavaluation,
      title={{L}oss{V}al: {E}fficient Data Valuation for Neural Networks},
      author={Tim Wibiral and Mohamed Karim Belaid and Maximilian Rabus and Ansgar Scherp},
      year={2024},
      eprint={2412.04158},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.04158},
}
```
