Merged
43 commits
afd8d19
add sslalm variance reduction
andrewklayk Jun 6, 2025
25859e7
add objective vr
andrewklayk Jun 10, 2025
0efd3c0
add constraint vr
andrewklayk Jun 11, 2025
f091dbf
Merge branch 'main' of https://github.com/humancompatible/train into …
andrewklayk Aug 7, 2025
a85aae0
Merge branch 'main' of https://github.com/humancompatible/train into …
andrewklayk Aug 7, 2025
2416ccf
refactor
andrewklayk Aug 7, 2025
2a605da
added demo notebooks
andrewklayk Aug 8, 2025
4a948eb
update api, fix bugs, add tutorial notebooks
andrewklayk Aug 11, 2025
939ff27
fix bug in notebooks
andrewklayk Aug 11, 2025
69210d5
fix bug in notebook
andrewklayk Aug 12, 2025
a8b21e8
Merge branch 'neurips' of https://github.com/andrewklayk/humancompati…
andrewklayk Aug 12, 2025
6319712
fix text in algorithm_demo
andrewklayk Aug 13, 2025
be86b38
better algoritm example, fix shuffle for cuda in sslalm
andrewklayk Aug 13, 2025
81fc31b
reworked package structure
andrewklayk Aug 18, 2025
321b006
further structure rework, modify readme
andrewklayk Aug 18, 2025
5f8dc5c
fix some notebooks
andrewklayk Aug 18, 2025
5feebed
rework structure
andrewklayk Aug 18, 2025
04c3a24
experiment updates
andrewklayk Aug 19, 2025
167801f
update torch api, add demo
andrewklayk Aug 22, 2025
3983080
update yaml
andrewklayk Aug 22, 2025
a0ccb0c
update constr yaml
andrewklayk Aug 22, 2025
f9492d8
update README to focus on torch api
andrewklayk Aug 22, 2025
72ea18c
Update README.md
andrewklayk Aug 23, 2025
11d8132
minor changes to notebooks & cleanup
andrewklayk Aug 23, 2025
844ede1
fix imports in example notebooks
andrewklayk Aug 23, 2025
51ad524
restructuring, fix notebooks
andrewklayk Aug 23, 2025
131b6ed
cosmetics
andrewklayk Aug 23, 2025
13662fd
more cosmetics
andrewklayk Aug 23, 2025
12defe9
fix bugs in torch algs
andrewklayk Aug 25, 2025
1b19acd
updates
andrewklayk Aug 25, 2025
c6feaea
clean up algorithms
andrewklayk Aug 26, 2025
2084b84
rework in progress
andrewklayk Aug 27, 2025
c82c31f
add cval numel check to step in torch.sslalm
andrewklayk Aug 29, 2025
aded1e5
restructure examples, add example with eq loss constraint
andrewklayk Aug 29, 2025
00b9cce
refactor experiments
andrewklayk Aug 29, 2025
180c655
update readme + cosmetics
andrewklayk Aug 29, 2025
d86ad60
add tests, fix bugs in algs
andrewklayk Sep 1, 2025
3f046cc
update pyproject.toml, dependencies
andrewklayk Sep 1, 2025
4610133
remove unused imports
andrewklayk Sep 1, 2025
47b2b3a
update loss example
andrewklayk Sep 1, 2025
9638242
update readme, restructure
andrewklayk Sep 2, 2025
5794fba
fix tests
andrewklayk Sep 2, 2025
ebc74cb
fix imports in benchmark, convert to local
andrewklayk Sep 2, 2025
122 changes: 80 additions & 42 deletions README.md
@@ -1,21 +1,79 @@
# Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
# humancompatible-train: a package for constrained machine learning

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Setup](https://github.com/humancompatible/train/actions/workflows/setup.yml/badge.svg)](https://github.com/humancompatible/train/actions/workflows/setup.yml)

This repository provides a tool to compare stochastic-constrained stochastic optimization algorithms on a _fair learning_ task.
The toolkit implements algorithms for constrained training of neural networks. It is built on PyTorch, and its API is modeled on PyTorch's.
<!-- , as well as a tool to compare stochastic-constrained stochastic optimization algorithms on a _fair learning_ task in the `experiments` folder. -->

## Table of Contents
1. [Basic installation instructions](#basic-installation-instructions)
2. [Reproducing the Benchmark](#reproducing-the-benchmark)
3. [Extending the benchmark](#extending-the-benchmark) <!-- 6. [Citing humancompatible/train](#Citing-humancompatible/train) -->
4. [License and terms of use](#license-and-terms-of-use)
5. [References](#references)
2. [Using the toolkit](#using-the-toolkit)
3. [Extending the toolkit](#extending-the-toolkit)
4. [Reproducing the Benchmark](#reproducing-the-benchmark)
5. [License and terms of use](#license-and-terms-of-use)
6. [References](#references)

Humancompatible/train is still under active development! If you find bugs or have feature
humancompatible-train is still under active development! If you find bugs or have feature
requests, please file a
[GitHub issue](https://github.com/humancompatible/train/issues).

## Basic installation instructions
## Installation

Install the package with pip:

```
pip install humancompatible-train
```

The only dependencies of this package are `numpy` and `torch`.

## Using the toolkit

The toolkit implements algorithms for constrained training of neural networks based on PyTorch.

The algorithms follow a `dual_step()` / `step()` pattern modeled on PyTorch's optimizer API: `dual_step()` updates the dual parameters and prepares for the primal update (for example, by saving constraint gradients), and `step()` updates the primal parameters.

In general, your code using `humancompatible-train` would look something like this:

```
for inputs, labels in dataloader:
    # inference
    outputs = model(inputs)
    # calculate constraints and their gradients
    for constraint in constraints:
        c_eval = constraint(outputs, labels)
        # keep the graph: `outputs` is reused by later backward passes
        c_eval.backward(retain_graph=True)
        # depending on optimizer, update dual parameters / save constraint gradient / both
        optimizer.dual_step(c_eval)
        optimizer.zero_grad()
    # calculate objective
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The design has two goals:
1. Deviate minimally from the usual PyTorch workflow.
2. Make different stochastic-constrained stochastic optimization algorithms nearly interchangeable in code.
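
To make the `dual_step()` / `step()` split concrete, here is a minimal, dependency-free sketch on a toy problem: minimize `(x - 2)^2` subject to `x <= 1`. The `ToyPrimalDual` class, its argument names, and the step sizes are hypothetical and chosen only for illustration; they are not the package's API or one of its algorithms. The update rule used here is plain projected gradient descent-ascent on the Lagrangian.

```python
class ToyPrimalDual:
    """Hypothetical scalar primal-dual optimizer (illustration only)."""

    def __init__(self, x0=0.0, lr=0.05, dual_lr=0.05):
        self.x = x0            # primal variable
        self.dual = 0.0        # Lagrange multiplier for the constraint c(x) <= 0
        self.lr = lr
        self.dual_lr = dual_lr
        self._c_grad = 0.0     # constraint gradient saved for the primal update

    def dual_step(self, c_val, c_grad):
        # Projected gradient ascent keeps the multiplier non-negative.
        self.dual = max(0.0, self.dual + self.dual_lr * c_val)
        # Save the constraint gradient for the upcoming primal update.
        self._c_grad = c_grad

    def step(self, f_grad):
        # Gradient descent on the Lagrangian f(x) + dual * c(x).
        self.x -= self.lr * (f_grad + self.dual * self._c_grad)


opt = ToyPrimalDual()
for _ in range(2000):
    # Constraint c(x) = x - 1 <= 0, with gradient 1.
    opt.dual_step(c_val=opt.x - 1.0, c_grad=1.0)
    # Objective f(x) = (x - 2)^2, with gradient 2 * (x - 2).
    opt.step(f_grad=2.0 * (opt.x - 2.0))

print(opt.x, opt.dual)
```

The iterates should approach the constrained optimum `x = 1` with multiplier `2`. Because the loop only touches `dual_step()` and `step()`, a different update rule could be swapped in without changing the loop itself.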

### Code examples

The notebooks in the `examples` folder demonstrate the new API.

*The legacy API used for the benchmark is presented in `examples/_old_/algorithm_demo.ipynb` and `examples/_old_/constraint_demo.ipynb`.*

## Extending the toolkit

### Adding new code

**To add a new algorithm**, subclass the PyTorch ```Optimizer``` class and follow the `dual_step()` / `step()` API described above.
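
As a rough sketch of what such a subclass might look like, the block below wraps a simple Lagrangian gradient method in `torch.optim.Optimizer`. Everything named here (`ToyLagrangianSGD`, the `dual_lr` option, the single-constraint assumption, storing the constraint gradient under `state[p]["c_grad"]`) is hypothetical and not taken from the package; only the `dual_step()` / `step()` call order follows the guideline above.

```python
import torch
from torch.optim import Optimizer


class ToyLagrangianSGD(Optimizer):
    """Hypothetical optimizer for one constraint c(w) <= 0 (illustration only)."""

    def __init__(self, params, lr=0.05, dual_lr=0.05):
        super().__init__(params, dict(lr=lr, dual_lr=dual_lr))
        self.dual = 0.0  # Lagrange multiplier, kept non-negative

    @torch.no_grad()
    def dual_step(self, c_val):
        # Call after c_val.backward(): .grad then holds the constraint gradient.
        dual_lr = self.param_groups[0]["dual_lr"]
        self.dual = max(0.0, self.dual + dual_lr * float(c_val))
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # Save the constraint gradient for the primal update.
                    self.state[p]["c_grad"] = p.grad.clone()

    @torch.no_grad()
    def step(self, closure=None):
        # Call after loss.backward(): .grad now holds the objective gradient.
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                direction = p.grad.clone()
                c_grad = self.state[p].get("c_grad")
                if c_grad is not None:
                    # Gradient of the Lagrangian: objective + dual * constraint.
                    direction.add_(c_grad, alpha=self.dual)
                p.add_(direction, alpha=-group["lr"])


# Toy problem: minimize ||w - target||^2 subject to sum(w) <= 1.
target = torch.tensor([1.0, 1.0])
w = torch.zeros(2, requires_grad=True)
opt = ToyLagrangianSGD([w])
for _ in range(2000):
    c = w.sum() - 1.0
    c.backward()
    opt.dual_step(c)
    opt.zero_grad()
    loss = ((w - target) ** 2).sum()
    loss.backward()
    opt.step()
    opt.zero_grad()
```

On this toy problem the iterates should approach `w = (0.5, 0.5)`, the projection of `target` onto the constraint set. Keeping per-parameter data in `self.state` mirrors how built-in PyTorch optimizers store their buffers, so it is saved by `state_dict()`; the multiplier is kept as a plain attribute here only for brevity.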

## Reproducing the Benchmark

The code used in [our benchmark paper](https://arxiv.org/abs/2507.04033) has not yet been migrated to the new API (work in progress).

### Basic installation instructions
The code requires Python version ```3.11```.

1. Create a virtual environment
@@ -30,13 +88,19 @@ source fairbenchenv/bin/activate
python -m venv fairbenchenv
fairbenchenv\Scripts\activate.bat
```
2. Install from source (*as an editable package*).
2. Install from source.
```
git clone https://github.com/humancompatible/train.git
cd train
pip install -r requirements.txt
pip install .
```

If you wish to edit the code of the algorithms, install as an editable package:
```
pip install -e .
```

__Warning__: it is recommended to use Stochastic Ghost with the MKL-accelerated version of the scipy package; to install it, run

```pip install --force-reinstall -i https://software.repos.intel.com/python/pypi scipy```
@@ -48,8 +112,6 @@ after installing requirements.txt; otherwise, the algorithm will run slower. How
<!-- pip install folktables -->
<!-- ``` -->

## Reproducing the Benchmark

### Running the algorithms

The benchmark comprises the following algorithms:
@@ -74,7 +136,7 @@ The results will be saved to `experiments/utils/saved_models` and `experiments/u

This repository uses [Hydra](https://hydra.cc/) to manage parameters; see `experiments/conf` for configuration files.
* To change the parameters of the experiment, such as the number of runs for each algorithm, run time, the dataset used (*note: for now supports only Folktables*) - use `experiment.yaml`.
* To change the dataset settings - such as file location - or do dataset-specific adjustments, use `data/{dataset_name}.yaml`
* To change the dataset settings - such as file location - or do dataset-specific adjustments - such as the configuration of the protected attributes - use `data/{dataset_name}.yaml`
* To change algorithm hyperparameters, use `alg/{algorithm_name}.yaml`.
* To change constraint hyperparameters, use `constraint/{constraint_name}.yaml`

@@ -84,35 +146,12 @@ This repository uses [Hydra](https://hydra.cc/) to manage parameters; see `exper
### Producing plots
Plots and tables like those in the paper can be produced with two notebooks: `experiments/algo_plots.ipynb` contains the convergence plots, and `experiments/model_plots.ipynb` contains all the others.

## Extending the benchmark

**To add a new algorithm**, you can subclass the ```Algorithm``` class. Before you can run it, you will need to follow these steps:
1. In the `experiments/conf/alg` folder, add a `.yaml` file with `import_name: {ClassName}` (so the code knows which algorithm to import) and the desired keyword parameter values under `params`:

```
import_name: ClassName

params:
param_name_1: value
param_name_2: value
```

2. In `src/algorithms/__init__.py`, add `from .{filename} import {ClassName}` (so the code is able to import it).

Now you can run the algorithm by executing `python run_folktables.py data=folktables alg={yaml_file_name}`, or by changing the experiment config files.

**To add a different constraint formulation**, you can use the `FairnessConstraint` class by passing your callable function to the constructor as `fn`. If you use `run_folktables.py`, you can add a new constraint function by following the steps:

1. Add a `.yaml` file with `import_name: {FunctionName}`, along with the desired batch size and bound (*to be reworked for more generality*), to the `experiments/conf/constraint` folder
2. Import it in `src/constraints/__init__.py` as in step 2 above.

Now, to run the code with your constraint, use the `constraint` field in the main config.

## License and terms of use

Humancompatible/train is provided under the Apache 2.0 Licence.
humancompatible-train is provided under the Apache 2.0 Licence.

The package relies on the Folktables package, provided under MIT Licence.
The benchmark part of the package relies on the Folktables package, provided under MIT Licence.
It provides code to download data from the American Community Survey
(ACS) Public Use Microdata Sample (PUMS) files managed by the US Census Bureau.
The data itself is governed by the terms of use provided by the Census Bureau.
@@ -134,9 +173,9 @@ For more information, see https://www.census.gov/data/developers/about/terms-of-

## Future work

- Add support for fairness constraints with >=2 subgroups (limitation of the code, not of the algorithms)
- Add support to datasets besides Folktables
- Move towards a more PyTorch-like API for optimizers
- Add more algorithms
- Add more examples from different fields where constrained training of DNNs is employed
- Migrate the benchmark to the new API

## References

@@ -164,5 +203,4 @@ Facchinei & Kungurtsev (2023) Stochastic Approximation for Expectation Objective
Huang, Zhang & Alacaoglu (2025) Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints, arXiv.

<a id="4">[4]</a>
Huang & Lin (2023) Oracle Complexity of Single-Loop Switching Subgradient Methods for Non-Smooth Weakly Convex Functional Constrained Optimization, Curran Associates Inc.

41 changes: 0 additions & 41 deletions constraint_demo.ipynb

This file was deleted.
