Configuration

Environment variables

Env var	Default	Options	Effect
`MLFORGE_PROTOCOL`	`nested`	`nested`, `pipeline`, `leaky`	Evaluation protocol
`MLFORGE_DATASET`	`highdim`	`highdim`, `lowdim`, `null`	Dataset regime
`MLFORGE_BACKEND`	`numpy`	`numpy`, `sklearn`	Estimator/CV implementation
`MLFORGE_N_FEATURES`	regime default	integer	Total number of features
`MLFORGE_N_SAMPLES`	regime default	integer	Training set size
`MLFORGE_SEED`	`0`	integer	Random seed for data generation
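A minimal sketch of how these variables might be read and validated. It assumes a hypothetical `load_config` helper and `Config` dataclass, neither of which is part of the documented mlforge API. The allowed values and the `n_features` defaults come from the tables on this page. The per-regime `n_samples` default is not documented here, so the sketch leaves it as `None`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed values, copied from the environment-variable table.
PROTOCOLS = ("nested", "pipeline", "leaky")
DATASETS = ("highdim", "lowdim", "null")
BACKENDS = ("numpy", "sklearn")

# n_features defaults from the dataset-regimes table.
# Assumption: the regime's n_samples default is undocumented, so None means "use the regime default".
REGIME_N_FEATURES = {"highdim": 50, "lowdim": 5, "null": 50}


@dataclass(frozen=True)
class Config:  # hypothetical container, not mlforge's own type
    protocol: str
    dataset: str
    backend: str
    n_features: int
    n_samples: Optional[int]
    seed: int


def _choice(env: Mapping[str, str], key: str, default: str, allowed: tuple) -> str:
    """Read an enumerated variable and reject anything outside the documented options."""
    value = env.get(key, default)
    if value not in allowed:
        raise ValueError(f"{key}={value!r}; expected one of {allowed}")
    return value


def _optional_int(env: Mapping[str, str], key: str) -> Optional[int]:
    """Return None when the variable is unset or empty, else parse it as an int."""
    raw = env.get(key)
    return None if raw is None or raw == "" else int(raw)


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Hypothetical loader: build a Config from MLFORGE_* variables with documented defaults."""
    dataset = _choice(env, "MLFORGE_DATASET", "highdim", DATASETS)
    n_features = _optional_int(env, "MLFORGE_N_FEATURES")
    return Config(
        protocol=_choice(env, "MLFORGE_PROTOCOL", "nested", PROTOCOLS),
        dataset=dataset,
        backend=_choice(env, "MLFORGE_BACKEND", "numpy", BACKENDS),
        n_features=n_features if n_features is not None else REGIME_N_FEATURES[dataset],
        n_samples=_optional_int(env, "MLFORGE_N_SAMPLES"),
        seed=int(env.get("MLFORGE_SEED", "0")),
    )
```

For example, `load_config({"MLFORGE_DATASET": "lowdim"})` yields `n_features=5` with every other field at its default. An unrecognised protocol name raises `ValueError` instead of silently falling back.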

Backend options

Backend	Install	Description
`numpy` (default)	built-in	From-scratch estimators, scaler, feature selector, CV, pipeline — all implemented in numpy
`sklearn`	`pip install -e ".[sklearn]"`	Uses scikit-learn's `Pipeline`, `cross_val_score`, `StratifiedKFold`, `LogisticRegression`, `KNeighborsClassifier`, `GaussianNB`, `StandardScaler`, `SelectKBest` to check that the from-scratch results match these reference implementations

The sklearn backend is a cross-check only — it is never required to run the benchmark or CI.
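The sklearn backend is optional, so a loader can fail early with an install hint when that backend is selected but scikit-learn is missing. The sketch below uses the standard-library `importlib.util.find_spec`. The `resolve_backend` name and its error messages are hypothetical.

```python
import importlib.util


def resolve_backend(name: str) -> str:
    """Hypothetical check: return the backend name if usable, else raise with a hint."""
    if name == "numpy":
        return name  # core backend: needs no optional extra
    if name == "sklearn":
        # find_spec returns None when the package is not importable.
        if importlib.util.find_spec("sklearn") is None:
            raise RuntimeError(
                'MLFORGE_BACKEND=sklearn needs scikit-learn: pip install -e ".[sklearn]"'
            )
        return name
    raise ValueError(f"unknown backend {name!r}; expected 'numpy' or 'sklearn'")
```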

Dataset regimes

Regime	`n_features`	`n_informative`	Labels	Purpose
`highdim`	50	5	signal + noise	Main benchmark; both the leakage effect (leaky vs pipeline) and the tuning-optimism effect (pipeline vs nested) show up
`lowdim`	5	5	signal	All features informative; a control in which all protocols agree
`null`	50	0	random	No signal: leaky CV reports above-chance accuracy, while nested CV stays at chance
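The `null` regime's point can be reproduced in a few lines of numpy. This sketch is illustrative only. It swaps in a nearest-centroid classifier and a standardised mean-difference filter rather than mlforge's estimators. It also uses deliberately exaggerated sizes (40 samples, 5000 noise features, top 20 kept) so the gap is hard to miss, rather than the regime's 50-feature configuration. Selecting features on the full dataset before splitting (`leaky=True`) lets every test fold influence the selection. Refitting the selector inside each training fold (`leaky=False`) is roughly what the `pipeline` protocol does. Nested CV would add an inner tuning loop, which is omitted here.

```python
import numpy as np


def select_top_k(X, y, k):
    """Indices of the k features with the largest standardised class-mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    score = np.abs(diff) / (X.std(axis=0) + 1e-12)
    return np.argsort(-score)[:k]


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Fit one centroid per class on the training rows and score the test rows."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    return float(np.mean((d1 < d0).astype(int) == y_te))


def cv_accuracy(X, y, k, leaky, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Leaky: select once on ALL rows, so every test fold has already shaped the selection.
    leaked_cols = select_top_k(X, y, k) if leaky else None
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Leak-free: refit the selector on this fold's training rows only.
        cols = leaked_cols if leaky else select_top_k(X[train], y[train], k)
        scores.append(
            nearest_centroid_accuracy(
                X[train][:, cols], y[train], X[test_idx][:, cols], y[test_idx]
            )
        )
    return float(np.mean(scores))


# Illustrative null data (not the regime defaults): 40 samples, 5000 pure-noise features.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5000))
y = rng.permutation(np.repeat([0, 1], 20))  # labels independent of X

leaky_acc = cv_accuracy(X, y, k=20, leaky=True)
honest_acc = cv_accuracy(X, y, k=20, leaky=False)
print(f"leaky CV accuracy:   {leaky_acc:.2f}")
print(f"in-fold CV accuracy: {honest_acc:.2f}")
```

Because the labels are independent of the features, the leaky estimate lands well above 0.5 while the in-fold estimate hovers around chance. The exact numbers depend on the seed.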

.env.example

# Evaluation protocol: nested (honest) | pipeline (no leak) | leaky (common but wrong)
MLFORGE_PROTOCOL=nested

# Dataset regime: highdim | lowdim | null
MLFORGE_DATASET=highdim

# Estimator/CV backend: numpy (from-scratch) | sklearn (cross-check, requires [sklearn] extra)
MLFORGE_BACKEND=numpy

# Random seed for data generation (integer)
MLFORGE_SEED=0

# Override dataset dimensions (optional; uses regime defaults if not set)
# MLFORGE_N_FEATURES=50
# MLFORGE_N_SAMPLES=200
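This page does not say how `.env` is loaded. Docker's `--env-file` flag and libraries such as python-dotenv are common options. If you want to load it from plain Python, a minimal sketch could use a hypothetical `load_dotenv_file` helper. It handles only the simple `KEY=VALUE` and `#` comment lines used in `.env.example`, not quoting or `export` prefixes.

```python
import os
from pathlib import Path


def load_dotenv_file(path, environ=os.environ):
    """Hypothetical helper: copy KEY=VALUE lines into `environ` without overriding existing keys."""
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, including commented-out overrides
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line in {path}: {raw!r}")
        environ.setdefault(key.strip(), value.strip())
    return environ
```

Using `setdefault` means a variable already exported in the shell takes precedence over the file's value.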

pip extras

pip install -e "."           # numpy core only (from-scratch everything)
pip install -e ".[dev]"      # + pytest, ruff, mypy
pip install -e ".[sklearn]"  # + scikit-learn (cross-check backend)

Docker

docker build -t mlforge .
docker run --rm mlforge                              # offline benchmark (all regimes)
docker run --rm mlforge mlforge compare --dataset highdim --seed 0
docker run --rm -e MLFORGE_DATASET=null mlforge mlforge compare
