Home

EnsembleKit

A benchmark for when ensembling helps and which combining trick buys which kind of robustness.

EnsembleKit synthesizes base-learner predictions as noisy log-odds of a known Bayes label, then scores four combiners on how well they recover that label (AUROC). The result is a clean 2x2 dissociation: competence weighting and robust aggregation each buy robustness to a different failure mode, and only a combiner that uses both is robust in every regime.
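A minimal numpy sketch of the log-odds formulation, under assumptions that are ours rather than EnsembleKit's: the Bayes log-odds s is Gaussian, each label is drawn from sigmoid(s) (so s is the Bayes-optimal score), and each learner is z_k = a_k * s plus unit Gaussian noise. `synthesize` and `auroc` are hypothetical names, not the package's API.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); tied scores share an average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]  # 1-based average ranks
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def synthesize(n=50000, competence=(1.0, 1.0, 1.0, 1.0, 1.0), noise_sd=1.0, seed=0):
    """Hypothetical generator: z_k = a_k * s + noise, with s the Bayes log-odds."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, 2.0, size=n)               # latent Bayes log-odds (assumed Gaussian)
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-s))   # label ~ Bernoulli(sigmoid(s))
    a = np.asarray(competence, dtype=float)        # a_k = 0 would be a "dead" learner
    z = a[:, None] * s + noise_sd * rng.normal(size=(a.size, n))
    return s, y, z

s, y, z = synthesize()
print("Bayes score s:  ", round(auroc(s, y), 3))   # best achievable AUROC in expectation
print("single learner: ", round(auroc(z[0], y), 3))
print("uniform average:", round(auroc(z.mean(axis=0), y), 3))
```

Because s is the Bayes-optimal score, auroc(s, y) is the ceiling a combiner can approach; the gap between a combiner's AUROC and that ceiling measures how much of the signal it failed to recover.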

Heterogeneous learner competence (dead learners) breaks a uniform average; competence weighting fixes it.
Intermittent per-sample corruption breaks any mean-based combiner, however its weights are chosen; robust aggregation (median) fixes it.
A diversity sweep shows ensemble gain collapses to zero as learner correlation rho approaches 1.
A scrambled-label null confirms that every AUROC signal depends on the true labels.
No models to train, no datasets, no API keys -- numpy only.
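The diversity sweep and the scrambled-label null can be sketched as follows. This assumes rho is the pairwise correlation of the learners' noise, built from a shared plus a private Gaussian component, and that "gain" means the AUROC of the uniform average minus the mean single-learner AUROC; `simulate` and `ensemble_gain` are hypothetical helpers, not EnsembleKit functions.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); tied scores share an average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def simulate(rho, n=50000, k=5, noise_sd=1.0, seed=0):
    """Hypothetical sweep generator: learner noise terms have pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, 2.0, size=n)                  # latent Bayes log-odds
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-s))      # label ~ Bernoulli(sigmoid(s))
    shared = rng.normal(size=n)                       # noise common to every learner
    own = rng.normal(size=(k, n))                     # noise private to each learner
    noise = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own  # unit variance, corr = rho
    return y, s + noise_sd * noise

def ensemble_gain(y, z):
    """AUROC of the uniform average minus the mean single-learner AUROC."""
    single = np.mean([auroc(zk, y) for zk in z])
    return auroc(z.mean(axis=0), y) - single

rhos = [0.0, 0.25, 0.5, 0.75, 1.0]
gains = [ensemble_gain(*simulate(rho)) for rho in rhos]
print([round(g, 4) for g in gains])

# Scrambled-label null: shuffling y breaks the score-label link, so AUROC should sit near 0.5.
y, z = simulate(0.0)
null_auc = auroc(z.mean(axis=0), np.random.default_rng(1).permutation(y))
print(round(null_auc, 3))
```

At rho = 1 every learner produces the same score, so averaging cannot add anything and the gain is zero up to floating-point rounding.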

flowchart LR
    subgraph input["Base learner synthesis"]
        bayes["Known Bayes label\n(exact ground truth)"]
        learners["Base learners\nz_k = a_k * s + noise"]
    end
    subgraph combiners["Combiners = weighting x aggregation"]
        average["average\nuniform + mean"]
        weighted["weighted\ncompetence + mean"]
        robust["robust\nuniform + median"]
        full["full\ncompetence + median"]
    end
    subgraph eval["Evaluation"]
        auroc["AUROC vs Bayes label"]
        gate["Dissociation gate (CI)"]
    end
    bayes --> learners --> combiners --> auroc --> gate
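The four combiners are two independent choices, weighting (uniform or competence) crossed with aggregation (mean or median), which suggests a sketch like the one below. Everything here is an illustrative assumption, not EnsembleKit's implementation: the regime parameters (three dead learners out of five; 10% of individual predictions replaced by N(0, 50²) garbage), competence estimated as each learner's label correlation on a held-out half (clipped at 0), and a weighted median for the competence + median cell. All function names are hypothetical.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); tied scores share an average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def synthesize(regime, n=40000, seed=0):
    """Hypothetical regimes over k = 5 learners, z_k = a_k * s + noise.
    het_competence: learners 3-5 are dead (a_k = 0).
    corrupted: each prediction is independently replaced by N(0, 50^2) garbage w.p. 0.1."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, 2.0, size=n)                  # latent Bayes log-odds
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-s))      # label ~ Bernoulli(sigmoid(s))
    a = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) if regime == "het_competence" else np.ones(5)
    z = a[:, None] * s + rng.normal(size=(5, n))
    if regime == "corrupted":
        mask = rng.random(z.shape) < 0.1              # per-learner, per-sample corruption
        z[mask] = rng.normal(0.0, 50.0, size=mask.sum())
    return y, z

def competence_weights(z_cal, y_cal):
    """Assumed estimator: each learner's label correlation, clipped at 0, normalized."""
    y_cal = y_cal.astype(float)
    r = np.nan_to_num([np.corrcoef(zk, y_cal)[0, 1] for zk in z_cal])
    r = np.clip(r, 0.0, None)
    return r / r.sum() if r.sum() > 0 else np.full(len(r), 1.0 / len(r))

def weighted_median(z, w):
    """Per sample, the first value (in sorted order) whose cumulative weight reaches half."""
    order = np.argsort(z, axis=0)
    z_sorted = np.take_along_axis(z, order, axis=0)
    cum = np.cumsum(w[order], axis=0)                 # weights travel with their learners
    idx = np.argmax(cum >= 0.5 * cum[-1], axis=0)
    return z_sorted[idx, np.arange(z.shape[1])]

COMBINERS = {  # name: (weighting, aggregation), the 2x2 from the diagram
    "average": ("uniform", "mean"),
    "weighted": ("competence", "mean"),
    "robust": ("uniform", "median"),
    "full": ("competence", "median"),
}

def evaluate(regime, seed=0):
    y, z = synthesize(regime, seed=seed)
    half = y.size // 2                                # first half fits weights, second is scored
    weights = {
        "uniform": np.full(z.shape[0], 1.0 / z.shape[0]),
        "competence": competence_weights(z[:, :half], y[:half]),
    }
    z_test, y_test = z[:, half:], y[half:]
    results = {}
    for name, (weighting, aggregation) in COMBINERS.items():
        w = weights[weighting]
        score = w @ z_test if aggregation == "mean" else weighted_median(z_test, w)
        results[name] = auroc(score, y_test)
    return results

for regime in ("het_competence", "corrupted"):
    print(regime, {name: round(auc, 3) for name, auc in evaluate(regime).items()})
```

Under these assumptions the output should mirror the dissociation: in het_competence the competence-weighted cells (weighted, full) beat their uniform counterparts, and in corrupted the median cells (robust, full) beat their mean counterparts.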

Quick start

pip install -e ".[dev]"
ensemblekit compare --regime het_competence
ensemblekit compare --regime corrupted
ensemblekit diversity
python -m evals.harness
pytest -q

Wiki pages

Architecture -- log-odds formulation, base learner synthesis, combiner implementations, 2x2 design
Evaluation -- benchmark setup, results table, diversity sweep
Configuration -- env vars, .env.example
Development -- setup, tests, how to add a combiner or learner regime
