Development

Setup

git clone https://github.com/ranafaraz/EnsembleKit.git
cd EnsembleKit
python -m venv .venv
# Linux/macOS:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate

pip install -e ".[dev]"
pytest -q   # 76 tests, all should pass

Optional extras

pip install -e ".[sklearn]"   # enables scikit-learn AUROC cross-check tests

Running the benchmark

# Full table + RESULTS.md
python -m evals.harness

# Dissociation gate (used in CI)
python -m evals.gate

# Interactive CLI
ensemblekit compare --regime het_competence
ensemblekit compare --regime corrupted
ensemblekit diversity
ensemblekit regimes

Repository structure

EnsembleKit/
  ensemblekit/
    synthesis/      -- Bayes label generator, base learner factory (z_k = a_k * s + noise)
    combiners/      -- average.py, weighted.py, robust.py, full.py, single.py
    regimes/        -- homogeneous.py, het_competence.py, corrupted.py
    eval/           -- auroc.py, gate.py, diversity.py
    cli.py          -- ensemblekit CLI entry point
  evals/
    harness.py      -- writes evals/RESULTS.md
    gate.py         -- asserts the 2x2 dissociation
  tests/            -- 76 pytest tests
  docs/             -- ARCHITECTURE.md, DECISIONS.md, demo.gif
  .env.example
  Dockerfile
  pyproject.toml

How to add a new combiner

1. Create ensemblekit/combiners/my_combiner.py implementing combine(log_odds: np.ndarray, labels: np.ndarray, rng) -> np.ndarray, which takes a (K, N) array of per-learner log-odds and returns an (N,) combined score.
2. Register the combiner in ensemblekit/combiners/__init__.py under a string key (e.g., "my_combiner").
3. Add tests in tests/test_combiners.py covering at least: the output shape, that a perfect learner produces AUROC ~1.0 in the homogeneous regime, and that the combiner is deterministic given the same RNG.
4. Run ensemblekit regimes to check that it appears in the table with a non-degenerate AUROC.
5. Add the key to the accepted values of ENSEMBLEKIT_COMBINER in Configuration.
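
As a rough illustration of the combine signature described above, here is a minimal sketch of a median combiner. The choice of the median and the synthetic data are assumptions made for the example; they are not part of EnsembleKit, and the registration step in __init__.py is not shown because its exact form is not documented here.

```python
import numpy as np


def combine(log_odds: np.ndarray, labels: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical example combiner: per-example median of learner log-odds.

    log_odds: (K, N) array, one row per learner.
    labels:   (N,) array; unused here, accepted only to match the signature.
    rng:      np.random.Generator; unused here because the median is deterministic.
    """
    log_odds = np.asarray(log_odds, dtype=float)
    if log_odds.ndim != 2:
        raise ValueError(f"expected a (K, N) array, got shape {log_odds.shape}")
    # Reduce over the learner axis (axis 0) to get one score per example.
    return np.median(log_odds, axis=0)


# Synthetic data for a quick shape and determinism check (illustrative only).
rng = np.random.default_rng(0)
K, N = 5, 200
labels = rng.integers(0, 2, size=N)
log_odds = rng.normal(size=(K, N)) + (2 * labels - 1)  # weak signal plus noise

scores = combine(log_odds, labels, np.random.default_rng(1))
scores_again = combine(log_odds, labels, np.random.default_rng(1))
```

The function keeps no state and has no side effects, so calling it twice with the same inputs gives identical output, which is the property the determinism test in step 3 would check.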

How to change the learner generation model

The learner generation model is z_k = a_k * s + noise_k. To change it:

1. Edit or subclass the LearnerFactory in ensemblekit/synthesis/learners.py.
2. The factory must expose generate(s: np.ndarray, rng) -> np.ndarray, returning (K, N) log-odds.
3. If you add a new regime, create ensemblekit/regimes/my_regime.py implementing build_learners(s, y, rng) -> (log_odds, competence_hint), where competence_hint is the signal available to the competence estimator.
4. Register the regime in ensemblekit/regimes/__init__.py and add a description to regimes/descriptions.py.
5. Add tests verifying that the regime does not change the Bayes label y, that the expected combiner fails, and that the fixed-axis combiner passes.
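
The interfaces above can be sketched as follows. This is a standalone illustration under stated assumptions, not the project's implementation: the class name ScaledSignalFactory, the gain values, the Gaussian noise, the choice of the gain vector as competence_hint, and the rule y = (s > 0) are all hypothetical.

```python
import numpy as np


class ScaledSignalFactory:
    """Hypothetical factory for z_k = a_k * s + noise_k (names are illustrative)."""

    def __init__(self, gains, noise_std=1.0):
        self.gains = np.asarray(gains, dtype=float)  # a_k, shape (K,)
        self.noise_std = float(noise_std)

    def generate(self, s: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        s = np.asarray(s, dtype=float)  # shape (N,)
        # Assumption: independent Gaussian noise per learner and example.
        noise = rng.normal(scale=self.noise_std, size=(self.gains.size, s.size))
        # Broadcasting a_k over examples yields a (K, N) array.
        return self.gains[:, None] * s[None, :] + noise


def build_learners(s, y, rng):
    """Hypothetical regime: returns (log_odds, competence_hint) and leaves y untouched."""
    factory = ScaledSignalFactory(gains=[0.5, 1.0, 2.0])
    log_odds = factory.generate(s, rng)
    competence_hint = factory.gains.copy()  # assumption: the hint is the gain vector
    return log_odds, competence_hint


rng = np.random.default_rng(42)
s = rng.normal(size=100)
y = (s > 0).astype(int)  # assumption: stand-in for the Bayes label
y_before = y.copy()
log_odds, hint = build_learners(s, y, rng)
```

Comparing y against y_before after the call is one simple way to express the "regime does not change the Bayes label" test from the last step.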

CI

GitHub Actions runs pytest -q and python -m evals.gate on Python 3.10, 3.11, and 3.12. No secrets are required -- the benchmark is fully offline.

Code style

Format with black, lint with ruff (configured in pyproject.toml).
Type annotations are encouraged but not enforced.
All random state must flow through np.random.default_rng(seed) for reproducibility -- never use np.random.seed() or module-level state.
Keep combiners stateless: they receive all the data they need as arguments and return a score array. Side effects (logging, plotting) belong in the harness, not the combiner.
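
A small sketch of the random-state rule, assuming a hypothetical helper noisy_scores: the generator is created once by the caller and passed down explicitly, so the same seed reproduces the same output without touching NumPy's global state.

```python
import numpy as np


def noisy_scores(signal: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # All randomness comes from the generator passed in by the caller.
    return signal + rng.normal(size=signal.shape)


signal = np.linspace(-1.0, 1.0, 5)
a = noisy_scores(signal, np.random.default_rng(123))
b = noisy_scores(signal, np.random.default_rng(123))  # same seed as a
c = noisy_scores(signal, np.random.default_rng(124))  # different seed
```

Here a and b are identical because they use generators seeded with the same value, while c differs.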
