Contributing to pm-bench

Thanks for considering a contribution. The most common path is adding a leaderboard entry for a model you've trained — that's documented in full below. If you want to fix a bug, add a baseline, or wire a new dataset, the same pre-flight rules apply: tests pass, ruff is clean, the leaderboard is verified.

Setup

git clone https://github.com/erphq/pm-bench
cd pm-bench
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest -q

pytest -q should report all-green. If it doesn't, please open an issue before opening a PR — we want to know about a broken main.

Adding a leaderboard entry

This is the supported submission flow. Same five steps for every task.

1. Run your model

Generate predictions on the canonical split. Either pipe through pm-bench prefixes to get the truth file shape, or — if you trained on your own pipeline — produce a CSV that matches the format below.

The standard recipe with the bundled dataset:

pm-bench split synthetic-toy > split.json
pm-bench prefixes synthetic-toy --split split.json --out prefixes.csv \
    --task <next-event|remaining-time|outcome|bottleneck>
# ... your model produces predictions.csv against prefixes.csv ...

For conformance, the submission is a model JSON (DFG transitions), not per-prefix predictions. See pm_bench/conformance.py for the schema; pm-bench discover --baseline dfg is a worked example.

2. Predictions file format

Task	File	Columns
`next-event`	`predictions.csv[.gz]`	`case_id,prefix_idx,predictions` (predictions = `\|`-joined ranked list)
`remaining-time`	`predictions.csv[.gz]`	`case_id,prefix_idx,predicted_days`
`outcome`	`predictions.csv[.gz]`	`case_id,prefix_idx,score` (P(outcome=1))
`bottleneck`	`predictions.csv[.gz]`	`activity_a,activity_b,predicted_wait_seconds`
`conformance`	`model.json`	`{"transitions": [["a","b"], ...]}`

Predictions must cover every (case_id, prefix_idx) (or transition) present in the truth file. Missing rows fail the score command and make your submission un-rankable.

3. Score it locally

pm-bench score predictions.csv --prefixes prefixes.csv --task <task>

For conformance:

pm-bench score model.json --task conformance \
    --dataset synthetic-toy --split split.json

Record the resulting JSON — those are the numbers that go in the leaderboard entry.

4. Add the entry

Check your predictions in:

leaderboard/predictions/<task>/<dataset>/<your-model-name>.<csv.gz|json>

Append an entry to leaderboard/<task>/<dataset>.json:

{
  "model": "your-model-name",
  "version": "1.0.0",
  "predictions_path": "leaderboard/predictions/<task>/<dataset>/<your-model-name>.csv.gz",
  "code": "https://github.com/<you>/<repo>",
  "paper": "https://arxiv.org/abs/...",
  "score": { ...numbers from step 3... },
  "scored_at": "2026-MM-DDTHH:MM:SSZ",
  "notes": "What's interesting about this submission."
}

5. Verify and regenerate STANDINGS.md

Pre-flight on the file you just touched:

pm-bench validate leaderboard/<task>/<dataset>.json

That runs schema check + score rescore in one shot — exactly what CI will run on your PR. Then regenerate the standings doc:

pm-bench leaderboard --all --verify
pm-bench leaderboard --all --markdown > STANDINGS.md

All three must succeed before the PR can land — CI runs them. Open the PR with a one-line summary of the result and how it compares to the existing reference.

Pre-PR checklist

pytest -q passes locally
ruff check pm_bench tests is clean
pm-bench leaderboard --all --verify exits 0 (no drift)
STANDINGS.md is regenerated if you touched any leaderboard JSON
Git commit message describes the why, not just the what
PR title is short (~50 chars); details go in the body

Quantifying noise (for serious submissions)

Before claiming "X beats the baseline by N points," check the noise band:

python -m bench.seeds --n 30

If your gain is smaller than the baseline's std band on the same task, your number isn't a signal yet — keep iterating, or run more seeds and report a CI alongside the headline.

Code style

Pure CPython where possible; torch / scipy / pandas live behind a [ml] extra (not yet shipped — open an issue first if your model needs it inside pm_bench/).
Type hints on all public functions.
Comments explain why, not what. The code should already say what.
One feature per PR. Stacked PRs are fine if each can be reviewed in isolation.

Reporting bugs

Open an issue. Please include:

Python version (python --version)
pm-bench version (pm-bench --version)
The smallest command sequence that reproduces the issue
The full traceback if there is one

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contributing to pm-bench

Setup

Adding a leaderboard entry

1. Run your model

2. Predictions file format

3. Score it locally

4. Add the entry

5. Verify and regenerate STANDINGS.md

Pre-PR checklist

Quantifying noise (for serious submissions)

Code style

Reporting bugs

FilesExpand file tree

CONTRIBUTING.md

Latest commit

History

CONTRIBUTING.md

File metadata and controls

Contributing to pm-bench

Setup

Adding a leaderboard entry

1. Run your model

2. Predictions file format

3. Score it locally

4. Add the entry

5. Verify and regenerate STANDINGS.md

Pre-PR checklist

Quantifying noise (for serious submissions)

Code style

Reporting bugs