Auto-generated from `leaderboard/<task>/<dataset>.json`. Regenerate with `pm-bench leaderboard --all --markdown > STANDINGS.md`.
NDCG@10 over per-transition wait times (higher is better)
| Model | NDCG@k | k | n_transitions |
|---|---|---|---|
| mean-wait-ref | 0.9911 | 10 | 9 |
| random-ref | 0.9434 | 10 | 9 |
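
For readers unfamiliar with the metric, here is a minimal sketch of NDCG@k applied to wait-time rankings. It assumes that transitions are ranked by descending predicted wait, that the true wait time is used directly as the gain, and that the discount is the usual log2 form. The function names are hypothetical, and pm-bench's own gain and discount choices may differ.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the first k items (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(true_wait, predicted_wait, k=10):
    """NDCG@k with true wait times as gains (an assumption, not pm-bench's API)."""
    # Rank transitions from longest to shortest predicted wait.
    order = sorted(range(len(true_wait)),
                   key=lambda i: predicted_wait[i], reverse=True)
    ranked = [true_wait[i] for i in order]
    # The ideal ranking sorts by the true wait times themselves.
    ideal_dcg = dcg(sorted(true_wait, reverse=True), k)
    return dcg(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


perfect = ndcg_at_k([3.0, 2.0, 1.0], [30.0, 20.0, 10.0])
reversed_rank = ndcg_at_k([3.0, 2.0, 1.0], [10.0, 20.0, 30.0])
```

With 9 transitions and k = 10, the cutoff covers the whole ranking, so every transition contributes to the score.
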
DFG fitness × precision → F-score (higher is better)
| Model | F | Fitness | Precision | n_test | n_model |
|---|---|---|---|---|---|
| dfg-ref | 1.0000 | 1.0000 | 1.0000 | 9 | 9 |
| empty-ref | 0.0000 | 0.0000 | 0.0000 | 9 | 0 |
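
As an illustration of how fitness and precision could be combined into F, the sketch below uses the harmonic mean. That choice is an assumption: both rows above are consistent with it, but they do not distinguish it from other combinations, and `f_score` is a hypothetical name.

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision (assumed combination rule)."""
    if fitness + precision == 0:
        # Avoid 0/0 for an empty model; define the score as 0.
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


full = f_score(1.0, 1.0)    # same inputs as the dfg-ref row
empty = f_score(0.0, 0.0)   # same inputs as the empty-ref row
```

The zero guard matters for a model with no edges (`n_model` = 0), where both inputs are 0.
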
top1 / top3 accuracy (higher is better)
| Model | top1 | top3 | n |
|---|---|---|---|
| markov-ref | 0.9304 | 1.0000 | 158 |
| uniform-ref | 0.2025 | 0.2785 | 158 |
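
A minimal sketch of top-k accuracy, assuming each prediction is a list of candidate labels ordered from most to least likely. The function name and the example labels are hypothetical and not taken from pm-bench.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is among the first k candidates."""
    if not true_labels:
        return 0.0
    hits = sum(1 for candidates, truth in zip(ranked_predictions, true_labels)
               if truth in candidates[:k])
    return hits / len(true_labels)


# Hypothetical ranked candidates for three cases.
ranked = [["pay", "ship", "close"],
          ["ship", "pay", "close"],
          ["close", "ship", "pay"]]
truth = ["pay", "pay", "pay"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
```

Because the top-3 candidates include the top-1 candidate, top3 can never be lower than top1 for the same model.
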
ROC AUC (higher is better)
| Model | AUC | n | n_pos |
|---|---|---|---|
| prior-ref | 0.6319 | 158 | 45 |
| global-ref | 0.5000 | 158 | 45 |
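
ROC AUC can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison. It is an illustrative implementation with a hypothetical function name, not pm-bench's code.

```python
def roc_auc(scores, labels):
    """Pairwise ROC AUC; a tie between a positive and a negative counts 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
constant = roc_auc([0.3, 0.3, 0.3, 0.3], labels)   # same score for every case
separated = roc_auc([0.9, 0.1, 0.8, 0.2], labels)  # positives always higher
```

Under this tie convention, a scorer that assigns every case the same value gets exactly 0.5, the value shown in the global-ref row. Whether global-ref is in fact such a constant scorer is an assumption.
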
MAE in days (lower is better)
| Model | mae_days | n |
|---|---|---|
| mean-ref | 1.3481 | 158 |
| zero-ref | 2.7410 | 158 |
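
A minimal sketch of MAE in days, comparing two simple baselines on made-up values. It assumes predictions and targets are already expressed in days. The helper name and the data are hypothetical, and the in-sample mean used here stands in for whatever training-set statistic a real baseline would use.

```python
def mae_days(predicted, actual):
    """Mean absolute error between predicted and actual durations, in days."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one case")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


actual = [1.0, 2.0, 4.0]                 # made-up durations in days
mean_value = sum(actual) / len(actual)   # in-sample mean, for illustration only

zero_baseline = mae_days([0.0] * len(actual), actual)
mean_baseline = mae_days([mean_value] * len(actual), actual)
```

On this toy data the mean baseline has the lower error, the same ordering as mean-ref and zero-ref in the table.
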