Skip to content

Fix continuous ROC for binary observations with out-of-range forecasts (#442)#450

Open
aaronspring wants to merge 1 commit into
mainfrom
fix-roc-continuous-binary-obs-442
Open

Fix continuous ROC for binary observations with out-of-range forecasts (#442)#450
aaronspring wants to merge 1 commit into
mainfrom
fix-roc-continuous-binary-obs-442

Conversation

@aaronspring

Copy link
Copy Markdown
Collaborator

Summary

Fixes #442. xss.roc(..., bin_edges="continuous") returned ROC AUC values that diverged sharply from sklearn.metrics.roc_auc_score — down to 0.0 where sklearn returns 1.0.

Root cause

In continuous mode the observations are binary (0/1), but the code categorised both observations and forecasts with the same forecast-derived threshold i:

dichotomous_category_edges = np.array([-np.inf, i, np.inf])
Contingency(observations, forecasts,
            dichotomous_category_edges,   # observation edges
            dichotomous_category_edges,   # forecast edges
            dim=dim)

The thresholds i are drawn from forecast magnitudes. When forecasts fall outside the [0, 1] range of the binary observations, the binary observations never land in the "event" category, so the hit rate is 0 at every threshold and the area collapses.

This was previously masked because typical probability forecasts (and the existing sklearn test, which clip(0, 1)s the forecast) stay within [0, 1]. The reproducer in the issue shifts forecasts above 1, exposing the bug. It is not a dimension-ordering problem — it reproduces in pure 1-D.

Fix

In continuous mode, categorise observations with a fixed split at 0.5 and vary only the forecast threshold. Output now matches sklearn for every point in the reproducer (error 0.0 across all 10 points), independent of forecast magnitude or input dim order.

Tests

  • Added test_roc_continuous_forecast_out_of_unit_range_against_sklearn (parametrized over drop_intermediate), using the multi-dimensional, transposed, out-of-range data from the issue and asserting per-point agreement with roc_auc_score. It fails on main and passes with this change.
  • Full test_probabilistic.py, test_contingency.py, and accessor suites pass (500 passed, 1 skipped); doctests pass.

🤖 Generated with Claude Code

In `roc(..., bin_edges="continuous")` the binary observations were
categorised with the forecast-derived threshold `i`. When forecast
values fall outside the [0, 1] range of the observations, no observation
ever lands in the event category, so the hit rate is always 0 and the
area collapses (e.g. 0.0 instead of 1.0).

Categorise observations with a fixed split at 0.5 in continuous mode,
varying only the forecast threshold. Results now match
`sklearn.metrics.roc_auc_score` regardless of forecast magnitude or
input dimension ordering.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.64%. Comparing base (eccc957) to head (d8c580b).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #450      +/-   ##
==========================================
+ Coverage   94.49%   94.64%   +0.14%     
==========================================
  Files          27       27              
  Lines        2836     2855      +19     
==========================================
+ Hits         2680     2702      +22     
+ Misses        156      153       -3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aaronspring

aaronspring commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

@paololucchino do you mind to review this PR fixing your issue #442 ?

@aaronspring

Copy link
Copy Markdown
Collaborator Author

Verification: is this a continuous-only bug, or does explicit bin_edges also need it?

I checked whether, on main (without this PR), passing explicit bin_edges instead of "continuous" would already match sklearn.metrics.roc_auc_score. Per the API docs, bin_edges "defines categorization boundaries for observations and forecasts" — i.e. the explicit path categorises both inputs with the same edges (hence the o = f.copy() doctest).

Result on main (no PR):

Case sklearn xs explicit bin_edges xs "continuous"
A) Issue #442 data, forecast out of [0,1] (point 9, fc∈[1.23,1.81]) 1.0000 0.0000 0.0000
B) Probability forecast within [0,1] 0.9843 0.9843 ✅ 0.9843 ✅

So on main, explicit bin_edges matches sklearn only when the forecast (and the edges) lie within the [0,1] range of the binary observations (case B). For the out-of-range forecasts in #442 the explicit path is broken in the exact same way as continuous (point 9 → 0.0 instead of 1.0).

Why: both paths feed the same bin_edges to the observation categorisation. The binary observations (0/1) only fall into the "event" category if an edge sits between 0 and 1. Once forecasts — and thus the derived/passed edges — exceed 1, no observation is ever an "event", the hit rate is 0 at every threshold, and the area collapses. The bug is therefore not unique to continuous; it stems from categorising binary observations against forecast-scale edges.

Implication for this PR (confirms the scope is right): with the fix, "continuous" matches sklearn for all 10 points (case A), and the explicit bin_edges path is intentionally left unchanged. That's correct: the explicit path is a documented generalised multi-category ROC where observations and forecasts share categories — forcing a fixed 0.5 observation split there would break its documented behaviour (e.g. the o = f.copy(), category_edges = np.linspace(-2, 2, 5) doctest). The sklearn-equivalent binary-observation ROC is precisely what "continuous" mode is for, which is where this PR applies the fix.

Bottom line: explicit bin_edges on main would have matched sklearn only for in-range forecasts; for #442's out-of-range case it would not, so the fix belongs in the continuous path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roc bug, possibly due to dimension names?

1 participant