Fix continuous ROC for binary observations with out-of-range forecasts (#442) by aaronspring · Pull Request #450 · xarray-contrib/xskillscore

aaronspring · 2026-06-11T11:38:57Z

Summary

Fixes #442. xss.roc(..., bin_edges="continuous") returned ROC AUC values that diverged sharply from sklearn.metrics.roc_auc_score — down to 0.0 where sklearn returns 1.0.

Root cause

In continuous mode the observations are binary (0/1), but the code categorised both observations and forecasts with the same forecast-derived threshold i:

dichotomous_category_edges = np.array([-np.inf, i, np.inf])
Contingency(observations, forecasts,
            dichotomous_category_edges,   # observation edges
            dichotomous_category_edges,   # forecast edges
            dim=dim)

The thresholds i are drawn from forecast magnitudes. When forecasts fall outside the [0, 1] range of the binary observations, the binary observations never land in the "event" category, so the hit rate is 0 at every threshold and the area collapses.

This was previously masked because typical probability forecasts (and the existing sklearn test, which clip(0, 1)s the forecast) stay within [0, 1]. The reproducer in the issue shifts forecasts above 1, exposing the bug. It is not a dimension-ordering problem — it reproduces in pure 1-D.

Fix

In continuous mode, categorise observations with a fixed split at 0.5 and vary only the forecast threshold. Output now matches sklearn for every point in the reproducer (error 0.0 across all 10 points), independent of forecast magnitude or input dim order.

Tests

Added test_roc_continuous_forecast_out_of_unit_range_against_sklearn (parametrized over drop_intermediate), using the multi-dimensional, transposed, out-of-range data from the issue and asserting per-point agreement with roc_auc_score. It fails on main and passes with this change.
Full test_probabilistic.py, test_contingency.py, and accessor suites pass (500 passed, 1 skipped); doctests pass.

🤖 Generated with Claude Code

In `roc(..., bin_edges="continuous")` the binary observations were categorised with the forecast-derived threshold `i`. When forecast values fall outside the [0, 1] range of the observations, no observation ever lands in the event category, so the hit rate is always 0 and the area collapses (e.g. 0.0 instead of 1.0). Categorise observations with a fixed split at 0.5 in continuous mode, varying only the forecast threshold. Results now match `sklearn.metrics.roc_auc_score` regardless of forecast magnitude or input dimension ordering. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov · 2026-06-11T11:41:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.64%. Comparing base (eccc957) to head (d8c580b).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #450      +/-   ##
==========================================
+ Coverage   94.49%   94.64%   +0.14%     
==========================================
  Files          27       27              
  Lines        2836     2855      +19     
==========================================
+ Hits         2680     2702      +22     
+ Misses        156      153       -3

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

aaronspring · 2026-06-11T11:45:52Z

@paololucchino do you mind to review this PR fixing your issue #442 ?

aaronspring · 2026-06-11T12:04:56Z

Verification: is this a `continuous`-only bug, or does explicit `bin_edges` also need it?

I checked whether, on main (without this PR), passing explicit bin_edges instead of "continuous" would already match sklearn.metrics.roc_auc_score. Per the API docs, bin_edges "defines categorization boundaries for observations and forecasts" — i.e. the explicit path categorises both inputs with the same edges (hence the o = f.copy() doctest).

Result on main (no PR):

Case	sklearn	xs explicit `bin_edges`	xs `"continuous"`
A) Issue #442 data, forecast out of [0,1] (point 9, fc∈[1.23,1.81])	1.0000	0.0000 ❌	0.0000 ❌
B) Probability forecast within [0,1]	0.9843	0.9843 ✅	0.9843 ✅

So on main, explicit bin_edges matches sklearn only when the forecast (and the edges) lie within the [0,1] range of the binary observations (case B). For the out-of-range forecasts in #442 the explicit path is broken in the exact same way as continuous (point 9 → 0.0 instead of 1.0).

Why: both paths feed the same bin_edges to the observation categorisation. The binary observations (0/1) only fall into the "event" category if an edge sits between 0 and 1. Once forecasts — and thus the derived/passed edges — exceed 1, no observation is ever an "event", the hit rate is 0 at every threshold, and the area collapses. The bug is therefore not unique to continuous; it stems from categorising binary observations against forecast-scale edges.

Implication for this PR (confirms the scope is right): with the fix, "continuous" matches sklearn for all 10 points (case A), and the explicit bin_edges path is intentionally left unchanged. That's correct: the explicit path is a documented generalised multi-category ROC where observations and forecasts share categories — forcing a fixed 0.5 observation split there would break its documented behaviour (e.g. the o = f.copy(), category_edges = np.linspace(-2, 2, 5) doctest). The sklearn-equivalent binary-observation ROC is precisely what "continuous" mode is for, which is where this PR applies the fix.

Bottom line: explicit bin_edges on main would have matched sklearn only for in-range forecasts; for #442's out-of-range case it would not, so the fix belongs in the continuous path.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix continuous ROC for binary observations with out-of-range forecasts (#442)#450

Fix continuous ROC for binary observations with out-of-range forecasts (#442)#450
aaronspring wants to merge 1 commit into
mainfrom
fix-roc-continuous-binary-obs-442

aaronspring commented Jun 11, 2026

Uh oh!

codecov Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

aaronspring commented Jun 11, 2026 •

edited

Loading

Uh oh!

aaronspring commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aaronspring commented Jun 11, 2026

Summary

Root cause

Fix

Tests

Uh oh!

codecov Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

aaronspring commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aaronspring commented Jun 11, 2026

Verification: is this a continuous-only bug, or does explicit bin_edges also need it?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented Jun 11, 2026 •

edited

Loading

aaronspring commented Jun 11, 2026 •

edited

Loading

Verification: is this a `continuous`-only bug, or does explicit `bin_edges` also need it?