
[MNT] Update scikit-learn upper bound#3250

Draft
MatthewMiddlehurst wants to merge 17 commits into main from mm/sklearn

Conversation

@MatthewMiddlehurst
Member

@MatthewMiddlehurst MatthewMiddlehurst commented Jan 19, 2026

Problem estimators:

  1. rotation_forest_regressor: I've regenerated the results, as the changes are genuine (see below)
  2. weasel: seems to be a dtype issue with the word counts

@aeon-actions-bot aeon-actions-bot bot added the maintenance Continuous integration, unit testing & package distribution label Jan 19, 2026
@aeon-actions-bot
Contributor

aeon-actions-bot bot commented Jan 19, 2026

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ maintenance ].

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Discord channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Regenerate expected results for testing
  • Push an empty commit to re-run CI checks

@MatthewMiddlehurst MatthewMiddlehurst added the full pytest actions Run the full pytest suite on a PR label Mar 24, 2026
@TonyBagnall
Contributor

Just looking at this.

Catch22Regressor: this seems to be a matter of precision.
Expected:
array([0.638218, 1.090666, 0.583235, 1.575507, 0.484134,
0.709761, 1.332061, 1.099275, 1.516734, 0.316833])
Got:
array([0.63821896, 1.0906666 , 0.58323551, 1.57550709, 0.48413489,
0.70976176, 1.33206165, 1.09927538, 1.51673405, 0.31683308])
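A quick check with the printed numbers (which are rounded for display) confirms the two arrays agree to within 1e-6, i.e. this is a display-precision difference, not an algorithmic change:

```python
import numpy as np

# The printed "Expected" and "Got" arrays from the failing comparison above.
expected = np.array([0.638218, 1.090666, 0.583235, 1.575507, 0.484134,
                     0.709761, 1.332061, 1.099275, 1.516734, 0.316833])
got = np.array([0.63821896, 1.0906666, 0.58323551, 1.57550709, 0.48413489,
                0.70976176, 1.33206165, 1.09927538, 1.51673405, 0.31683308])

# Every element matches to within 1e-6.
np.testing.assert_allclose(got, expected, atol=1e-6)
```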

@TonyBagnall
Contributor

TonyBagnall commented Apr 9, 2026

FAILED aeon/classification/dictionary_based/tests/test_weasel.py::test_weasel_v2_score - AssertionError:
Arrays are not almost equal to 4 decimals
ACTUAL: 0.5454545454545454
DESIRED: 0.90909
FAILED aeon/classification/dictionary_based/tests/test_weasel.py::test_weasel_score - AssertionError:
Arrays are not almost equal to 4 decimals
ACTUAL: 0.5454545454545454
DESIRED: 0.727272

@TonyBagnall
Contributor

FAILED aeon/regression/sklearn/tests/test_rotation_forest_regressor.py::test_rotf_output - AssertionError:
Arrays are not almost equal to 4 decimals

Mismatched elements: 15 / 15 (100%)
Max absolute difference among violations: 0.0045145
Max relative difference among violations: 0.1914099
ACTUAL: array([0.026 , 0.0245, 0.0224, 0.0453, 0.0892, 0.0314, 0.026 , 0.0451,
0.0287, 0.04 , 0.026 , 0.0378, 0.0265, 0.0356, 0.0281])
DESIRED: array([0.0269, 0.0269, 0.02 , 0.0428, 0.0903, 0.0271, 0.0255, 0.0408,
0.029 , 0.0425, 0.0269, 0.0367, 0.0236, 0.0344, 0.0236])
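For what it's worth, recomputing the reported mismatch statistics from the printed arrays (which are rounded, so the results are approximate) gives roughly the same numbers as the log:

```python
import numpy as np

actual = np.array([0.026, 0.0245, 0.0224, 0.0453, 0.0892, 0.0314, 0.026, 0.0451,
                   0.0287, 0.04, 0.026, 0.0378, 0.0265, 0.0356, 0.0281])
desired = np.array([0.0269, 0.0269, 0.02, 0.0428, 0.0903, 0.0271, 0.0255, 0.0408,
                    0.029, 0.0425, 0.0269, 0.0367, 0.0236, 0.0344, 0.0236])

# numpy reports max absolute and max relative (|a - d| / |d|) differences
# over the mismatched elements.
abs_diff = np.abs(actual - desired)
rel_diff = abs_diff / np.abs(desired)
print(abs_diff.max(), rel_diff.max())  # ~0.0045 and ~0.19, matching the log
```

All 15 elements shift by a small amount, consistent with an ensemble-wide perturbation rather than a single broken prediction.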

@TonyBagnall
Contributor

I looked at the rotation forest regressor, and it looks like our test case hits a fix in scikit-learn.

Claude this time:

_"From the 1.8 changelog:

"Fixed a regression in decision trees where almost constant features were not handled properly" scikit-learn (#32259)
"Fixed splitting logic during training in tree.DecisionTree* (and consequently in ensemble.RandomForest*) for nodes containing near-constant feature values and missing values" scikit-learn
"Fix decision tree splitting with missing values present in some features. In some cases the last non-missing sample would not be partitioned correctly" scikit-learn (#32351)

Rotation Forest is a near-perfect trigger for #1 and #2. Each tree is trained on PCA-rotated features built from a bootstrap sample of only 3 columns at a time. It's extremely common for one of the three PCA components on a bootstrap subsample to capture almost no variance — i.e. a "near-constant feature." sklearn 1.7 split those columns one way; 1.8 splits them another (correctly). That cascades through every tree in the ensemble, so every prediction shifts a little — exactly the pattern you're seeing (all 15 elements differ, magnitude ~1–20% relative, no element wildly off)."_

and ChatGPT:

_The most likely upstream cause of the 1.7 -> 1.8 drift is scikit-learn PR #32259, which landed in 1.8. The 1.8 release notes describe it as fixing a regression where almost constant features were not handled properly in decision trees. The PR discussion is more explicit: after the tree refactor in PR #29458, FEATURE_THRESHOLD was accidentally initialised to 0.0 instead of 1e-7, and 1.8 fixes that. The constant is commented as being there to mitigate precision differences between 32-bit and 64-bit.

That fits RotationForestRegressor unusually well, because aeon casts the rotated PCA features to float32 before fitting the tree. So a tree-side fix specifically about how near-constant features are treated under 32-bit vs 64-bit precision is exactly the sort of change that can move RotF predictions while leaving the high-level algorithm unchanged_
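A small sketch of the mechanism both analyses point at (the data and variable names here are illustrative, not aeon's actual code): PCA over a small group of correlated columns routinely yields a trailing component with almost no variance, and rotation forest then hands those features to the tree as float32:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=100)
# Three highly correlated columns, as a rotation-forest feature subgroup
# drawn from a bootstrap sample might look.
X = np.column_stack([base + 1e-6 * rng.normal(size=100) for _ in range(3)])

pca = PCA(n_components=3).fit(X)
rotated = pca.transform(X).astype(np.float32)  # per the comment, aeon fits trees on float32

# The trailing components carry essentially zero variance: the
# "near-constant features" whose splitting behaviour changed in 1.8.
print(pca.explained_variance_ratio_)
```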

@TonyBagnall
Contributor

ChatGPT thinks this:

Yes. These two failures are not the logistic-regression issue.

WEASEL uses RidgeClassifierCV on its default path because support_probabilities=False by default, and WEASEL_V2 also fits a RidgeClassifierCV. So test_weasel_score and test_weasel_v2_score are both going through the same sklearn ridge-CV path, not the LogisticRegression(liblinear) path.

The common upstream input is also important: both classifiers feed bag-of-words count features into that ridge classifier. In aeon’s SFA code, sparse bags are explicitly built as csr_matrix(..., dtype=np.uint32), and the dense bag constructors also allocate np.uint32 count arrays. WEASEL stacks those bags straight into RidgeClassifierCV, and WEASEL_V2 does the same after building dense SFA outputs with return_sparse=False.

The likely source of the regression is the sklearn 1.8 ridge refactor. The release notes say RidgeCV, RidgeClassifier, and RidgeClassifierCV gained array-API support in 1.8. In the 1.8 ridge.py source, RidgeGCV.fit now records original_dtype = X.dtype, fits in a high-precision float dtype, and then casts intercept, dual_coef, and coef_ back to original_dtype at the end. That original_dtype capture and end-of-fit cast-back are not present in the 1.7.2 source.

For WEASEL and WEASEL_V2, that means the ridge model is very likely being fit from uint32 count matrices and then having its learned parameters cast back to uint32. For a linear classifier, that is disastrous because the coefficients and intercept are supposed to be signed floating-point values. That neatly explains why both tests drop together under 1.8, despite using different transforms: the shared failure point is RidgeClassifierCV receiving integer-count bags and then storing integer-cast parameters.
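If that reading of the 1.8 ridge code is right, the damage is easy to picture: casting a linear model's learned parameters to an unsigned integer dtype truncates every fractional coefficient, and mangles negative ones outright (the values below are illustrative, not real WEASEL coefficients):

```python
import numpy as np

coef = np.array([1.3, 0.05, -0.8])  # plausible signed float coefficients
cast = coef[:2].astype(np.uint32)   # what a cast back to uint32 does

# Fractional values truncate toward zero...
print(cast)  # [1 0]
# ...and casting the negative -0.8 to an unsigned dtype is
# platform-dependent/undefined, so it would come back as garbage either way.
```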

The fix in aeon should be simple: cast the bag matrix to float before calling RidgeClassifierCV.fit in both places. In practice, just before fitting:

all_words = all_words.astype(np.float64, copy=False)

for WEASEL, and

words = words.astype(np.float64, copy=False)

for WEASEL_V2.
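A self-contained sketch of that fix, with synthetic uint32 counts standing in for the real SFA bags (csr_matrix and RidgeClassifierCV are the actual classes involved; the data is made up):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(42)
# Stand-in for a WEASEL bag-of-words matrix: uint32 counts, as aeon's SFA builds.
counts = rng.integers(0, 5, size=(20, 8), dtype=np.uint32)
y = np.array([0, 1] * 10)

all_words = csr_matrix(counts)
# The proposed fix: make the input float before fitting, so any dtype
# round-trip inside ridge stays in floating point.
all_words = all_words.astype(np.float64, copy=False)

clf = RidgeClassifierCV().fit(all_words, y)
print(clf.coef_.dtype)  # float64 coefficients, as a linear model needs
```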


Labels

full pytest actions Run the full pytest suite on a PR maintenance Continuous integration, unit testing & package distribution


3 participants