[MNT] Update scikit-learn upper bound #3250
Conversation
# Conflicts: # pyproject.toml
|
Just looking at this, it seems to be a matter of precision |
|
FAILED aeon/classification/dictionary_based/tests/test_weasel.py::test_weasel_v2_score - AssertionError: |
|
FAILED aeon/regression/sklearn/tests/test_rotation_forest_regressor.py::test_rotf_output - AssertionError: Mismatched elements: 15 / 15 (100%) |
|
I looked at the rotation forest regressor, and it looks like a fix in scikit-learn that our test case hits. Claude this time: _"From the 1.8 changelog: "Fixed a regression in decision trees where almost constant features were not handled properly" (scikit-learn #32259). Rotation Forest is a near-perfect trigger for #1 and #2. Each tree is trained on PCA-rotated features built from a bootstrap sample of only 3 columns at a time. It's extremely common for one of the three PCA components on a bootstrap subsample to capture almost no variance — i.e. a "near-constant feature." sklearn 1.7 split those columns one way; 1.8 splits them another (correctly). That cascades through every tree in the ensemble, so every prediction shifts a little — exactly the pattern you're seeing (all 15 elements differ, magnitude ~1–20% relative, no element wildly off)."_ and chatgpt: _"The most likely upstream cause of the 1.7 -> 1.8 drift is scikit-learn PR #32259, which landed in 1.8. The 1.8 release notes describe it as fixing a regression where almost constant features were not handled properly in decision trees. The PR discussion is more explicit: after the tree refactor in PR #29458, FEATURE_THRESHOLD was accidentally initialised to 0.0 instead of 1e-7, and 1.8 fixes that. The constant is commented as being there to mitigate precision differences between 32-bit and 64-bit. That fits RotationForestRegressor unusually well, because aeon casts the rotated PCA features to float32 before fitting the tree. So a tree-side fix specifically about how near-constant features are treated under 32-bit vs 64-bit precision is exactly the sort of change that can move RotF predictions while leaving the high-level algorithm unchanged."_ |
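The "near-constant component" claim is easy to reproduce outside aeon. A minimal numpy sketch (toy data with collinearity injected by hand, not aeon's actual pipeline): PCA on a 3-column subsample where two columns are almost collinear yields one rotated feature carrying essentially zero variance, which is exactly the kind of feature the FEATURE_THRESHOLD fix changes the handling of.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
# Make column 2 almost identical to column 0, so the data is effectively rank 2.
X[:, 2] = X[:, 0] + 1e-9 * rng.normal(size=50)

# Plain-numpy PCA via SVD on the centered data.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained_variance = s**2 / (len(X) - 1)
print(explained_variance)  # last component is ~0: a "near-constant feature"
```

Under float32, as used for the rotated features here, such a component is even more likely to collapse to exactly constant, which is where a threshold change between 0.0 and 1e-7 can flip split decisions.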
|
chatgpt's view on the WEASEL failures: These two failures are not the logistic-regression issue. WEASEL uses RidgeClassifierCV on its default path because support_probabilities=False by default, and WEASEL_V2 also fits a RidgeClassifierCV. So test_weasel_score and test_weasel_v2_score are both going through the same sklearn ridge-CV path, not the LogisticRegression(liblinear) path.

The common upstream input is also important: both classifiers feed bag-of-words count features into that ridge classifier. In aeon's SFA code, sparse bags are explicitly built as csr_matrix(..., dtype=np.uint32), and the dense bag constructors also allocate np.uint32 count arrays. WEASEL stacks those bags straight into RidgeClassifierCV, and WEASEL_V2 does the same after building dense SFA outputs with return_sparse=False.

The likely source of the regression is the sklearn 1.8 ridge refactor. The release notes say RidgeCV, RidgeClassifier, and RidgeClassifierCV gained array-API support in 1.8. In the 1.8 ridge.py source, RidgeGCV.fit now records original_dtype = X.dtype, fits in a high-precision float dtype, and then casts intercept_, dual_coef_, and coef_ back to original_dtype at the end. That original_dtype capture and end-of-fit cast-back are not present in the 1.7.2 source.

For WEASEL and WEASEL_V2, that means the ridge model is very likely being fit from uint32 count matrices and then having its learned parameters cast back to uint32. For a linear classifier, that is disastrous, because the coefficients and intercept are supposed to be signed floating-point values. That neatly explains why both tests drop together under 1.8 despite using different transforms: the shared failure point is RidgeClassifierCV receiving integer-count bags and then storing integer-cast parameters.

The fix in aeon should be simple: cast the bag matrix to float before calling RidgeClassifierCV.fit in both places. In practice, just before fitting: all_words = all_words.astype(np.float64, copy=False) for WEASEL, and words = words.astype(np.float64, copy=False) for WEASEL_V2. |
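A minimal sketch of that proposed cast (synthetic counts and labels; the variable name all_words is taken from the comment above, and the surrounding aeon code is assumed, not shown). With float64 input, any cast-back-to-original-dtype step in sklearn leaves the learned coefficients as proper signed floats:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)
# uint32 bag-of-words-style count matrix, as aeon's SFA bags are built.
all_words = rng.integers(0, 5, size=(40, 10)).astype(np.uint32)
y = rng.integers(0, 2, size=40)

# The proposed one-line fix: cast to float before fitting.
all_words = all_words.astype(np.float64, copy=False)
clf = RidgeClassifierCV().fit(all_words, y)
print(clf.coef_.dtype)  # float64, not uint32
```

copy=False avoids an extra allocation when the array is already float, so the cast is essentially free on code paths that ever start producing float bags.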
Problem estimators: