Summary
BaggingPuClassifier is not recognised as a classifier by scikit-learn ≥ 1.6's
tag-based estimator introspection system. This causes sklearn.inspection.partial_dependence
(and any other sklearn utility that calls is_classifier()) to raise:
ValueError: 'estimator' must be a fitted regressor or classifier.
Environment
- pulearn: 0.1.1
- scikit-learn: 1.6+ (confirmed on 1.7.1)
- Python: 3.13
Steps to reproduce
import numpy as np
from sklearn.base import is_classifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC
from pulearn import BaggingPuClassifier
svc = SVC(kernel="rbf", probability=True, random_state=0)
clf = BaggingPuClassifier(estimator=svc, n_estimators=10, random_state=0)
pipeline = Pipeline([("scaler", RobustScaler()), ("classifier", clf)])
X = np.random.default_rng(0).standard_normal((100, 4))
y = np.array([1] * 20 + [0] * 80)
pipeline.fit(X, y)
print(is_classifier(pipeline)) # False — expected True
print(is_classifier(clf)) # False — expected True
from sklearn.inspection import partial_dependence
# Raises: ValueError: 'estimator' must be a fitted regressor or classifier.
partial_dependence(pipeline, X=X, features=0)
Root cause
In scikit-learn 1.6, is_classifier() was changed from:
# sklearn < 1.6
return getattr(estimator, "_estimator_type", None) == "classifier"
to:
# sklearn >= 1.6
return get_tags(estimator).estimator_type == "classifier"
get_tags() calls estimator.__sklearn_tags__(). The __sklearn_tags__ method
is defined on BaseEstimator and on each Mixin (ClassifierMixin,
RegressorMixin, etc.), and they are designed to chain via super().
BaggingPuClassifier's MRO is:
BaggingPuClassifier → BaseBaggingPU → BaseEnsemble → ... → BaseEstimator → ClassifierMixin
BaseEstimator.__sklearn_tags__() is the end of that chain: it builds a fresh
Tags object and does not call super().__sklearn_tags__(). Because BaseEstimator
appears before ClassifierMixin in the MRO, ClassifierMixin.__sklearn_tags__(),
which sets estimator_type = "classifier", is never reached. The result is
Tags(estimator_type=None, ...), so is_classifier returns False.
The legacy _estimator_type = "classifier" class attribute set by ClassifierMixin
is still present, but sklearn ≥ 1.6 no longer uses it (it is marked
# TODO(1.8): Remove this attribute in the sklearn source).
The same issue propagates through sklearn.pipeline.Pipeline, which delegates its
own tags to its final step. So wrapping BaggingPuClassifier in a Pipeline is
also affected.
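The mechanism described in this section can be reproduced without scikit-learn. The sketch below is a toy model, not sklearn's actual code: Tags, BaseEstimator, ClassifierMixin and ToyPipeline are simplified stand-ins that share only their names and chaining behaviour with the real classes, and MixinLast is a hypothetical class that mirrors the relative base-class order in BaggingPuClassifier's MRO.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tags:
    """Toy stand-in for sklearn's Tags; only the field relevant here."""
    estimator_type: Optional[str] = None


class BaseEstimator:
    """Stand-in: ends the chain by returning fresh tags, no super() call."""

    def __sklearn_tags__(self):
        return Tags()


class ClassifierMixin:
    """Stand-in: relies on a later class in the MRO to supply the tags."""

    _estimator_type = "classifier"  # legacy attribute, still inherited

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.estimator_type = "classifier"
        return tags


class MixinLast(BaseEstimator, ClassifierMixin):
    """Hypothetical estimator with BaseEstimator ahead of the mixin."""


class ToyPipeline(BaseEstimator):
    """Stand-in: copies estimator_type from its final step."""

    def __init__(self, steps):
        self.steps = steps

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        final_step = self.steps[-1][1]
        tags.estimator_type = final_step.__sklearn_tags__().estimator_type
        return tags


def is_classifier_legacy(est):
    # Pre-1.6 style check: reads the class attribute.
    return getattr(est, "_estimator_type", None) == "classifier"


def is_classifier_tags(est):
    # 1.6+ style check: reads the tags object.
    return est.__sklearn_tags__().estimator_type == "classifier"


est = MixinLast()
mro_names = [c.__name__ for c in MixinLast.__mro__]
print(mro_names)  # ['MixinLast', 'BaseEstimator', 'ClassifierMixin', 'object']
print(is_classifier_legacy(est))  # True
print(is_classifier_tags(est))  # False
print(is_classifier_tags(ToyPipeline([("classifier", est)])))  # False
```

The legacy attribute check still passes while the tag-based check fails, which matches the divergence reported above.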
Fix
ClassifierMixin must appear to the left of BaseEstimator in the class
definition, or BaggingPuClassifier (and BaseBaggingPU) must explicitly
implement __sklearn_tags__. The minimal fix is:
# In pulearn/bagging.py
from sklearn.utils._tags import ClassifierTags

class BaggingPuClassifier(BaseBaggingPU, ClassifierMixin):
    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.estimator_type = "classifier"
        tags.classifier_tags = ClassifierTags()
        tags.target_tags.required = True
        return tags
Alternatively, reordering the base classes of BaseBaggingPU so that
ClassifierMixin appears before BaseEstimator in the MRO would also resolve
this, but the explicit __sklearn_tags__ override is the safer option, since it
keeps working regardless of base-class order.
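As a rough illustration of why both options work, here is a toy model that does not import scikit-learn. Tags, BaseEstimator and ClassifierMixin are simplified stand-ins for the real classes, and Reordered and Overridden are hypothetical names for the two approaches; none of this is the pulearn or sklearn implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tags:
    """Toy stand-in for sklearn's Tags; only the field relevant here."""
    estimator_type: Optional[str] = None


class BaseEstimator:
    """Stand-in: returns fresh tags and does not call super()."""

    def __sklearn_tags__(self):
        return Tags()


class ClassifierMixin:
    """Stand-in: chains via super(), then marks the tags as classifier."""

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.estimator_type = "classifier"
        return tags


class Reordered(ClassifierMixin, BaseEstimator):
    """Option 1: mixin on the left, so its method runs first and chains on."""


class Overridden(BaseEstimator, ClassifierMixin):
    """Option 2: base-class order unchanged, tags set explicitly."""

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()  # resolves to BaseEstimator here
        tags.estimator_type = "classifier"
        return tags


reordered_type = Reordered().__sklearn_tags__().estimator_type
overridden_type = Overridden().__sklearn_tags__().estimator_type
print(reordered_type, overridden_type)  # classifier classifier
```

In this model the override gives the same result whichever order the bases are listed in, whereas the reordering approach depends on every class in the hierarchy keeping the mixin to the left.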
Workaround (for users)
Until a fix is released, monkey-patching before calling any sklearn inspection
utility works:
from sklearn.utils._tags import ClassifierTags
from pulearn.bagging import BaggingPuClassifier

def _pu_sklearn_tags(self):
    from sklearn.base import BaseEstimator
    tags = BaseEstimator.__sklearn_tags__(self)
    tags.estimator_type = "classifier"
    tags.classifier_tags = ClassifierTags()
    tags.target_tags.required = True
    return tags

BaggingPuClassifier.__sklearn_tags__ = _pu_sklearn_tags
References
- sklearn/utils/_tags.py, sklearn/base.py (is_classifier)
- _estimator_type as the authoritative mechanism