[ENH] Add flexible quality measures to RandomShapeletTransform #3244
nimra06 wants to merge 3 commits into aeon-toolkit:main from
- Add quality_measure parameter to support multiple quality measures
- Implement f_statistic quality measure as alternative to information_gain
- Create _quality_measures.py module for numba-optimized quality measures
- Add comprehensive tests for new functionality
- Maintain backward compatibility (information_gain remains default)
- Addresses issue aeon-toolkit#186
def test_shapelet_transform_default_quality_measure():
    """Test that default quality measure is information_gain."""
    t = RandomShapeletTransform(n_shapelet_samples=10, max_shapelets=5)
    assert t.quality_measure == "information_gain"
this is not really necessary
__maintainer__ = ["MatthewMiddlehurst"]
I feel like this should be in utils/numba and have its own tests
n_jobs: int = 1,
parallel_backend=None,
random_state: int | None = None,
quality_measure: str = "information_gain",
could you put this above batch_size, remember to do the docstring entry as well
# Validate quality_measure
valid_measures = ["information_gain", "f_statistic"]
if quality_measure not in valid_measures:
    raise ValueError(
        f"quality_measure must be one of {valid_measures}, "
        f"got {quality_measure}"
    )
self.quality_measure = quality_measure
validation should be done in _fit or a function called from it
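A minimal sketch of what this request could look like, assuming the usual scikit-learn-style convention that `__init__` only stores its arguments and all checks run at fit time. `SketchTransform` and `_check_quality_measure` are hypothetical names used for illustration; they are not aeon's actual class or helpers.

```python
class SketchTransform:
    """Hypothetical stand-in for an estimator; not aeon's actual class."""

    _VALID_QUALITY_MEASURES = ("information_gain", "f_statistic")

    def __init__(self, quality_measure="information_gain"):
        # __init__ only stores the argument; no validation happens here.
        self.quality_measure = quality_measure

    def _check_quality_measure(self):
        # Hypothetical helper, called from _fit.
        if self.quality_measure not in self._VALID_QUALITY_MEASURES:
            raise ValueError(
                f"quality_measure must be one of {self._VALID_QUALITY_MEASURES}, "
                f"got {self.quality_measure!r}"
            )

    def _fit(self, X, y=None):
        self._check_quality_measure()
        # ... the shapelet search would go here ...
        return self
```

With this layout, constructing the object with an invalid value succeeds and the error only surfaces when `_fit` is called, so parameters set after construction are checked as well.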
Not a bad change but why?
@staticmethod
@njit(fastmath=True, cache=True)
def _find_shapelet_quality_f_stat(
    X,
is having a new function for each new quality measure necessary? Feel like this could be done better
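One possible shape for this, sketched in plain Python under stated assumptions: the distance computation is shared, and only the final scoring step varies by measure. All names here (`sliding_min_distance`, `mean_gap`, `min_gap`, `QUALITY_MEASURES`, `shapelet_quality`) are hypothetical, and the two scoring functions are toy placeholders, not information gain or the F-statistic. Whether this dictionary dispatch works unchanged inside `@njit` code is not checked here.

```python
def sliding_min_distance(series, shapelet):
    """Smallest squared Euclidean distance between the shapelet and any window."""
    m = len(shapelet)
    return min(
        sum((series[i + j] - shapelet[j]) ** 2 for j in range(m))
        for i in range(len(series) - m + 1)
    )


def mean_gap(distances, is_target):
    """Toy measure: mean distance of other classes minus that of the target class."""
    inside = [d for d, t in zip(distances, is_target) if t]
    outside = [d for d, t in zip(distances, is_target) if not t]
    return sum(outside) / len(outside) - sum(inside) / len(inside)


def min_gap(distances, is_target):
    """Toy measure: closest other-class distance minus farthest target-class one."""
    inside = [d for d, t in zip(distances, is_target) if t]
    outside = [d for d, t in zip(distances, is_target) if not t]
    return min(outside) - max(inside)


# Registry of scoring functions; adding a measure means adding one entry.
QUALITY_MEASURES = {"mean_gap": mean_gap, "min_gap": min_gap}


def shapelet_quality(X, y, shapelet, target_class, measure="mean_gap"):
    """Shared search code: compute distances once, then apply the chosen measure."""
    if measure not in QUALITY_MEASURES:
        raise ValueError(f"unknown measure {measure!r}")
    distances = [sliding_min_distance(x, shapelet) for x in X]
    is_target = [label == target_class for label in y]
    return QUALITY_MEASURES[measure](distances, is_target)
```

The design point is that a new measure only needs a function of `(distances, is_target)`, so the search loop is not duplicated per measure.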
Reference Issues/PRs
Addresses #186
What does this implement/fix? Explain your changes.
Adds a `quality_measure` parameter to `RandomShapeletTransform` to support multiple quality measures for evaluating shapelets. Currently, the transform is hardcoded to use information gain. This PR makes it flexible while maintaining backward compatibility.
Changes:
- Add `quality_measure` parameter (default: `"information_gain"`) to `RandomShapeletTransform.__init__`
- Implement `f_statistic` quality measure as an alternative option
- Create `_quality_measures.py` module for numba-optimized quality measure functions
Implementation details:
- Quality measures are numba-compiled with `@njit`
Performance:
Benchmarked both quality measures on unit test and basic motions datasets. F-statistic performs similarly to information gain (slightly faster in most cases, within 1-2%). The default behavior shows no performance regression.
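For readers unfamiliar with the measure, here is a rough sketch of a one-way ANOVA F-statistic applied to shapelet-to-series distances grouped by class label. It is an assumption that the PR's `f_statistic` follows this standard definition; the PR's actual numba implementation is not shown on this page, and the function below is plain Python written only for illustration.

```python
def f_statistic(distances, labels):
    """One-way ANOVA F-statistic of distances grouped by class label.

    F = (between-class sum of squares / (K - 1))
        / (within-class sum of squares / (N - K))
    where K is the number of classes and N the number of distances.
    """
    groups = {}
    for d, label in zip(distances, labels):
        groups.setdefault(label, []).append(d)

    n_total = len(distances)
    n_groups = len(groups)
    # The statistic is undefined with fewer than two classes or no spare
    # degrees of freedom; returning 0.0 here is a choice made for this sketch.
    if n_groups < 2 or n_total <= n_groups:
        return 0.0

    grand_mean = sum(distances) / n_total
    between = 0.0
    within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        between += len(values) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in values)

    if within == 0.0:
        # Perfect separation with no within-class spread.
        return float("inf") if between > 0.0 else 0.0
    return (between / (n_groups - 1)) / (within / (n_total - n_groups))
```

A larger value means the class means of the distances are far apart relative to the spread within each class, which is the sense in which a shapelet would be scored as more discriminative.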
Does your contribution introduce a new dependency? If yes, which one?
No new dependencies.
Any other comments?
This addresses the flexibility requested in issue #186. Based on community feedback, F-statistic may have lower accuracy than information gain on some datasets, but it runs faster. The trade-off is documented in the parameter description, and users can now choose based on their needs.
The implementation follows existing patterns in aeon and maintains full backward compatibility - existing code will continue to work exactly as before.