You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What does this implement/fix? Explain your changes.
This PR fixes a logic error in ShapeletTransform where the binary Information Gain (IG) calculation was incorrectly evaluating split points between identical distance values.
Problem:
The _calc_binary_ig function iterated through every index in the sorted orderline to test split points. It failed to check if the current distance was equal to the next distance (orderline[i].distance == orderline[i+1].distance). This caused the algorithm to calculate splits inside groups of identical values, which is mathematically invalid and resulted in inflated IG scores (e.g., returning ~0.97 instead of ~0.42 for the reproduction case).
Solution:
Added a check to strictly skip invalid split points where the distance values are identical.
I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ transformations ]. Feel free to change these if they do not properly represent the PR.
The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.
If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.
Don't hesitate to ask questions on the aeonSlack channel if you have any.
PR CI actions
These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.
Run pre-commit checks for all files
Run mypy typecheck tests
Run all pytest tests and configurations
Run all notebook example tests
Run numba-disabled codecov tests
Stop automatic pre-commit fixes (always disabled for drafts)
This is less of a fix and more of a change to the algorithm itself. I would want to see it evaluated on benchmark datasets to see how it changes results before going forward. Either way skipping the last IG check does not seem correct.
hi, I looked at this and I dont really think the change is necessary. Its a design decision, and it as no impact on performance when I tested it, so I think atm I go by if it aint broke, dont fix it. I'll convert to draft, but if you can show it adds value, happy to switch back
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
enhancementNew feature, improvement request or other non-bug code enhancementtransformationsTransformations package
3 participants
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
Fixes #1322
What does this implement/fix? Explain your changes.
This PR fixes a logic error in
ShapeletTransformwhere the binary Information Gain (IG) calculation was incorrectly evaluating split points between identical distance values.Problem:
The
_calc_binary_igfunction iterated through every index in the sorted orderline to test split points. It failed to check if the current distance was equal to the next distance (orderline[i].distance == orderline[i+1].distance). This caused the algorithm to calculate splits inside groups of identical values, which is mathematically invalid and resulted in inflated IG scores (e.g., returning ~0.97 instead of ~0.42 for the reproduction case).Solution:
Added a check to strictly skip invalid split points where the distance values are identical.