Summary
Add automatic threshold recommendation support for drift detection.
Motivation
Choosing the right drift threshold can be difficult, especially for new users.
Auto-threshold recommendations would help users select reasonable thresholds based on dataset characteristics, column distributions, and observed variation.
This improves:
- beginner usability
- drift detection quality
- threshold tuning
- production validation workflows
Proposed Improvements
- Analyze dataset distributions
- Recommend numeric, categorical, and outlier thresholds
- Provide threshold suggestions without changing existing defaults automatically
- Include recommendation metadata in reports
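As a rough illustration of the numeric case, the sketch below derives a suggested relative-shift threshold from variation already present inside the baseline data. Everything here is an assumption made for illustration: `ThresholdRecommendation`, `recommend_numeric_threshold`, the chunking heuristic, and the `safety_factor` and `floor` parameters are hypothetical and not part of dift's existing API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ThresholdRecommendation:
    """Hypothetical container for a suggested threshold and its rationale."""
    kind: str    # "numeric", "categorical", or "outlier"
    value: float # suggested threshold (relative shift for numeric columns)
    basis: str   # short human-readable explanation for reports


def recommend_numeric_threshold(values, n_chunks=4, safety_factor=2.0, floor=0.01):
    """Suggest a relative mean-shift threshold for one numeric column.

    Heuristic (an assumption, not an established dift rule): split the
    baseline values into contiguous chunks, measure how far each chunk
    mean strays from the overall mean, and recommend a multiple of the
    largest relative deviation so ordinary variation is not flagged.
    """
    if n_chunks < 2:
        raise ValueError("n_chunks must be at least 2")
    if len(values) < n_chunks:
        raise ValueError("need at least one value per chunk")

    overall = mean(values)
    scale = abs(overall) or 1.0  # avoid dividing by zero for zero-mean columns
    size = len(values) // n_chunks

    # The last chunk absorbs any remainder so no value is dropped.
    chunks = [values[i * size:(i + 1) * size] for i in range(n_chunks - 1)]
    chunks.append(values[(n_chunks - 1) * size:])

    spread = max(abs(mean(chunk) - overall) for chunk in chunks) / scale
    value = max(floor, safety_factor * spread)
    return ThresholdRecommendation(
        kind="numeric",
        value=value,
        basis=f"{safety_factor}x max chunk-mean deviation over {n_chunks} baseline chunks",
    )
```

The function only returns a suggestion. Nothing in it touches configured or default thresholds, which matches the goal of recommending without changing behavior automatically. A constant column has zero spread, so its recommendation falls back to `floor`.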
Suggested Files
Potential implementation areas:
dift/core/stats_diff.py
dift/thresholds.py
dift/reports/models.py
dift/reports/console_report.py
dift/reports/json_report.py
docs/thresholds.md
Suggested Tasks
- Add threshold recommendation utility
- Add recommendation logic for numeric drift
- Add recommendation logic for categorical drift
- Add recommendation logic for outlier detection
- Add tests
- Update documentation
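For the categorical task, one possible starting point is to size the threshold from sampling noise in category proportions. The sketch below is only an assumption about how this could work: `recommend_categorical_threshold`, the `z` multiplier, and the `floor` value are hypothetical names and defaults, and the binomial standard error is a swapped-in textbook heuristic rather than dift's method.

```python
from collections import Counter
from math import sqrt


def recommend_categorical_threshold(values, z=3.0, floor=0.01):
    """Suggest a threshold for the change in a category's share of rows.

    Heuristic (an assumption): a category with proportion p observed over
    n rows has binomial standard error sqrt(p * (1 - p) / n). Shifts smaller
    than z standard errors of the noisiest category are treated as noise.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")

    counts = Counter(values)
    max_se = max(
        sqrt((count / n) * (1 - count / n) / n) for count in counts.values()
    )
    # A floor keeps single-category columns from getting a zero threshold.
    return max(floor, z * max_se)
```

Small datasets produce larger standard errors and therefore looser suggestions, which is one of the caveats the documentation would need to spell out.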
How to Test
Run:
pytest
ruff check .
Run targeted tests:
pytest tests/test_thresholds.py
pytest tests/test_stats_diff.py
Manual validation:
dift examples/old_drift.csv examples/new_drift.csv --key id
Verify:
- recommendations are generated where appropriate
- existing configured thresholds still work
- reports remain valid
- no existing threshold behavior breaks
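The backward-compatibility checks above could be expressed as pytest-style tests along these lines. This is a sketch under stated assumptions: `build_report`, `DEFAULT_NUMERIC_THRESHOLD`, and the report keys are hypothetical stand-ins, not dift's real report model or default value.

```python
import json

# Hypothetical default used only for this sketch; not dift's actual value.
DEFAULT_NUMERIC_THRESHOLD = 0.1


def build_report(configured, recommended):
    """Hypothetical report builder: recommendations are metadata only."""
    effective = DEFAULT_NUMERIC_THRESHOLD if configured is None else configured
    return {
        "thresholds": {"numeric": effective},          # what detection uses
        "recommendations": {"numeric": recommended},  # advisory, never applied
    }


def test_configured_threshold_is_not_overridden():
    report = build_report(configured=0.25, recommended=0.4)
    assert report["thresholds"]["numeric"] == 0.25


def test_default_is_unchanged_by_recommendation():
    report = build_report(configured=None, recommended=0.4)
    assert report["thresholds"]["numeric"] == DEFAULT_NUMERIC_THRESHOLD


def test_report_round_trips_as_json():
    report = build_report(configured=None, recommended=0.4)
    assert json.loads(json.dumps(report)) == report
```

Keeping the recommendation in its own key means existing consumers that read only the effective thresholds see no change.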
Documentation Impact
Update:
docs/thresholds.md
docs/statistical-analysis.md
docs/reports.md
Documentation should include:
- what auto-threshold recommendations are
- when users should rely on them
- how recommendations are calculated at a high level
- limitations and caveats
Acceptance Criteria
- Auto-threshold recommendations are generated for numeric, categorical, and outlier thresholds
- Existing threshold behavior remains backward compatible
- Tests pass
- Reports include recommendation information where appropriate
- Documentation is updated