Summary
Add adaptive drift scoring that adjusts drift interpretation based on dataset characteristics.
Motivation
A single fixed drift scoring scheme does not suit every dataset equally well.
For example:
- small datasets may need more cautious scoring
- noisy datasets may need less aggressive scoring
- stable datasets may require stricter scoring
Adaptive drift scoring lets Dift interpret drift in the context of each dataset instead of applying one rule everywhere.
Proposed Improvements
- Use dataset size and distribution characteristics to adjust drift scoring
- Improve score sensitivity for different dataset types
- Preserve existing scoring behavior as the baseline
- Add adaptive scoring metadata to reports
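One possible shape for the adjustment is sketched below. It is an illustration only, not Dift's existing code: the names `AdaptiveScore` and `adaptive_drift_score`, the `noise` input (imagined here as something like a coefficient of variation), the `min_rows` default, and both scaling formulas are assumptions. The baseline score is kept unchanged alongside the adjusted one, so existing behavior stays available.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptiveScore:
    """Hypothetical result type: baseline score plus the adaptive adjustment."""
    base_score: float      # the existing fixed score, left untouched
    adjusted_score: float  # base score after size and noise scaling
    size_factor: float     # 1.0 for large datasets, below 1.0 for small ones
    noise_factor: float    # 1.0 for noise-free data, below 1.0 for noisy data


def adaptive_drift_score(
    base_score: float,
    n_rows: int,
    noise: float,
    min_rows: int = 1000,  # assumed cutoff below which a dataset counts as small
) -> AdaptiveScore:
    """Scale a fixed drift score by dataset size and noise (illustrative rule)."""
    if n_rows < 0 or noise < 0 or min_rows <= 0:
        raise ValueError("n_rows and noise must be >= 0, min_rows must be > 0")

    # Small datasets: damp the score as the row count falls below min_rows.
    size_factor = min(1.0, math.sqrt(n_rows / min_rows))

    # Noisy datasets: higher noise gives a less aggressive score.
    noise_factor = 1.0 / (1.0 + noise)

    adjusted = base_score * size_factor * noise_factor
    return AdaptiveScore(base_score, adjusted, size_factor, noise_factor)


if __name__ == "__main__":
    print(adaptive_drift_score(0.8, n_rows=250, noise=0.0))
    print(adaptive_drift_score(0.8, n_rows=5000, noise=1.0))
```

Under this rule a large, stable dataset keeps its full baseline score, which makes it the strictest case relative to small or noisy data. Whether damping is the right direction for each dataset type is a design decision the scoring rules still need to settle.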
Suggested Files
Potential implementation areas:
dift/core/stats_diff.py
dift/core/risk.py
dift/reports/models.py
dift/reports/json_report.py
dift/reports/html_report.py
docs/thresholds.md
Suggested Tasks
- Define adaptive scoring rules
- Add adaptive scoring utility
- Integrate adaptive scoring into drift analysis
- Add report metadata
- Add tests for small, medium, and noisy datasets
- Update documentation
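For the report metadata task, a minimal sketch of how the extra fields might be attached without disturbing existing report keys. Everything here is hypothetical: the `AdaptiveScoringMetadata` fields, the `adaptive_scoring` key, and the example report keys `risk_level` and `drift_score` are placeholders, not Dift's actual report schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AdaptiveScoringMetadata:
    """Hypothetical metadata describing how a score was adapted."""
    enabled: bool
    base_score: float
    adjusted_score: float
    row_count: int
    rule: str  # label of the adaptive rule that was applied


def attach_adaptive_metadata(report: dict, meta: AdaptiveScoringMetadata) -> dict:
    """Return a copy of the report with metadata under one new key.

    Existing keys are left as they are, so consumers that ignore the new
    key keep working.
    """
    updated = dict(report)
    updated["adaptive_scoring"] = asdict(meta)
    return updated


if __name__ == "__main__":
    # Placeholder report content, not the real Dift JSON layout.
    report = {"risk_level": "medium", "drift_score": 0.8}
    meta = AdaptiveScoringMetadata(
        enabled=True,
        base_score=0.8,
        adjusted_score=0.4,
        row_count=250,
        rule="small_dataset",
    )
    print(json.dumps(attach_adaptive_metadata(report, meta), indent=2))
```

Keeping the metadata under a single additive key is one way to satisfy the backward compatibility goal: the baseline fields stay where they are, and the HTML report can render the new block only when it is present.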
How to Test
Run:
pytest
ruff check .
Run targeted tests:
pytest tests/test_stats_diff.py
pytest tests/test_risk.py
Manual validation:
dift examples/old_drift.csv examples/new_drift.csv --key id --report json --output report.json
Verify:
- adaptive scores are generated
- existing risk levels remain stable unless intentionally changed
- JSON and HTML reports still render correctly
- tests cover different dataset sizes
Documentation Impact
Update:
docs/statistical-analysis.md
docs/thresholds.md
docs/reports.md
docs/developer/architecture.md
Documentation should include:
- what adaptive drift scoring means
- how it differs from fixed scoring
- when it is useful
- limitations and interpretation guidance
Acceptance Criteria
- Adaptive drift scoring is implemented
- Scoring remains backward compatible where possible
- Tests pass
- Reports expose adaptive scoring metadata
- Documentation updated