
feat: Add Auto-Threshold Recommendations #103

Description

@ReginaldErzoah

Summary

Add automatic threshold recommendation support for drift detection.


Motivation

Choosing the right drift threshold can be difficult, especially for new users.

Auto-threshold recommendations would help users select reasonable thresholds based on dataset characteristics, column distributions, and observed variation.

This would improve:

  • beginner usability
  • drift detection quality
  • threshold tuning
  • production validation workflows

Proposed Improvements

  • Analyze dataset distributions
  • Recommend numeric, categorical, and outlier thresholds
  • Provide threshold suggestions without automatically changing existing defaults
  • Include recommendation metadata in reports
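A minimal sketch of how a recommendation could be represented and surfaced in reports without touching configured thresholds. `ThresholdRecommendation`, its fields, and `recommendations_to_report` are hypothetical names, not existing dift APIs; a real version could live in `dift/reports/models.py` and be rendered by the JSON and console reporters.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdRecommendation:
    """Hypothetical record for one suggested threshold; never applied automatically."""

    column: str
    kind: str                    # "numeric", "categorical", or "outlier"
    recommended: float           # suggested threshold value
    configured: Optional[float]  # threshold the user set, or None if unset
    method: str                  # short label for how the value was derived
    sample_size: int             # number of baseline rows used


def recommendations_to_report(recs: list[ThresholdRecommendation]) -> dict:
    """Build a JSON-serializable report section.

    Configured values are echoed unchanged so readers can compare them with
    the suggestion; this function only reads, it never edits configuration.
    """
    section = []
    for rec in recs:
        entry = asdict(rec)
        entry["differs_from_configured"] = (
            rec.configured is not None and rec.configured != rec.recommended
        )
        section.append(entry)
    return {"threshold_recommendations": section}


# Example with made-up values, purely for illustration.
recs = [
    ThresholdRecommendation("price", "numeric", 0.08, 0.10, "split-half p95", 500),
    ThresholdRecommendation("country", "categorical", 0.05, None, "split-half p95", 500),
]
print(json.dumps(recommendations_to_report(recs), indent=2))
```

Keeping `configured` next to `recommended` lets a report say "you set 0.10, the data suggests about 0.08" without changing which value the drift check uses.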

Suggested Files

Potential implementation areas:

dift/core/stats_diff.py
dift/thresholds.py
dift/reports/models.py
dift/reports/console_report.py
dift/reports/json_report.py
docs/thresholds.md

Suggested Tasks

  • Add threshold recommendation utility
  • Add recommendation logic for numeric drift
  • Add recommendation logic for categorical drift
  • Add recommendation logic for outlier detection
  • Add tests
  • Update documentation
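For the numeric and categorical tasks, one possible approach is to estimate how much a drift score moves by chance alone: repeatedly split the *baseline* data into random halves, score the halves against each other, and recommend a high percentile of that noise as the threshold. This sketch assumes numeric drift is scored as relative change in mean and categorical drift as total variation distance, and it uses Tukey's IQR fences for outliers; dift's actual scores in `dift/core/stats_diff.py` may differ, and every function name here is hypothetical.

```python
from __future__ import annotations

import random
import statistics
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def split_half_noise_floor(
    values: Sequence[T],
    score: Callable[[Sequence[T], Sequence[T]], float],
    n_rounds: int = 200,
    quantile: float = 0.95,
    seed: int = 0,
) -> float:
    """Return roughly the `quantile` of `score` between random halves of `values`.

    Both halves come from the same baseline, so their score reflects chance
    variation only; drift below this level is likely noise.
    """
    if len(values) < 4:
        raise ValueError("need at least 4 values to split into halves")
    rng = random.Random(seed)  # fixed seed keeps recommendations reproducible
    data = list(values)        # copy so the caller's data is not shuffled
    mid = len(data) // 2
    scores = []
    for _ in range(n_rounds):
        rng.shuffle(data)
        scores.append(score(data[:mid], data[mid:]))
    scores.sort()
    return scores[min(int(quantile * len(scores)), len(scores) - 1)]


def relative_mean_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed numeric drift score: |mean(b) - mean(a)| / |mean(a)|."""
    base = statistics.fmean(a)
    if base == 0:
        # Relative change is undefined at a zero mean; fall back to absolute change.
        return abs(statistics.fmean(b))
    return abs(statistics.fmean(b) - base) / abs(base)


def total_variation_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed categorical drift score: half the L1 distance between frequencies."""
    freq_a, freq_b = Counter(a), Counter(b)
    categories = freq_a.keys() | freq_b.keys()
    return 0.5 * sum(abs(freq_a[c] / len(a) - freq_b[c] / len(b)) for c in categories)


def recommend_numeric_threshold(values: Sequence[float], **kwargs) -> float:
    return split_half_noise_floor(values, relative_mean_change, **kwargs)


def recommend_categorical_threshold(values: Sequence[str], **kwargs) -> float:
    return split_half_noise_floor(values, total_variation_distance, **kwargs)


def recommend_outlier_fences(values: Sequence[float], k: float = 1.5) -> tuple[float, float]:
    """Tukey fences from the baseline: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


baseline = [10.0 + (i % 7) for i in range(500)]  # synthetic example data
print(recommend_numeric_threshold(baseline))
print(recommend_categorical_threshold(["a", "b", "b", "c"] * 100))
print(recommend_outlier_fences(baseline))
```

Deriving the threshold from the baseline's own split-half variation means noisier columns get looser recommendations, and the fixed seed keeps results stable across runs. Outlier fences are not resampled because Tukey's rule already adapts to each column's spread.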

How to Test

Run:

pytest
ruff check .

Run targeted tests:

pytest tests/test_thresholds.py
pytest tests/test_stats_diff.py

Manual validation:

dift examples/old_drift.csv examples/new_drift.csv --key id

Verify:

  • recommendations are generated where appropriate
  • existing configured thresholds still work
  • reports remain valid
  • no existing threshold behavior breaks
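One way to guarantee that existing threshold behavior is untouched is to carry the recommendation through as a report-only value that never participates in choosing the threshold. A minimal sketch, assuming the current rule is "an explicitly configured threshold wins, otherwise the default applies"; `decide_threshold`, `ThresholdDecision`, and the `DEFAULT_NUMERIC_THRESHOLD` value are hypothetical, not dift's real configuration handling.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder; the real default lives in the project's threshold config.
DEFAULT_NUMERIC_THRESHOLD = 0.1


@dataclass(frozen=True)
class ThresholdDecision:
    used: float                   # threshold actually applied by the drift check
    recommended: Optional[float]  # suggestion carried through for reporting only
    source: str                   # "configured" or "default"


def decide_threshold(
    configured: Optional[float],
    recommended: Optional[float],
    default: float = DEFAULT_NUMERIC_THRESHOLD,
) -> ThresholdDecision:
    """Assumed existing precedence: an explicit setting wins, else the default.

    `recommended` never affects `used`, so adding recommendations cannot
    change which values are flagged as drift.
    """
    # Compare against None rather than using `configured or default`, so an
    # explicit threshold of 0.0 is still honored.
    if configured is not None:
        return ThresholdDecision(configured, recommended, "configured")
    return ThresholdDecision(default, recommended, "default")


print(decide_threshold(0.25, 0.08))
print(decide_threshold(None, 0.08))
```

A regression test can then assert that `used` is identical with and without a recommendation present, which maps directly onto the "existing configured thresholds still work" check.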

Documentation Impact

Update:

docs/thresholds.md
docs/statistical-analysis.md
docs/reports.md

Documentation should include:

  • what auto-threshold recommendations are
  • when users should rely on them
  • how recommendations are calculated at a high level
  • limitations and caveats
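For the caveats section, it may help to show how a data-driven threshold scales with dataset size. Assuming, purely for illustration, that the numeric recommendation were the 95th percentile of the relative mean change between random halves of the baseline (this issue does not fix a method), let $x_1,\dots,x_n$ be the baseline values ($n$ even), $\bar{x}$ and $s$ their sample mean and standard deviation, and $A$, $C$ the two halves. A normal approximation gives:

```latex
\begin{align*}
\bar{x}_A - \bar{x}_C &= 2\,(\bar{x}_A - \bar{x})
  && \text{since } \bar{x} = \tfrac{1}{2}(\bar{x}_A + \bar{x}_C) \\
\operatorname{Var}(\bar{x}_A) &= \frac{s^2}{n/2}\left(1 - \frac{n/2}{n}\right) = \frac{s^2}{n}
  && \text{sampling } n/2 \text{ of } n \text{ without replacement} \\
\operatorname{SD}(\bar{x}_A - \bar{x}_C) &= \frac{2s}{\sqrt{n}} \\
\hat{\tau}_{\text{numeric}} &\approx \frac{1.96 \cdot 2s}{\sqrt{n}\,|\bar{x}|}
  = \frac{3.92\,s}{\sqrt{n}\,|\bar{x}|}
  && \text{95th percentile of } |\mathcal{N}(0,\sigma^2)| \approx 1.96\,\sigma
\end{align*}
```

The caveat is the $1/\sqrt{n}$ factor: with $s/|\bar{x}| = 0.5$, the recommendation is about 0.196 at $n = 100$ but about 0.0196 at $n = 10{,}000$. Large datasets therefore get very small recommended thresholds that reflect statistical noise, not whether a shift matters in practice.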

Acceptance Criteria

  • Auto-threshold recommendations are generated
  • Existing threshold behavior remains backward compatible
  • Tests pass
  • Reports include recommendation information where appropriate
  • Documentation updated

Metadata

Assignees

No one assigned

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
