Replies: 1 comment
@rtambone I had the same problem as you. It turns out you need to set the threshold manually. I think NannyML uses a standard-deviation-based threshold by default; you may want to change this to a constant threshold. Here's my code example:

```python
import nannyml as nml
from nannyml.thresholds import ConstantThreshold

calc = nml.UnivariateDriftCalculator(
    column_names=all_cols,
    # treat_as_categorical=bin_cols,
    continuous_methods=['jensen_shannon'],
    # categorical_methods=['jensen_shannon'],
    # chunk_number=1,
    thresholds={
        # The Jensen-Shannon distance is not a p-value:
        # 0 means identical distributions, 1 means very different ones.
        'jensen_shannon': ConstantThreshold(upper=0.1),
    },
)
```
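For completeness, here is roughly how I run it afterwards. This is a minimal sketch: `reference_df` and `analysis_df` are placeholder names for your reference and analysis DataFrames, not anything NannyML provides.

```python
# Fit baselines and thresholds on the reference period, then score
# the analysis period and plot the per-chunk drift metric.
calc.fit(reference_df)
results = calc.calculate(analysis_df)
results.filter(period='analysis').plot(kind='drift').show()
```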
---
Hi,
I am trying to implement NannyML for monitoring, but I get a lot of false positives.
I run a full monitoring pipeline that computes data drift for all the features. My issue is that many features raise alerts on every chunk of the analysis period, even though their distributions look very similar.
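To make this concrete, here is roughly how I count the alerts — a quick sketch, assuming `results` is the output of the univariate drift calculator's `calculate` call:

```python
# Export the results; the exported frame has multi-level columns,
# with one boolean 'alert' column per (feature, method) pair.
results_df = results.to_df()
alert_cols = [c for c in results_df.columns if c[-1] == 'alert']
# Number of chunks that raised an alert, per feature and method.
print(results_df[alert_cols].sum())
```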
Here you can find the distance metrics:
[image: distance metrics per chunk]
And here the KDE distribution for each chunk:
[image: per-chunk KDE plots]
Finally, I also plotted the KDE for the reference and analysis periods without chunking, and, again, the distributions seem to overlap:
[image: reference vs. analysis KDE]
Some additional context: I don't know if I have to change some parameters, like the thresholds or the drift methods, or maybe my chunking strategy is not the best.
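For example, these are the kinds of knobs I mean — a rough sketch, assuming NannyML's `StandardDeviationThreshold` and the calculator's `chunk_size` argument (the values here are made up, and `all_cols` stands for my list of feature columns):

```python
import nannyml as nml
from nannyml.thresholds import StandardDeviationThreshold

calc = nml.UnivariateDriftCalculator(
    column_names=all_cols,
    continuous_methods=['jensen_shannon'],
    # Fewer, larger chunks should smooth out sampling noise in the metric.
    chunk_size=10_000,
    thresholds={
        # Widen the band around the reference mean (the default multiplier
        # is 3 standard deviations, if I read the docs correctly).
        'jensen_shannon': StandardDeviationThreshold(
            std_lower_multiplier=4, std_upper_multiplier=4
        ),
    },
)
```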
Thanks in advance for helping me.