Replies: 1 comment
@rtambone I had the same problem as you. It turns out you need to set the threshold manually. I think NannyML uses a standard-deviation-based threshold by default; you may want to change this to a constant threshold. Here's my code example:

```python
import nannyml as nml
from nannyml.thresholds import ConstantThreshold

calc = nml.UnivariateDriftCalculator(
    column_names=all_cols,
    # treat_as_categorical=bin_cols,
    continuous_methods=['jensen_shannon'],
    # categorical_methods=['jensen_shannon'],
    # chunk_number=1,
    thresholds={
        # The Jensen-Shannon distance is not a p-value:
        # 0 means identical distributions, 1 means very different ones.
        'jensen_shannon': ConstantThreshold(upper=0.1),
    },
)
```
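For completeness, here is roughly how I run it afterwards. This is a minimal sketch: `reference_df` and `analysis_df` are placeholder names for your reference and analysis DataFrames, not anything NannyML provides.

```python
# Fit baselines and thresholds on the reference period, then score
# the analysis period and plot the per-chunk drift metric.
calc.fit(reference_df)
results = calc.calculate(analysis_df)
results.filter(period='analysis').plot(kind='drift').show()
```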
---
Hi,
I am trying to implement NannyML for monitoring, but I get a lot of false positives.
I run a full monitoring pipeline that computes data drift for all the features. My issue is that many features raise alerts on every chunk of the analysis period, even though their distributions look very similar.
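To make this concrete, here is roughly how I count the alerts — a quick sketch, assuming `results` is the output of the univariate drift calculator's `calculate` call:

```python
# Export the results; the exported frame has multi-level columns,
# with one boolean 'alert' column per (feature, method) pair.
results_df = results.to_df()
alert_cols = [c for c in results_df.columns if c[-1] == 'alert']
# Number of chunks that raised an alert, per feature and method.
print(results_df[alert_cols].sum())
```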
Here you can find the distance metrics:
[image: distance metrics per chunk]
And here the KDE distribution for each chunk:
[image: per-chunk KDE plots]
Finally, I also plotted the KDE for the reference and analysis periods without chunking, and, again, the distributions seem to overlap:
[image: reference vs. analysis KDE]
Some additional context: I don't know if I have to change some parameters, like the thresholds or the drift methods, or maybe my chunking strategy is not the best.
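For example, these are the kinds of knobs I mean — a rough sketch, assuming NannyML's `StandardDeviationThreshold` and the calculator's `chunk_size` argument (the values here are made up, and `all_cols` stands for my list of feature columns):

```python
import nannyml as nml
from nannyml.thresholds import StandardDeviationThreshold

calc = nml.UnivariateDriftCalculator(
    column_names=all_cols,
    continuous_methods=['jensen_shannon'],
    # Fewer, larger chunks should smooth out sampling noise in the metric.
    chunk_size=10_000,
    thresholds={
        # Widen the band around the reference mean (the default multiplier
        # is 3 standard deviations, if I read the docs correctly).
        'jensen_shannon': StandardDeviationThreshold(
            std_lower_multiplier=4, std_upper_multiplier=4
        ),
    },
)
```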
Thanks in advance for helping me.