I'm interested in using this library, so I tested whether the false positive rate is controlled at the alpha I set. However, the proportion of tests rejected far exceeds alpha. I ran 100 trials, each comparing two Bernoulli proportions with the same 0.01 conversion rate, with alpha=0.05, beta=0.10, and the total number of visitors set to 100K. I get the following percentages of rejected and accepted tests:
% rejected: 43
% accepted: 56.99999999999999
Here's the link to my code: https://gist.github.com/sjoelee/d4fed8b80e1af1d2e0cf7aac37d09a90. As each visitor arrives and is bucketed into treatment or control, I simulate a biased coin flip (using the conversion rate for that variation) to decide whether they convert. I add each data point individually through `addData`, then check whether the test has finished and whether the result from `addData` was true (accept null) or false (reject null). Could you provide more documentation on how the thresholds are calculated? And have any tests been done to check whether the false positive rate in A/A tests stays controlled at alpha? Thanks!
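For comparison, here is a minimal Python sketch of the same A/A simulation using a plain fixed-horizon two-proportion z-test in place of the library. It is not the library's test and not the code in my gist: the function names (`z_test_rejects`, `aa_rejection_rate`, `z_score_vs_alpha`) are hypothetical, and the z-test is only a stand-in baseline whose rejection rate should land near alpha under the null. The last helper measures how far an observed rejection rate sits from alpha in binomial standard errors.

```python
import math
import random


def z_test_rejects(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Fixed-horizon, two-sided two-proportion z-test.

    Stand-in baseline only; this is NOT the library's sequential test.
    Returns True when the null (equal conversion rates) is rejected.
    """
    if n_a == 0 or n_b == 0:
        return False
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False  # no variation in outcomes, nothing to compare
    z = (conv_a / n_a - conv_b / n_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return p_value < alpha


def aa_rejection_rate(trials=100, visitors=100_000, rate=0.01, alpha=0.05, seed=0):
    """Run A/A trials and return the fraction in which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        n = [0, 0]
        conv = [0, 0]
        for _ in range(visitors):
            arm = rng.randrange(2)             # bucket visitor: 0 = control, 1 = treatment
            n[arm] += 1
            conv[arm] += rng.random() < rate   # biased coin flip, same rate in both arms
        rejections += z_test_rejects(conv[0], n[0], conv[1], n[1], alpha)
    return rejections / trials


def z_score_vs_alpha(rejections, trials, alpha):
    """Distance of the observed rejection rate from alpha, in binomial standard errors."""
    observed = rejections / trials
    return (observed - alpha) / math.sqrt(alpha * (1 - alpha) / trials)


if __name__ == "__main__":
    # Smaller run than the one in the post so it finishes quickly.
    print("baseline rejection rate:", aa_rejection_rate(trials=100, visitors=20_000))
    print("43/100 vs alpha=0.05, z =", z_score_vs_alpha(43, 100, 0.05))
```

With only 100 trials, the standard error of the rejection rate under alpha=0.05 is about 0.022, so 43% is roughly 17 standard errors above alpha and is very unlikely to be simulation noise.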