With #113 we only use results that have a stddev and ignore the ones without it. This is fine when all tests use multiple samples (or all use single samples), but when a run combines both kinds we leave one part aside, which we should address.
One way would be to weight each category by its number of checks; alternatively, we can make the weights configurable (remember, we also have non-primary results, and we might want to include those as well).
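To make the two options concrete, here is a rough sketch of how the categories could be combined. Everything in it is an assumption for illustration: the function name `combine_scores`, the category names, and the idea that each check yields a single numeric score are hypothetical and not taken from the existing code. By default each category is weighted by its number of checks; passing `weights` makes the weighting configurable.

```python
from typing import Dict, List, Optional


def combine_scores(
    categories: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[float]:
    """Combine per-category mean scores into one weighted value.

    categories: hypothetical mapping of category name -> per-check scores,
                e.g. "with_stddev", "without_stddev", "non_primary".
    weights:    optional mapping of category name -> weight. When omitted,
                each category is weighted by its number of checks.
    """
    # Skip categories that have no checks at all.
    non_empty = {name: scores for name, scores in categories.items() if scores}

    if weights is None:
        # Default: weight by the number of checks in each category.
        weights = {name: len(scores) for name, scores in non_empty.items()}

    # Categories missing from the weights mapping count as weight 0.
    total_weight = sum(weights.get(name, 0) for name in non_empty)
    if total_weight == 0:
        return None  # nothing to combine

    weighted_sum = sum(
        weights.get(name, 0) * (sum(scores) / len(scores))
        for name, scores in non_empty.items()
    )
    return weighted_sum / total_weight


# Hypothetical usage: two multi-sample checks and one single-sample check.
example = {"with_stddev": [1.0, 3.0], "without_stddev": [5.0]}
by_count = combine_scores(example)
equal = combine_scores(example, {"with_stddev": 1, "without_stddev": 1})
```

With count-based weights the larger category dominates; explicit weights would let us, for example, give non-primary results a small non-zero weight instead of ignoring them.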