Hi,
Is it possible to use the gap statistic as-is when the distances are based on cosine similarity instead of Euclidean (L2) distance?
Since cosine similarity is high where Euclidean distance is low (it is a similarity, not a distance), I wonder whether the formulae need to be modified in that case.
When I try this, I get negative gap statistic values that increase as the number of clusters increases. Is that expected?
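To make the question concrete, here is a minimal sketch of the quantities involved. It assumes that cosine similarity is first converted to a distance as 1 - similarity, and that this distance is plugged into the usual within-cluster dispersion W_k of the gap statistic. The function names (`cosine_distance`, `within_dispersion`, `gap`) are illustrative only and do not come from any library, and this may not match how the numbers above were computed.

```python
import math

def cosine_distance(u, v):
    """Assumed conversion: 1 - cosine similarity, so similar vectors get a small value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def within_dispersion(points, labels, dist=cosine_distance):
    """W_k = sum over clusters r of D_r / (2 * n_r).

    D_r is the sum of pairwise distances over all ordered pairs in cluster r.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    w_k = 0.0
    for members in clusters.values():
        d_r = sum(dist(p, q) for p in members for q in members)
        w_k += d_r / (2 * len(members))
    return w_k

def gap(log_w_observed, log_w_reference):
    """Gap(k) = mean over reference datasets of log(W*_k) minus log(W_k) of the data."""
    return sum(log_w_reference) / len(log_w_reference) - log_w_observed
```

With this convention, a negative gap at some k would mean that the average log dispersion of the reference datasets is smaller than the log dispersion of the actual data at that k.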
Thanks.