Hello, and thank you for creating this repo.
While using it, I found that line 314 of pyspark_kmodes/pyspark_kmodes.py (quoted below) should instead be:
partition_index = int(np.floor(index/n_clusters))
The cluster indexes are contiguous within a single partition, so the partition index should be computed as cluster_index / n_clusters_in_one_partition, not by dividing by the number of partitions.
Thanks. Should I open a merge request?
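To illustrate the point, here is a minimal sketch with made-up sizes. It assumes every partition contributes exactly n_clusters clusters and that they are numbered consecutively, partition by partition. The helper name partition_of is hypothetical and not part of the repo. It uses integer division, which for non-negative integer indexes gives the same result as int(np.floor(index/n_clusters)).

```python
def partition_of(index, divisor):
    """Map a global cluster index to a partition index by integer division."""
    return index // divisor

# Hypothetical sizes, chosen only for illustration.
n_partitions = 2
n_clusters = 3  # clusters per partition

# Global cluster indexes 0..5: partition 0 holds 0, 1, 2 and partition 1 holds 3, 4, 5.
indexes = range(n_partitions * n_clusters)

proposed = [partition_of(i, n_clusters) for i in indexes]    # divide by clusters per partition
current = [partition_of(i, n_partitions) for i in indexes]   # divide by number of partitions

print(proposed)  # [0, 0, 0, 1, 1, 1]
print(current)   # [0, 0, 1, 1, 2, 2]
```

With these sizes, dividing by n_partitions assigns clusters to a partition index 2 that does not exist, while dividing by n_clusters keeps each run of three clusters in its own partition. The two formulas only agree when n_partitions happens to equal n_clusters.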
Current line:
partition_index = int(np.floor(index/n_partitions))
pyspark-distributed-kmodes/pyspark_kmodes/pyspark_kmodes.py
Line 314 in 98b27d7