use n_clusters instead of n_partitions to locate the partition index

Hello, thank you for creating this repo.

While using it, I found that the line linked below should be:

```py
partition_index = int(np.floor(index/n_clusters))
```

This is because the cluster indexes within a single partition are contiguous, so the partition index should be computed as `cluster_index / n_cluster_in_one_partition`, i.e. by dividing by `n_clusters` rather than `n_partitions`.
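To illustrate the difference, here is a minimal sketch. It assumes the clusters from all partitions are concatenated in partition order and that every partition contributes exactly `n_clusters` clusters. The helper name `partition_of` and the example sizes are hypothetical, not taken from the repo, and `//` is used in place of `int(np.floor(...))`, which gives the same result for non-negative integers.

```py
# Hypothetical helper: map a flat cluster index to the partition it came from,
# assuming each partition holds n_clusters consecutive cluster indexes.
def partition_of(index, n_clusters):
    return index // n_clusters

n_partitions = 2   # example value (assumption)
n_clusters = 3     # clusters per partition (assumption)
total = n_partitions * n_clusters

# Flat layout: [p0c0, p0c1, p0c2, p1c0, p1c1, p1c2]
proposed = [partition_of(i, n_clusters) for i in range(total)]

# Dividing by n_partitions instead, as in the linked line
current = [i // n_partitions for i in range(total)]

print(proposed)  # [0, 0, 0, 1, 1, 1]
print(current)   # [0, 0, 1, 1, 2, 2]
```

With these sizes, dividing by `n_partitions` even yields a partition index of 2, which does not exist when there are only 2 partitions. The two formulas agree only when `n_partitions == n_clusters`.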

Thanks. Should I open a pull request for this?

https://github.com/ThinkBigAnalytics/pyspark-distributed-kmodes/blob/98b27d710380707983b3f57348b9255d5b33bb30/pyspark_kmodes/pyspark_kmodes.py#L314