TypeError in check_for_empty_cluster #2

@kopytin

Hello,

I am getting a TypeError in the current version of this module. Whether it appears depends on the number of clusters I request. On the same dataset, I never see the error with 2 clusters, I see it sometimes with 4, and I always see it with 10.

  File "/usr/local/lib/python3.5/dist-packages/pyspark_kmodes/pyspark_kmodes.py", line 430, in fit
    self.n_clusters,self.max_dist_iter)
  File "/usr/local/lib/python3.5/dist-packages/pyspark_kmodes/pyspark_kmodes.py", line 271, in k_modes_partitioned
    clusters = check_for_empty_cluster(clusters, rdd)
  File "/usr/local/lib/python3.5/dist-packages/pyspark_kmodes/pyspark_kmodes.py", line 315, in check_for_empty_cluster
    partition_sizes = cluster_sizes[n_clusters*(partition_index):n_clusters*(partition_index+1)]
TypeError: slice indices must be integers or None or have an __index__ method

This is Spark 2.2.
Any ideas will be appreciated.
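
For reference, the same message can be reproduced outside Spark whenever a slice bound is a float. This is only a minimal sketch, assuming that either `n_clusters` or `partition_index` ends up as a non-integer on line 315. I have not confirmed which one it is, or why that would depend on the cluster count. The helper `partition_slice` and the sample data are hypothetical and not part of pyspark_kmodes.

```python
def partition_slice(cluster_sizes, n_clusters, partition_index):
    """Return the block of cluster_sizes that belongs to one partition.

    Hypothetical helper mirroring the expression on line 315 of the traceback.
    Casting to int is an assumed workaround, not the module's actual fix.
    """
    n = int(n_clusters)
    i = int(partition_index)
    return cluster_sizes[n * i:n * (i + 1)]


# Made-up data: 2 partitions with 4 clusters each.
cluster_sizes = list(range(8))

# A float slice bound raises the same TypeError as in the traceback.
try:
    cluster_sizes[4 * 1.0:4 * 2.0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With both values coerced to int, the same slice succeeds.
second_partition = partition_slice(cluster_sizes, 4, 1.0)
```

If this assumption holds, coercing both values to `int` before slicing would avoid the error, though it would not explain where the float comes from.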
