More clusters than data points? #7

@cataxto

When I run the example, the line "model = method.fit(rdd)" fails with the error below. Why does this happen?
Thanks a lot.

ERROR: More clusters than data points?

AssertionError Traceback (most recent call last)
in
----> 1 model = method.fit(rdd)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyspark_kmodes\pyspark_kmodes.py in fit(self, rdd)
432 # Calculate the modes locally for the set of all modes
433
--> 434 local_clusters = run_local_kmodes(clusters, self.n_clusters)
435 if self.verbosity:
436 print("Avg cost/partition:", local_clusters[1]/len(clusters))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyspark_kmodes\pyspark_kmodes.py in run_local_kmodes(clusters, n_clusters, init, n_init, verbose)
290 km = KModes(n_clusters = n_clusters, init = init, n_init = n_init, verbose = verbose)
291 new_centroids = [cluster.centroid for cluster in clusters]
--> 292 new_modes = km.fit(new_centroids, dtype = "object")
293 return [list(new_modes.cluster_centroids_), new_modes.cost_]
294

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyspark_kmodes\Kmodes.py in fit(self, X, y, **kwargs)
305 self.cluster_centroids_, self.labels_, self.cost_, self.n_iter_ = \
306 k_modes(X, self.n_clusters, self.init, self.n_init,
--> 307 self.max_iter, self.verbose)
308 return self
309

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyspark_kmodes\Kmodes.py in k_modes(X, n_clusters, init, n_init, max_iter, verbose)
164
165 npoints, nattrs = X.shape
--> 166 assert n_clusters < npoints, "More clusters than data points?"
167
168 all_centroids = []

AssertionError: More clusters than data points?
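
One way to read the traceback: `run_local_kmodes` builds `new_centroids` from the collected `clusters` and passes it to `k_modes`, which asserts `n_clusters < npoints`. The error therefore fires whenever the number of collected centroids is not strictly greater than `n_clusters`. The sketch below only mirrors that assertion with NumPy. `check_cluster_count` and the sample arrays are hypothetical names made up for illustration, not part of pyspark_kmodes.

```python
import numpy as np


def check_cluster_count(X, n_clusters):
    """Hypothetical helper mirroring the assertion at Kmodes.py line 166."""
    npoints, nattrs = X.shape
    assert n_clusters < npoints, "More clusters than data points?"
    return npoints


n_clusters = 3

# Stand-in for `new_centroids`: one row per collected centroid (made-up values).
too_few = np.array([["a", "x"], ["b", "y"], ["c", "z"]], dtype=object)

try:
    # 3 centroids with n_clusters = 3: "3 < 3" is false, so the assertion fails.
    check_cluster_count(too_few, n_clusters)
    failed = False
    message = ""
except AssertionError as exc:
    failed = True
    message = str(exc)

# With 6 centroids the same check passes.
enough = np.vstack([too_few, too_few])
ok_points = check_cluster_count(enough, n_clusters)

print(failed, message, ok_points)
```

If each partition of the RDD contributes its own set of centroids (an assumption; the traceback does not show that part of the code), then an RDD with very few partitions, as reported by `rdd.getNumPartitions()`, could produce too few centroids for the final local step. In that case, calling `rdd.repartition(...)` with a larger partition count or lowering `n_clusters` might avoid the assertion.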
