Spectral Clustering Algorithm examples

This package contains the source code of our initial implementation of several versions of the spectral clustering algorithm on Apache Spark. They are:

1. Parallel Spectral Clustering based on t-nearest neighbors (PSC)
2. Parallel Spectral Clustering based on Nystrom optimization (NYSC)
3. Parallel Spectral Clustering based on Locality Sensitive Hashing (DASC)

Run

You need Spark 1.3.0 or higher installed, together with Hadoop 1.0.4 for storage.

After building with sbt, use the spark-submit tool to submit the applications to a Spark cluster.

Step1: Generate data using "DataGenerator".
Step2: Run KMeansTest, SkLSHTest, SkNystromTest or SpectralKMeansTest and compare their processing time and clustering accuracy (WSSE value). KMeansTest directly calls Spark MLlib's KMeans class and serves as a reference for measuring clustering quality.
Step3: Change the algorithm parameters to test performance under different circumstances.

Example shell scripts are located in the root directory (run_data_generator.sh, run_KMeansTest.sh, run_SkLSH.sh, run_SpectralKMeans.sh). Please change the server address in them to match your own cluster.
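As a rough illustration of what submitting the tests from Step2 might look like, here is a minimal dry-run sketch. It is not a copy of the provided scripts: the master URL and jar path are placeholders, and it assumes each test class can be referenced by its bare name (add the package prefix if the classes live in one). Only the standard spark-submit options --class and --master are used.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints each spark-submit command; set RUN=1 to execute it.
# Assumptions (not taken from this repository): the master URL, the jar path,
# and that the test classes are referenced by their bare names.

MASTER="${MASTER:-spark://your-master-host:7077}"  # placeholder: your cluster address
JAR="${JAR:-path/to/spectralkmeans.jar}"           # placeholder: jar produced by your build

# Assemble one spark-submit command line.
# $1 is the main class; any remaining arguments are passed to the application.
build_submit_cmd() {
  local main_class="$1"
  shift
  echo spark-submit --class "$main_class" --master "$MASTER" "$JAR" "$@"
}

# The four test applications named in Step2.
for cls in KMeansTest SkLSHTest SkNystromTest SpectralKMeansTest; do
  cmd="$(build_submit_cmd "$cls")"
  echo "$cmd"
  if [ "${RUN:-0}" = "1" ]; then
    $cmd   # relies on word splitting, so paths must not contain spaces
  fi
done
```

Printing the commands first makes it easy to check the server address before anything is sent to the cluster. Application-specific arguments can be appended after the class name in the call to build_submit_cmd; their meanings are documented in the source comments.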

For details on the meaning of the parameters and the theoretical background of each implementation, please refer to the comments in the source code.

Good luck.

Correctness

These implementations pass their correctness tests. However, the Nystrom optimization method regularly throws errors when the data is large. The PSC and LSH methods work well. LSH shows the best performance of the three implementations.

Contact

Please contact yaochunnan@gmail.com for bugs or questions.
