Clustering_IntroToDataMining

The goal of this project was to increase familiarity with the data clustering packages available in R.

It uses the following clustering techniques:

i) K-means
ii) Hierarchical
iii) Density-based (dbscan)
iv) Graph-based (sNNclust)
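
A minimal sketch of k-means and hierarchical clustering with the stats package may help show the general shape of such a run. It is not taken from the project scripts: the built-in iris data, the choice of three clusters, and the Ward linkage are all assumptions made for illustration.

```r
# Illustrative only: iris stands in for the project's dataset (assumption).
set.seed(42)
features <- scale(iris[, 1:4])

# K-means with 3 clusters (k = 3 is an assumption, not the project's setting).
km <- kmeans(features, centers = 3, nstart = 25)

# Agglomerative hierarchical clustering, cut into 3 groups.
hc <- hclust(dist(features), method = "ward.D2")
hc_labels <- cutree(hc, k = 3)

# Cross-tabulate cluster assignments against the known labels.
print(table(kmeans = km$cluster, species = iris$Species))
print(table(hclust = hc_labels, species = iris$Species))
```

With mclust installed, mclust::adjustedRandIndex(km$cluster, iris$Species) would give a single agreement score for the same comparison.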

Instructions to run file:

1. Set the working directory to Raheen_Mazgaonkar_47144316. This is required because the script references the Excel sheet present in this folder. To load a different Excel file, change the path in readData().
2. Install the following packages using install.packages(). The code for this is present in the script but commented out to avoid reinstallation.
   i) stats (for k-means and hclust)
   ii) fpc (for dbscan)
   iii) dbscan (for sNNclust)
   iv) ClusterR (for accuracy measure)
   v) mclust (for adjusted Rand index)
   vi) rgl (for scatterplot)
   vii) car (for scatterplot)
   viii) factoextra (for dendrogram and scatterplot)
   ix) zeallot (for getting multiple outputs from a function)
3. For dataset1, run proj2p1_final.R from source. Notes:
   i) Each plot overwrites the previous one, so to view the dendrogram or kNNdist plots, run each clustering separately.
   ii) RGL is used for the scatterplots. It does not display a title, only a number indicating the order of display. The scatterplots are displayed in this order: original labels, K-means, Hierarchical, Density-based, Graph-based.
   iii) Plotting the dendrogram takes considerable time.
4. For dataset2, run proj2p2_final.R from source. Note:
   i) Plotting the final clusters takes a considerable amount of time, even after processing has stopped.
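
The setup steps above could be scripted roughly as follows. This is a sketch rather than the project's own code: the helper name missing_packages is hypothetical, and the setwd() path assumes R is started from the parent of the project folder.

```r
# Packages named in the instructions; 'stats' ships with R and needs no install.
required <- c("fpc", "dbscan", "ClusterR", "mclust", "rgl", "car",
              "factoextra", "zeallot")

# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what is missing, and only in an interactive session.
to_install <- missing_packages(required)
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Then, from the parent of the project folder (path is an assumption):
# setwd("Raheen_Mazgaonkar_47144316")
# source("proj2p1_final.R")   # dataset1
# source("proj2p2_final.R")   # dataset2
```

Checking with requireNamespace() first avoids reinstalling packages on every run, which is the same concern that led to the install lines being commented out in the scripts.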
