Learn clustering with k-means
- Brush up on Unsupervised learning
- A good intro to k-means
- A Friendly Introduction to K-Means clustering algorithm
- Visualizing kmeans
- visualizing Kmeans
- K-means intro video: a nice 17-minute video by Luis Serrano
- Section 10.3 "Clustering", p. 385 in An Introduction to Statistical Learning
- What are the strengths and weaknesses of k-means?
- Can k-means determine the optimal value of K on its own?
- How do we find the optimal value of K?
- How do outliers affect k-means?
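One common way to approach the question of choosing K is the elbow method: fit k-means for several values of K and look for the point where inertia (the within-cluster sum of squared distances) stops dropping sharply. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic data, the range of K values, and the variable names are illustrative choices, not part of the labs.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 groups (an assumption made for illustration only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# Fit k-means for K = 1..7 and record the inertia of each fit.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

# Print the curve; the "elbow" is where the drop in inertia levels off.
for k, inertia in inertias.items():
    print(f"K={k}: inertia={inertia:.1f}")
```

Note that inertia keeps shrinking as K grows, so k-means cannot pick K by itself; the elbow is a judgment call made by reading the curve.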
★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus
Use scikit-learn's make_blobs to generate some data
Cluster it using k-means
Start with this notebook: kmeans-1-intro
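As a rough preview of this lab, the two steps might look like the sketch below. It assumes scikit-learn is installed; the sample count, number of centers, and random seeds are arbitrary illustrative values and may differ from what the notebook uses.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Step 1: generate 200 two-dimensional points scattered around 4 centers.
X, y_true = make_blobs(n_samples=200, centers=4, cluster_std=0.8, random_state=0)

# Step 2: cluster the points into 4 groups with k-means.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster centers:")
print(kmeans.cluster_centers_)
print("Labels of the first 10 points:", labels[:10])
```

The cluster numbers assigned by k-means are arbitrary, so they will not necessarily match the group ids in `y_true` even when the grouping itself is the same.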
We are going to cluster the cars dataset.
Here is the cars dataset
The data looks like this:
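| model          | mpg  | cyl | disp  | hp  | drat | wt    | qsec  | vs | am | gear |
|----------------|------|-----|-------|-----|------|-------|-------|----|----|------|
| Ford Pantera L | 15.8 | 8   | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0  | 1  | 5    |
| Merc 280C      | 17.8 | 6   | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1  | 0  | 4    |
| Volvo 142E     | 21.4 | 4   | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1  | 1  | 4    |
| Merc 230       | 22.8 | 4   | 140.8 | 95  | 3.92 | 3.150 | 22.90 | 1  | 0  | 4    |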
Use only the mpg and cyl columns to cluster the cars.
You can start with this notebook: kmeans-2-mtcars
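To illustrate the idea before opening the notebook, here is a minimal sketch that clusters only the four sample rows shown above on mpg and cyl. It assumes pandas and scikit-learn are installed; building the DataFrame by hand and choosing K=2 are simplifications for illustration, and the real lab would load the full dataset instead.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The four sample rows from the table above, reduced to the columns we need.
cars = pd.DataFrame(
    {
        "model": ["Ford Pantera L", "Merc 280C", "Volvo 142E", "Merc 230"],
        "mpg": [15.8, 17.8, 21.4, 22.8],
        "cyl": [8, 6, 4, 4],
    }
)

# Select only the two feature columns used for clustering.
features = cars[["mpg", "cyl"]]

# K=2 is an arbitrary choice for this tiny sample.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cars["cluster"] = kmeans.fit_predict(features)

print(cars)
```

On these four rows the split falls between the lower-mpg, higher-cylinder cars and the higher-mpg, four-cylinder cars. On the full dataset it may be worth scaling the columns first, since mpg and cyl have different ranges.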
This is a fun lab. We will cluster Uber pickup locations and figure out where the demand hot spots are.
Here is the Uber dataset
You can start with this notebook: kmeans-3-uber-pickups
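The hot-spot idea can be sketched as follows: cluster the pickup coordinates, then treat the center of the most populated cluster as the busiest area. This sketch uses made-up coordinates generated with NumPy rather than the real Uber data, and the two-column latitude/longitude layout is an assumption about how the pickups are represented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic pickups (latitude, longitude): made-up values for illustration,
# with one busy area (300 pickups) and one quiet area (100 pickups).
rng = np.random.default_rng(0)
busy = rng.normal(loc=[40.75, -73.98], scale=0.01, size=(300, 2))
quiet = rng.normal(loc=[40.65, -73.78], scale=0.01, size=(100, 2))
pickups = np.vstack([busy, quiet])

# Cluster the coordinates; K=2 matches the two synthetic areas.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pickups)

# Count pickups per cluster; the largest cluster marks the demand hot spot.
counts = np.bincount(kmeans.labels_, minlength=2)
hot_spot = kmeans.cluster_centers_[counts.argmax()]

print("Pickups per cluster:", counts)
print("Hot-spot center (lat, lon):", hot_spot)
```

With real data the number of clusters is not known in advance, so K would need to be chosen, for example by comparing inertia across several values.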