This project applies K-Means clustering to real-world retail transaction data to segment customers by purchase behavior. The goal is to help businesses understand and target different customer groups more effectively.
- Technique Used: Unsupervised Machine Learning (K-Means Clustering)
- Dataset: Online Retail Dataset from Kaggle
- Approach: RFM (Recency, Frequency, Monetary) Analysis + Clustering
| Metric | Description |
|---|---|
| Recency | Days since the customer last purchased |
| Frequency | Total number of purchases made |
| Monetary | Total money spent by the customer |
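
The three metrics above can be derived from raw transaction lines with a single `groupby`. The snippet below is a minimal sketch, not the notebook's actual code. It assumes the column names commonly found in the Online Retail dataset (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice`), counts Frequency as the number of distinct invoices, and measures Recency against a hypothetical snapshot date. The sample rows are made up for illustration.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate transaction lines into one RFM row per customer."""
    df = df.copy()
    # Line total = quantity * unit price (column names are assumptions)
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
    return df.groupby("CustomerID").agg(
        # Days between the snapshot date and the customer's latest purchase
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        # Number of distinct invoices (one possible definition of "purchases")
        Frequency=("InvoiceNo", "nunique"),
        # Total amount spent
        Monetary=("TotalPrice", "sum"),
    )


# Made-up sample rows, for illustration only
transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-05", "2011-11-01"]),
    "Quantity": [2, 1, 5],
    "UnitPrice": [3.0, 10.0, 2.0],
})

rfm = compute_rfm(transactions, snapshot_date=pd.Timestamp("2011-12-10"))
print(rfm)
```

A common choice for the snapshot date is one day after the last invoice in the data, so that the most recent customer has a Recency of 1 rather than 0.
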
- Data Cleaning: Removed null values, duplicates, and negative quantities
- RFM Analysis: Created Recency, Frequency, and Monetary features per customer
- Data Scaling: Standardized the RFM features using StandardScaler
- Finding Optimal Clusters: Used the Elbow Method to determine the best number of clusters (k=4)
- Clustering: Applied K-Means and assigned each customer to a segment
- Visualization: Reduced dimensions using PCA and plotted clusters in 2D
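
The steps above can be sketched end to end with scikit-learn. This is an illustrative outline under stated assumptions rather than the notebook's exact code: the `clean_transactions` helper and its `Quantity` column are hypothetical, the RFM table is randomly generated stand-in data, and `random_state=42` and `n_init=10` are arbitrary choices made for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper: drop nulls, duplicates, negative quantities."""
    df = df.dropna().drop_duplicates()
    return df[df["Quantity"] >= 0]


# Synthetic stand-in for the per-customer RFM table (not real project data)
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, size=200),
    "Frequency": rng.integers(1, 50, size=200),
    "Monetary": rng.uniform(10, 5000, size=200),
})

# Data scaling: give every feature zero mean and unit variance
scaled = StandardScaler().fit_transform(rfm)

# Elbow Method: record the inertia (within-cluster sum of squares) for each k
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertias[k] = model.inertia_

# Clustering: fit the final model with k=4 and label each customer
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
rfm["Cluster"] = kmeans.fit_predict(scaled)

# Visualization prep: project the scaled features onto two principal components
coords = PCA(n_components=2).fit_transform(scaled)
rfm["PC1"], rfm["PC2"] = coords[:, 0], coords[:, 1]
```

Plotting `inertias` against k shows where the curve flattens, and a scatter plot of `PC1` against `PC2` colored by `Cluster` gives the 2D view of the segments.
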
| Cluster | Summary |
|---|---|
| 0 | ⭐ Loyal Customers – frequent and active |
| 1 | |
| 2 | 👑 Super VIPs – extremely frequent and high-value |
| 3 | 💎 High-Value Actives – recent, frequent, valuable |
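
Segment labels like these usually come from comparing the average RFM values of each cluster. The README does not show how the labels were derived, so the snippet below is only one plausible way to build such a profile. The numbers in it are invented for illustration and are not the project's results.

```python
import pandas as pd

# Invented values for illustration only; not the project's actual clusters
rfm = pd.DataFrame({
    "Recency": [5, 7, 200, 180, 3, 2],
    "Frequency": [10, 12, 1, 2, 40, 45],
    "Monetary": [500.0, 650.0, 30.0, 45.0, 9000.0, 11000.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Mean RFM values and customer count per cluster
profile = rfm.groupby("Cluster").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Recency", "size"),
)
print(profile.round(1))
```

A cluster with low average Recency and high Frequency and Monetary values would be a candidate for a "high-value" label, while the opposite pattern suggests inactive customers.
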
- `kmeans_customer_segmentation.ipynb`: Complete notebook
- `Customer_Segmentation_Report.docx`: Final report
- `Customer_Segmentation_Presentation.pptx`: Summary slides
- Python (Pandas, NumPy, Matplotlib, Seaborn)
- Scikit-learn (KMeans, StandardScaler, PCA)
- Jupyter Notebook
This project successfully demonstrates how unsupervised learning can help businesses identify key customer segments, enabling personalized marketing and improved retention strategies.
Feel free to reach out for collaboration or questions!
Intern @ Indolike
Email: your.email@example.com