This project applies machine learning clustering techniques to segment mall customers based on their age, annual income, and spending behavior. The goal is to identify meaningful customer groups that can help businesses design targeted marketing strategies, improve customer retention, and maximize revenue.
The project implements and compares K-Means and DBSCAN clustering algorithms, visualizes clusters in 2D and 3D, and evaluates model quality using Silhouette Score and the Elbow Method.
We are using the Mall Customers Segmentation Dataset from Kaggle. The dataset contains customer information with the following key features:
- Age
- Annual Income (k$)
- Spending Score (1–100)
These attributes are commonly used in customer behavior analysis and market segmentation.
K-Means is used to group customers into a fixed number of clusters based on similarity.
- The Elbow Method is applied to determine the optimal number of clusters.
- The dataset is standardized before clustering to ensure fair distance calculation.
- Final clusters are visualized in 3D.
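The K-Means steps above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual notebook code: the data is a synthetic stand-in for the three features, and the range of `k` values, `n_init`, and `random_state` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (Age, Annual Income, Spending Score); not the real dataset.
rng = np.random.default_rng(0)
centers = np.array([[25, 30, 80], [45, 60, 50], [35, 90, 15]])
X = np.vstack([c + rng.normal(scale=4.0, size=(50, 3)) for c in centers])

# Standardize so each feature contributes equally to the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record the inertia (within-cluster sum of squares) for each k.
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(km.inertia_)

for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:.1f}")

# Fit the final model at the k read off the curve (3 is assumed for this synthetic data).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)
labels = kmeans.labels_
```

Plotting `inertias` against `k` gives the elbow curve; the "elbow" is the point after which adding clusters reduces inertia only marginally.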
DBSCAN is used to detect clusters of arbitrary shape and identify outliers.
- The k-distance graph is used to estimate the optimal `eps`.
- Silhouette Score is used to find the best `min_samples`.
- Noise points are identified and handled separately.
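A rough sketch of the DBSCAN workflow described above, using scikit-learn. The data is synthetic, `min_samples` is fixed at an assumed value rather than tuned with Silhouette Score, and the percentile rule for `eps` is only a crude stand-in for reading the knee off the k-distance plot by eye.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: two dense groups plus two hand-placed outliers (last two rows).
rng = np.random.default_rng(1)
centers = np.array([[25, 30, 80], [50, 85, 20]])
dense = np.vstack([c + rng.normal(scale=3.0, size=(60, 3)) for c in centers])
outliers = np.array([[70.0, 140.0, 99.0], [18.0, 135.0, 1.0]])
X_scaled = StandardScaler().fit_transform(np.vstack([dense, outliers]))

min_samples = 5  # assumed value for illustration

# k-distance graph: distance from each point to its min_samples-th nearest point,
# counting the point itself (DBSCAN also counts the point itself), sorted ascending.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)
k_distances = np.sort(distances[:, -1])

# Assumption: take a high percentile instead of locating the knee visually.
eps = float(np.percentile(k_distances, 95))

db = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled)
labels = db.labels_

# DBSCAN marks noise points with the label -1.
noise_mask = labels == -1
n_clusters = len(set(labels.tolist())) - (1 if noise_mask.any() else 0)
print(f"eps={eps:.3f}, clusters={n_clusters}, noise points={int(noise_mask.sum())}")
```

Rows selected by `noise_mask` can then be excluded from cluster summaries or inspected on their own as potential outliers.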
- Selected features: `Age`, `Annual Income (k$)`, `Spending Score (1–100)`
- Standardization is applied using `StandardScaler` to normalize all features.
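As an illustration of this preprocessing step, the sketch below selects the three columns and standardizes them. The rows are made up, and the exact column names (in particular `Spending Score (1-100)` with a plain hyphen) are assumptions about the CSV header, so adjust them to match the file you download.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative rows only; the real data would be loaded from the Kaggle CSV.
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Age": [19, 21, 35, 52, 64],
    "Annual Income (k$)": [15, 40, 60, 88, 120],
    "Spending Score (1-100)": [39, 81, 50, 13, 72],
})

# Assumed column names; identifiers such as CustomerID are left out of clustering.
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
X = df[features].to_numpy(dtype=float)

# StandardScaler rescales each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

Keeping the fitted `scaler` object around matters: any new customer must be transformed with the same scaler before being assigned to a cluster.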
| Method | Purpose |
|---|---|
| Elbow Method | To find optimal number of clusters for K-Means |
| Silhouette Score | To evaluate clustering quality |
| k-Distance Graph | To determine optimal eps for DBSCAN |
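To show how Silhouette Score can be used to compare clusterings, here is a small sketch that scores K-Means for several values of `k` on synthetic data. The candidate range and the data are assumptions; the same `silhouette_score` call could be applied to DBSCAN labels after removing noise points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with three well-separated groups; not the real dataset.
rng = np.random.default_rng(0)
centers = np.array([[25, 30, 80], [45, 60, 50], [35, 90, 15]])
X = np.vstack([c + rng.normal(scale=4.0, size=(50, 3)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Silhouette Score ranges from -1 to 1; higher means tighter, better-separated clusters.
# It needs at least two clusters, so the search starts at k=2.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"best k by silhouette: {best_k}")
```

The Elbow Method and Silhouette Score do not always agree, so it is reasonable to treat them as two separate pieces of evidence when choosing the number of clusters.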
The project includes:
- Histograms with KDE plots
- Correlation heatmaps
- K-Means elbow curve
- 2D cluster plots
- Interactive 3D cluster plots (Plotly)
- DBSCAN noise and cluster visualization
Trained models are saved using pickle.
model/
├── kmeans.pkl
└── dbscan.pkl
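These can be reloaded later for predictions or deployment. Note that scikit-learn's `DBSCAN` has no `predict` method, so a reloaded DBSCAN model exposes only the labels assigned during fitting, while a reloaded K-Means model can assign new customers to clusters.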
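A minimal sketch of the save and reload round trip with `pickle`, shown for K-Means. The data is synthetic, and a temporary directory stands in for the `model/` folder so the example leaves no files behind.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans

# Synthetic, already-scaled stand-in data with three groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 3)) for c in (-2.0, 0.0, 2.0)])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# A temporary directory is used here in place of the project's model/ folder.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "kmeans.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(kmeans, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# Hypothetical new customers, already scaled the same way as the training data.
new_customers = np.array([[-2.1, -1.9, -2.0], [2.0, 2.2, 1.8]])
predicted = restored.predict(new_customers)
print(predicted)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code. It is also worth saving the fitted scaler alongside the model so new data can be transformed consistently.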
git clone https://github.com/your-username/mall-customer-segmentation.git
cd mall-customer-segmentation
uv venv
uv sync

Open the Jupyter notebook or run the Python scripts to explore clustering and visualizations.
This project helps businesses:
- Identify high-value customers
- Detect low-spending or inactive customers
- Design personalized marketing campaigns
- Improve customer retention strategies