debarghamitraroy/Mall-Customer-Segmentation

Mall Customer Segmentation

This project applies machine learning clustering techniques to segment mall customers based on their age, annual income, and spending behavior. The goal is to identify meaningful customer groups that can help businesses design targeted marketing strategies, improve customer retention, and maximize revenue.

The project implements and compares K-Means and DBSCAN clustering algorithms, visualizes clusters in 2D and 3D, and evaluates model quality using Silhouette Score and the Elbow Method.

Dataset

We are using the Mall Customers Segmentation Dataset from Kaggle. The dataset contains customer information with the following key features:

  • Age
  • Annual Income (k$)
  • Spending Score (1–100)

These attributes are commonly used in customer behavior analysis and market segmentation.

Machine Learning Techniques Used

1. K-Means Clustering

K-Means is used to group customers into a fixed number of clusters based on similarity.

  • The Elbow Method is applied to determine the optimal number of clusters.
  • The dataset is standardized before clustering so that each feature contributes comparably to the distance calculation.
  • Final clusters are visualized in 3D.
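The K-Means steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it uses synthetic blobs from make_blobs in place of the Kaggle data, and the names inertias and best_k as well as the choice of 5 clusters are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three customer features (NOT the real dataset).
X, _ = make_blobs(n_samples=300, centers=5, n_features=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record the within-cluster sum of squares (inertia) for each k.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k:2d}  inertia={value:.2f}")

# Assumption: the elbow is read off the curve by eye; 5 is only a placeholder
# that matches the number of synthetic blobs generated above.
best_k = 5
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=42)
labels = final_model.fit_predict(X_scaled)
print("cluster sizes:", np.bincount(labels))
```

Plotting inertias against k gives the elbow curve; the point where the decrease flattens is the usual candidate for the number of clusters.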

2. DBSCAN (Density-Based Spatial Clustering)

DBSCAN is used to detect clusters of arbitrary shape and identify outliers.

  • The k-distance graph is used to estimate the optimal eps.
  • Silhouette Score is used to find the best min_samples.
  • Noise points are identified and handled separately.
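The DBSCAN workflow above might look roughly like the sketch below. It is written under several assumptions that do not come from the project: the data is synthetic, eps is taken as the 90th percentile of the k-distance curve as a crude stand-in for reading the knee by eye, and the silhouette score is computed on non-noise points only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: four well-separated blobs in three features.
centers = [[0, 0, 0], [6, 6, 0], [0, 6, 6], [6, 0, 6]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# k-distance graph: distance from each point to its k-th nearest neighbour,
# sorted ascending. n_neighbors is k + 1 because each point is returned as
# its own nearest neighbour at distance 0.
k = 5
neighbors = NearestNeighbors(n_neighbors=k + 1).fit(X_scaled)
distances, _ = neighbors.kneighbors(X_scaled)
k_dist = np.sort(distances[:, -1])

# Assumption: a percentile replaces the visual "knee" of the curve.
eps = float(np.percentile(k_dist, 90))

# Try several min_samples values and score each clustering without its noise.
scores = {}
for min_samples in range(3, 11):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    if len(set(labels[mask])) >= 2:  # silhouette needs at least two clusters
        scores[min_samples] = silhouette_score(X_scaled[mask], labels[mask])

best_min_samples = max(scores, key=scores.get)
final_labels = DBSCAN(eps=eps, min_samples=best_min_samples).fit_predict(X_scaled)
print("eps:", round(eps, 3), "min_samples:", best_min_samples)
print("noise points:", int(np.sum(final_labels == -1)))
```

Excluding noise from the silhouette computation is one possible convention; treating noise as its own cluster would give different scores.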

Data Preprocessing

  • Selected features: Age, Annual Income (k$), Spending Score (1–100)
  • The features are standardized with StandardScaler, which rescales each one to zero mean and unit variance so that no single feature dominates the distance calculations.
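A minimal sketch of the standardization step is shown below. The array is randomly generated to mimic the three selected columns and is not the real Kaggle data; the value ranges are assumptions chosen only to make the features differ in scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected columns (NOT the real dataset).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 71, size=200),   # Age
    rng.integers(15, 138, size=200),  # Annual Income (k$)
    rng.integers(1, 101, size=200),   # Spending Score
]).astype(float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every column has mean ~0 and standard deviation ~1.
print("means:", X_scaled.mean(axis=0).round(6))
print("stds: ", X_scaled.std(axis=0).round(6))
```

The fitted scaler should be reused with transform (not refitted) on any new customers, so that they are scaled with the same mean and variance as the training data.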

Model Evaluation

Method              Purpose
Elbow Method        To find the optimal number of clusters for K-Means
Silhouette Score    To evaluate clustering quality
k-Distance Graph    To determine the optimal eps for DBSCAN
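As a rough illustration of how the Silhouette Score can be used to compare clusterings, the sketch below scores K-Means for several values of k. The data is synthetic, and the names silhouettes and best_k are hypothetical; the project itself may organize this differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: four well-separated blobs in three features.
centers = [[0, 0, 0], [6, 6, 0], [0, 6, 6], [6, 0, 6]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouettes[k] = silhouette_score(X_scaled, labels)

best_k = max(silhouettes, key=silhouettes.get)
for k, score in silhouettes.items():
    print(f"k={k}: silhouette={score:.3f}")
print("highest silhouette at k =", best_k)
```

Unlike inertia, the silhouette score does not decrease automatically as k grows, so it can serve as a cross-check on the elbow curve.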

Visualizations

The project includes:

  • Histograms with KDE plots
  • Correlation heatmaps
  • K-Means elbow curve
  • 2D cluster plots
  • Interactive 3D cluster plots (Plotly)
  • DBSCAN noise and cluster visualization

Model Saving

Trained models are saved using pickle.

model/
├── kmeans.pkl
└── dbscan.pkl

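These can be reloaded later for deployment. The K-Means model can assign new customers to clusters with predict. Scikit-Learn's DBSCAN has no predict method, so the reloaded DBSCAN object only exposes the labels of the data it was fitted on.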
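Saving and reloading a model with pickle could be sketched as follows. The example is an assumption-laden illustration: it fits K-Means on synthetic data and writes to a temporary directory instead of the model/ folder shown above, so that running it leaves no files behind.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (NOT the real dataset).
X, _ = make_blobs(n_samples=150, centers=3, n_features=3, random_state=1)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical path; the project stores its file as model/kmeans.pkl.
    model_path = Path(tmp_dir) / "kmeans.pkl"

    with open(model_path, "wb") as f:
        pickle.dump(kmeans, f)

    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should assign every point to the same cluster.
same = np.array_equal(kmeans.predict(X), restored.predict(X))
print("predictions match after reload:", same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same Scikit-Learn version that produced them. If the model was trained on standardized features, the fitted scaler needs to be saved alongside it so new data can be scaled the same way.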

Technologies Used

Python, uv, NumPy, Pandas, Matplotlib, Seaborn, Plotly, Scikit-Learn, Streamlit, Docker, Render

How to Run the Project

1. Clone the repository

git clone https://github.com/your-username/mall-customer-segmentation.git
cd mall-customer-segmentation

2. Create environment and install dependencies

uv venv
uv sync

3. Run the notebook or scripts

Open the Jupyter notebook or run the Python scripts to explore clustering and visualizations.

Business Value

This project helps businesses:

  • Identify high-value customers
  • Detect low-spending or inactive customers
  • Design personalized marketing campaigns
  • Improve customer retention strategies

About

Customer segmentation analysis on mall customer data using exploratory data analysis and clustering techniques.
