Mall Customer Analytics — Strategy Dashboard

A beginner-to-intermediate data analysis project built with Python in a Jupyter Notebook. It loads mall customer data, cleans it, and produces a single multi-panel matplotlib/seaborn dashboard saved as final_mall_dashboard.png.

1. Dataset Overview

File: da_mall_customer.csv
Rows: 200 customers
Columns: 7

| Column | Type | Description |
| --- | --- | --- |
| `CustomerID` | Integer | Unique identifier for each customer |
| `Gender` | Categorical (M / F) | Customer gender |
| `Age` | Integer | Customer age in years (range ~18–70) |
| `Education` | Categorical | Highest education level: High School, Graduate, College, Post-Graduate, Doctorate, Uneducated, Unknown |
| `Marital Status` | Categorical | Married, Single, Divorced, Unknown |
| `Annual Income (k$)` | Integer | Annual income in thousands of USD |
| `Spending Score (1-100)` | Integer | Mall-assigned score based on customer spending behavior (1 = lowest, 100 = highest) |

Notable data quality issues (handled in the notebook):

The Education column name has trailing whitespace in the raw CSV ("Education "), which is stripped during cleaning.
Both Education and Marital Status contain "Unknown" entries that are relabeled to "Not Specified" for cleaner chart labels.

2. Project Structure

da_dashbrd/
├── anailyze.ipynb            # Main analysis notebook
├── da_mall_customer.csv      # Raw dataset (200 mall customers)
├── final_mall_dashboard.png  # Output dashboard image (auto-generated)
├── requirements.txt          # Python dependencies
└── README.md                 # This file

3. What the Notebook Does — Step by Step

Cell 1 — Import core libraries

import pandas as pd
import numpy as np

Loads pandas for data manipulation and numpy for numerical operations.

Cell 2 — Load and preview data

df = pd.read_csv('da_mall_customer.csv')
df.head()

Reads the CSV into a DataFrame and displays the first 5 rows to verify it loaded correctly.

Cell 3 — Import visualization libraries

import matplotlib.pyplot as plt
import seaborn as sns

Loads matplotlib (base charts) and seaborn (statistical visualizations on top of matplotlib).

Cell 4 — Clean column names

df.columns = df.columns.str.strip()

Removes any leading/trailing whitespace from all column headers. Without this, "Education " (with a trailing space) would cause KeyError failures later.

Cell 5 — Standardize unknown values

df['Education'] = df['Education'].replace('Unknown', 'Not Specified')
df['Marital Status'] = df['Marital Status'].replace('Unknown', 'Not Specified')

Replaces the literal string "Unknown" with "Not Specified" in both categorical columns for cleaner pie chart labels and better readability.

Cell 6 — Build and save the full dashboard

Creates an 18 × 12 inch figure with a 3 × 3 GridSpec layout containing 6 panels. The dashboard is saved as final_mall_dashboard.png and rendered inline.
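
The layout described above can be sketched roughly as follows. This is a reconstruction from the description only (18 × 12 inch figure, 3 × 3 grid, six panels); the axis variable names and the exact grid slices are assumptions, not the notebook's actual code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter to keep inline rendering
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(18, 12))
gs = GridSpec(3, 3, figure=fig)

# Axis names below are illustrative placeholders.
ax_kpi = fig.add_subplot(gs[0, 0])       # top left: text-only KPI panel
ax_gender = fig.add_subplot(gs[0, 1])    # top center: gender counts
ax_edu = fig.add_subplot(gs[0, 2])       # top right: education pie
ax_age = fig.add_subplot(gs[1, 0:2])     # middle row, spanning two columns
ax_income = fig.add_subplot(gs[1, 2])    # middle right: income histogram
ax_segment = fig.add_subplot(gs[2, :])   # bottom row, full width

ax_kpi.axis("off")  # hide ticks and frame for the text panel
```

Once the panels are drawn, a call such as `fig.savefig("final_mall_dashboard.png")` would write the image next to the notebook.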

4. Dashboard Panels Explained

Panel 1 — Business KPIs (Top Left)

A text-only panel displaying three headline numbers:

Total Shoppers: 200
Avg. Income: mean of Annual Income (k$)
Avg. Spend Score: mean of Spending Score (1-100)

This gives an at-a-glance executive summary without any chart overhead.
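
As a rough illustration, the three numbers could be computed like this. The four-row frame is made-up sample data standing in for the real CSV, and the `kpis` dictionary is a hypothetical helper, not taken from the notebook.

```python
import pandas as pd

# Tiny made-up sample; the real notebook works on the 200-row CSV.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Annual Income (k$)": [15, 40, 60, 85],
    "Spending Score (1-100)": [39, 81, 50, 30],
})

kpis = {
    "Total Shoppers": len(sample),
    "Avg. Income (k$)": round(sample["Annual Income (k$)"].mean(), 1),
    "Avg. Spend Score": round(sample["Spending Score (1-100)"].mean(), 1),
}

for name, value in kpis.items():
    print(f"{name}: {value}")
```

On the dashboard, each value could then be drawn onto the text-only axis with `ax.text(...)`.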

Panel 2 — Customer Base by Gender (Top Center)

A seaborn.countplot showing how many male (M) and female (F) customers are in the dataset, as a quick check on demographic composition.

Panel 3 — Education Distribution (Top Right)

A matplotlib pie chart showing the percentage split across education levels (Graduate, High School, College, Post-Graduate, Doctorate, Uneducated, Not Specified).

Panel 4 — Age vs. Spending Score Trend (Middle Left + Center, spanning 2 columns)

A seaborn.regplot (scatter + regression line) plotting Age on the X-axis against Spending Score on the Y-axis. The red regression line shows the overall trend — generally, spending score decreases slightly as age increases.
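
To put a number on the trend instead of reading it off the chart, one option is a least-squares line fit plus a Pearson correlation. The sketch below uses made-up ages and scores, and `np.polyfit` is a stand-in for whatever fit `regplot` draws, so treat it as an approximation of the idea, not the notebook's method.

```python
import numpy as np

# Made-up ages and spending scores, for illustration only.
age = np.array([20, 25, 30, 40, 50, 60, 70])
score = np.array([80, 70, 65, 55, 50, 40, 35])

slope, intercept = np.polyfit(age, score, deg=1)  # least-squares straight line
r = np.corrcoef(age, score)[0, 1]                 # Pearson correlation coefficient

print(f"slope = {slope:.2f} score points per year of age, r = {r:.2f}")
```

A negative slope together with an `r` close to zero would correspond to the "slight decrease" wording above.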

Panel 5 — Income Distribution (Middle Right)

A seaborn.histplot with KDE (Kernel Density Estimate) overlay for Annual Income (k$). Reveals whether income is normally distributed, skewed, or bimodal.

Panel 6 — Market Segmentation: Income vs. Spending Score (Bottom Row, full width)

The most strategic panel. A seaborn.scatterplot mapping:

X-axis: Annual Income (k$)
Y-axis: Spending Score (1-100)
Color (hue): Gender
Point size: Age

Two dashed grey lines divide the space into quadrants:

Vertical line at income = $60k
Horizontal line at spending score = 50

Three of the four quadrants are annotated with labels; the low-income, low-spending quadrant is left unlabeled:

| Label | Quadrant | Meaning |
| --- | --- | --- |
| IMPULSIVE | Low Income, High Spending | Spends a lot despite earning little |
| TARGET GROUP | High Income, High Spending | Best customers: high value, high engagement |
| HIGH POTENTIAL | High Income, Low Spending | Earns a lot but spends little; an opportunity for upselling |
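
The same quadrant logic can be expressed as a column, which makes the segments countable. This is a minimal sketch: `label_segment` is a hypothetical helper, the sample rows are made up, and treating values exactly on a cutoff line as "high" is an assumption.

```python
import numpy as np
import pandas as pd

INCOME_CUTOFF = 60  # k$, position of the vertical dashed line
SCORE_CUTOFF = 50   # position of the horizontal dashed line

def label_segment(df: pd.DataFrame) -> pd.Series:
    """Assign each row to a quadrant; using >= at the cutoffs is an assumption."""
    high_income = df["Annual Income (k$)"] >= INCOME_CUTOFF
    high_spend = df["Spending Score (1-100)"] >= SCORE_CUTOFF
    conditions = [
        ~high_income & high_spend,
        high_income & high_spend,
        high_income & ~high_spend,
    ]
    labels = ["IMPULSIVE", "TARGET GROUP", "HIGH POTENTIAL"]
    # The remaining quadrant (low income, low spending) has no label on the chart.
    values = np.select(conditions, labels, default="UNLABELED")
    return pd.Series(values, index=df.index)

# Made-up rows, one per quadrant.
sample = pd.DataFrame({
    "Annual Income (k$)": [20, 90, 85, 25],
    "Spending Score (1-100)": [80, 90, 10, 15],
})
sample["Segment"] = label_segment(sample)
print(sample)
```

Calling `sample["Segment"].value_counts()` would then give the size of each segment.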

5. Key Insights from the Current Analysis

The dataset is relatively balanced but slightly female-dominant.
Graduate-level customers form the largest education segment.
There is a mild negative correlation between age and spending score — younger shoppers tend to spend more.
Income follows a roughly normal distribution with most customers earning $40k–$80k.
The segmentation scatter reveals a clear cluster in the Target Group (high income + high spending) that warrants focused marketing.
The High Potential segment (high income, low spending) is the biggest untapped opportunity.

6. What Can Improve

These are concrete improvements to the existing analysis:

Data Quality

Handle the "Uneducated" and "Not Specified" labels more carefully — consider grouping them or excluding them from percentage calculations.
Check for and handle duplicate CustomerIDs.
Validate the Spending Score column for out-of-range values (should be 1–100).
Add df.info() and df.describe() cells to document data types and basic statistics.
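
A minimal sketch of the duplicate and range checks listed above, run on a small made-up frame that deliberately contains one duplicated ID and one impossible score. The variable names are illustrative only.

```python
import pandas as pd

# Made-up rows: CustomerID 2 appears twice, and 120 is outside the 1-100 range.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 2, 3],
    "Spending Score (1-100)": [39, 81, 81, 120],
})

# keep=False marks every row that shares its ID with another row.
duplicate_ids = sample[sample["CustomerID"].duplicated(keep=False)]

# between() is inclusive on both ends by default; missing values are flagged too.
out_of_range = sample[~sample["Spending Score (1-100)"].between(1, 100)]

print(f"Rows sharing a CustomerID: {len(duplicate_ids)}")
print(f"Rows with a score outside 1-100: {len(out_of_range)}")

sample.info()             # column dtypes and non-null counts
print(sample.describe())  # basic statistics for numeric columns
```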

Chart Improvements

Add value labels on top of the gender countplot bars (e.g., show exact counts).
Add a fourth quadrant label ("SAVERS" — Low Income, Low Spending) to complete the segmentation story.
Use consistent color palettes across all panels for a more professional look.
Add axis labels and units to every chart (e.g., "Annual Income (USD thousands)").
Increase font sizes for tick labels — currently hard to read at smaller display sizes.
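
For the value labels, axis labels, and tick sizes, a sketch along these lines may help. It uses a plain matplotlib bar chart on made-up data instead of the notebook's countplot, and the font size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up gender column, for illustration only.
gender = pd.Series(["F", "M", "F", "F", "M"], name="Gender")
counts = gender.value_counts()

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(counts.index, counts.values)
labels = ax.bar_label(bars)          # exact count above each bar (matplotlib 3.4+)
ax.set_xlabel("Gender")
ax.set_ylabel("Number of customers")
ax.tick_params(labelsize=12)         # larger tick labels for small display sizes
```

With `seaborn.countplot`, the same `ax.bar_label(...)` call can be applied to each entry of `ax.containers`.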

Analysis Depth

Add a Marital Status panel — it's in the dataset but not visualized at all.
Break down Spending Score by Education using a boxplot to see which education group spends the most.
Add a correlation heatmap (sns.heatmap) for numeric columns: Age, Income, Spending Score.
Add a Gender × Income grouped bar chart to compare income levels between male and female customers.
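
Before drawing the heatmap or boxplot, the underlying numbers can be inspected directly. The sketch below uses a made-up six-row frame; the column list is taken from the dataset description, everything else is illustrative.

```python
import pandas as pd

# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Age": [19, 23, 35, 47, 58, 66],
    "Annual Income (k$)": [15, 30, 60, 75, 50, 40],
    "Spending Score (1-100)": [80, 75, 60, 40, 35, 20],
    "Education": ["Graduate", "College", "Graduate", "College", "Graduate", "College"],
})

numeric_cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
corr = sample[numeric_cols].corr()  # Pearson correlation by default
print(corr.round(2))

# Numeric counterpart of the suggested boxplot: score spread per education level.
print(sample.groupby("Education")["Spending Score (1-100)"].describe())
```

Passing `corr` to `sns.heatmap(corr, annot=True)` would turn the matrix into the suggested heatmap panel.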

7. New Things to Practice for a Better Dashboard

These are skills and tools interns can learn to significantly level up the dashboard:

Intermediate Python / Data Analysis

| Skill | What to Practice |
| --- | --- |
| `groupby` + `agg` | Calculate average spending score per education level, per gender, per age group |
| `pd.cut` / `pd.qcut` | Create non-overlapping age buckets (18–25, 26–35, 36–50, 51+) and analyze each group separately |
| `value_counts(normalize=True)` | Convert counts to proportions for cleaner reporting |
| Boolean filtering | Isolate the "Target Group" customers and profile them separately |
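
The four skills in the table can be practiced together on a small frame. This is a sketch on made-up data; the bin edges are chosen so that the last bucket starts at 51 and no age falls into two buckets, and the Target Group cutoffs (income of 60 k$ and score of 50, both inclusive) are assumptions based on the dashed lines in Panel 6.

```python
import pandas as pd

# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Age": [19, 26, 35, 36, 50, 64],
    "Annual Income (k$)": [15, 70, 62, 90, 45, 80],
    "Spending Score (1-100)": [80, 75, 40, 88, 35, 20],
})

# pd.cut bins are right-inclusive: (17, 25], (25, 35], (35, 50], (50, 120]
sample["Age Group"] = pd.cut(
    sample["Age"],
    bins=[17, 25, 35, 50, 120],
    labels=["18-25", "26-35", "36-50", "51+"],
)

# groupby + agg: mean score and customer count per age bucket
by_group = (
    sample.groupby("Age Group", observed=True)["Spending Score (1-100)"]
    .agg(["mean", "count"])
)

# value_counts(normalize=True): shares instead of raw counts
gender_share = sample["Gender"].value_counts(normalize=True)

# Boolean filtering: high income and high spending
target_group = sample[
    (sample["Annual Income (k$)"] >= 60) & (sample["Spending Score (1-100)"] >= 50)
]

print(by_group)
print(gender_share)
print(target_group)
```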

Machine Learning (next step after EDA)

| Technique | Purpose |
| --- | --- |
| K-Means Clustering (`sklearn.cluster.KMeans`) | Automatically find customer segments from Income + Spending Score instead of using manual quadrant lines |
| Elbow Method | Determine the optimal number of clusters `k` |
| PCA (Principal Component Analysis) | Reduce dimensions for visualizing clusters when more features are added |
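
A minimal sketch of K-Means with the elbow method, assuming scikit-learn is installed. The points are synthetic (three made-up centers in income and score space), so the output says nothing about the real dataset; the range of `k` values and the random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic (income, score) points scattered around three made-up centers.
centers = np.array([[25, 80], [60, 50], [90, 20]])
X = np.vstack([c + rng.normal(scale=5, size=(30, 2)) for c in centers])

# Elbow method: fit for several k and record the inertia
# (within-cluster sum of squared distances).
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))

# After picking k where the curve flattens, assign each point to a cluster.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

On real data, features measured on different scales are usually standardized first (for example with `sklearn.preprocessing.StandardScaler`) so that no single column dominates the distance calculation.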

Interactive Dashboards (replace static matplotlib)

| Tool | Use Case |
| --- | --- |
| Plotly Express (`plotly.express`) | High-level charting API comparable to seaborn; produces interactive charts with hover tooltips |
| Dash by Plotly | Build a full web dashboard app in Python with dropdown filters and sliders |
| Streamlit | A quick way to turn a notebook into a shareable web app, for example with a gender/income filter sidebar |
| Power BI / Tableau | Industry-standard BI tools; practice connecting CSV data and building the same dashboard visually |

Code Quality & Reproducibility

| Practice | Why It Matters |
| --- | --- |
| Write reusable functions for each chart panel | Keeps code DRY and easier to maintain |
| Add Markdown cells between code cells | Documents your reasoning, which is essential for professional notebooks |
| Pin library versions in `requirements.txt` (e.g., `pandas==2.2.0`) | Helps the notebook run the same way on any machine |
| Use `pathlib.Path` instead of raw strings for file paths | Cross-platform compatibility (Windows vs Mac/Linux) |
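
As one possible illustration of the first and last rows, the loading and cleaning steps from Cells 2, 4, and 5 could be wrapped in a function that takes a `pathlib.Path`. The name `load_customers` is hypothetical, and the demo writes a throwaway two-row CSV so the sketch runs without the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_customers(csv_path: Path) -> pd.DataFrame:
    """Read the CSV, strip header whitespace, and relabel 'Unknown' entries."""
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()
    for col in ("Education", "Marital Status"):
        df[col] = df[col].replace("Unknown", "Not Specified")
    return df

# Demo on a temporary file; note the trailing space in the "Education " header.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_customers.csv"
    demo_path.write_text(
        "CustomerID,Education ,Marital Status\n"
        "1,Unknown,Single\n"
        "2,Graduate,Unknown\n"
    )
    demo = load_customers(demo_path)

print(demo)
```

In the notebook, the equivalent call would be `df = load_customers(Path("da_mall_customer.csv"))`.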

Storytelling & Presentation

Structure the notebook as a story: Business Question → Data → Finding → Recommendation
Add a final Markdown cell with a written summary of the 3 most important findings and their business implications
Export the notebook as a PDF or HTML report (jupyter nbconvert) to share without requiring Python

8. How to Run

Prerequisites: Python 3.8+

# 1. Install dependencies
pip install -r requirements.txt

# 2. Open the notebook
jupyter notebook anailyze.ipynb
# or in VS Code: open anailyze.ipynb directly

# 3. Run all cells (Kernel > Restart & Run All)
# Output: final_mall_dashboard.png will be created in the same folder

Dataset: Mall Customer Segmentation — commonly used for beginner clustering and EDA practice.
