codezelaca/da-python_simple_dashboard

Mall Customer Analytics — Strategy Dashboard

A beginner-to-intermediate data analysis project built with Python in a Jupyter Notebook. It loads mall customer data, cleans it, and produces a single multi-panel matplotlib/seaborn dashboard saved as final_mall_dashboard.png.


Table of Contents

  1. Dataset Overview
  2. Project Structure
  3. What the Notebook Does — Step by Step
  4. Dashboard Panels Explained
  5. Key Insights from the Current Analysis
  6. What Interns Can Improve
  7. New Things to Practice for a Better Dashboard
  8. How to Run

1. Dataset Overview

File: da_mall_customer.csv
Rows: 200 customers
Columns: 7

| Column | Type | Description |
| --- | --- | --- |
| CustomerID | Integer | Unique identifier for each customer |
| Gender | Categorical (M / F) | Customer gender |
| Age | Integer | Customer age (range ~18–70) |
| Education | Categorical | Highest education level: High School, Graduate, College, Post-Graduate, Doctorate, Uneducated, Unknown |
| Marital Status | Categorical | Married, Single, Divorced, Unknown |
| Annual Income (k$) | Integer | Annual income in thousands of USD |
| Spending Score (1-100) | Integer | Mall-assigned score based on customer spending behavior (1 = lowest, 100 = highest) |

Notable data quality issues (handled in the notebook):

  • The Education column name has trailing whitespace in the raw file ("Education "); it is stripped during cleaning.
  • Both Education and Marital Status contain "Unknown" entries that are relabeled to "Not Specified" for cleaner chart labels.

2. Project Structure

```text
da_dashbrd/
├── anailyze.ipynb            # Main analysis notebook
├── da_mall_customer.csv      # Raw dataset (200 mall customers)
├── final_mall_dashboard.png  # Output dashboard image (auto-generated)
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```

3. What the Notebook Does — Step by Step

Cell 1 — Import core libraries

```python
import pandas as pd
import numpy as np
```

Loads pandas for data manipulation and numpy for numerical operations.

Cell 2 — Load and preview data

```python
df = pd.read_csv('da_mall_customer.csv')
df.head()
```

Reads the CSV into a DataFrame and displays the first 5 rows to verify it loaded correctly.

Cell 3 — Import visualization libraries

```python
import matplotlib.pyplot as plt
import seaborn as sns
```

Loads matplotlib (base charts) and seaborn (statistical visualizations on top of matplotlib).

Cell 4 — Clean column names

```python
df.columns = df.columns.str.strip()
```

Removes any leading/trailing whitespace from all column headers. Without this, "Education " (with a trailing space) would cause KeyError failures later.

Cell 5 — Standardize unknown values

```python
df['Education'] = df['Education'].replace('Unknown', 'Not Specified')
df['Marital Status'] = df['Marital Status'].replace('Unknown', 'Not Specified')
```

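Replaces the literal string "Unknown" with "Not Specified" in both categorical columns so that chart labels read more cleanly. Only Education is plotted in the current dashboard (as a pie chart); the Marital Status relabeling takes effect once that column is visualized.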

Cell 6 — Build and save the full dashboard

Creates an 18 × 12 inch figure with a 3 × 3 GridSpec layout containing 6 panels. The dashboard is saved as final_mall_dashboard.png and rendered inline.
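The layout described above can be sketched roughly as follows. This is a minimal illustration of an 18 × 12 inch figure with a 3 × 3 GridSpec and six panels, not the notebook's actual cell; the dictionary keys and placeholder titles are hypothetical names chosen for this example.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# 18 x 12 inch canvas with a 3 x 3 grid, as described for Cell 6
fig = plt.figure(figsize=(18, 12))
gs = GridSpec(3, 3, figure=fig)

# Panel placement follows Section 4; the key names are hypothetical
axes = {
    "kpis": fig.add_subplot(gs[0, 0]),            # top left
    "gender": fig.add_subplot(gs[0, 1]),          # top center
    "education": fig.add_subplot(gs[0, 2]),       # top right
    "age_vs_spend": fig.add_subplot(gs[1, 0:2]),  # middle row, spans two columns
    "income": fig.add_subplot(gs[1, 2]),          # middle right
    "segmentation": fig.add_subplot(gs[2, :]),    # bottom row, full width
}

# Placeholder titles so the empty layout is easy to inspect
for name, ax in axes.items():
    ax.set_title(name)

fig.tight_layout()
```

Slicing the GridSpec (`gs[1, 0:2]`, `gs[2, :]`) is what lets one panel span several grid cells while the others stay in a single cell.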


4. Dashboard Panels Explained

Panel 1 — Business KPIs (Top Left)

A text-only panel displaying three headline numbers:

  • Total Shoppers: 200
  • Avg. Income: mean of Annual Income (k$)
  • Avg. Spend Score: mean of Spending Score (1-100)

This gives an at-a-glance executive summary without any chart overhead.
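A text-only KPI panel like this one can be built by turning the axes off and writing strings onto it. The sketch below assumes a small toy DataFrame (four made-up rows, not the real dataset) and an arbitrary text layout; the formatting in the notebook may differ.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only; these are not values from da_mall_customer.csv
df = pd.DataFrame({
    "Annual Income (k$)": [15, 40, 60, 85],
    "Spending Score (1-100)": [39, 81, 50, 20],
})

# The three headline numbers shown in the panel
kpis = {
    "Total Shoppers": f"{len(df)}",
    "Avg. Income": f"${df['Annual Income (k$)'].mean():.1f}k",
    "Avg. Spend Score": f"{df['Spending Score (1-100)'].mean():.1f}",
}

fig, ax = plt.subplots(figsize=(4, 3))
ax.axis("off")  # hide ticks and frame so only the text is visible

# Stack the KPI lines from top to bottom in axes coordinates (positions are arbitrary)
for i, (label, value) in enumerate(kpis.items()):
    ax.text(0.05, 0.8 - i * 0.3, f"{label}: {value}",
            fontsize=14, transform=ax.transAxes)
```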

Panel 2 — Customer Base by Gender (Top Center)

A seaborn.countplot showing how many Male vs Female customers are in the dataset. It serves as a quick check on demographic composition.

Panel 3 — Education Distribution (Top Right)

A matplotlib pie chart showing the percentage split across education levels (Graduate, High School, College, Post-Graduate, Doctorate, Uneducated, Not Specified).

Panel 4 — Age vs. Spending Score Trend (Middle Left + Center, spanning 2 columns)

A seaborn.regplot (scatter + regression line) plotting Age on the X-axis against Spending Score on the Y-axis. The red regression line shows the overall trend — generally, spending score decreases slightly as age increases.

Panel 5 — Income Distribution (Middle Right)

A seaborn.histplot with a KDE (Kernel Density Estimate) overlay for Annual Income (k$). It shows whether income looks roughly normal, skewed, or bimodal.

Panel 6 — Market Segmentation: Income vs. Spending Score (Bottom Row, full width)

The most strategic panel. A seaborn.scatterplot mapping:

  • X-axis: Annual Income (k$)
  • Y-axis: Spending Score (1-100)
  • Color (hue): Gender
  • Point size: Age

Two dashed grey lines divide the space into quadrants:

  • Vertical line at income = $60k
  • Horizontal line at spending score = 50

Three of the four quadrants are annotated; the low-income, low-spending quadrant is left unlabeled:

| Label | Quadrant | Meaning |
| --- | --- | --- |
| IMPULSIVE | Low Income, High Spending | Spends a lot despite earning little |
| TARGET GROUP | High Income, High Spending | Best customers: high value, high engagement |
| HIGH POTENTIAL | High Income, Low Spending | Earns a lot but doesn't spend; an opportunity for upselling |
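The same quadrant logic can also be applied to the data itself, so each customer carries a segment label. The sketch below uses the thresholds from the dashboard (income 60 k$, spending score 50) on four made-up rows. How points exactly on a threshold are treated, and the "UNLABELED" name for the fourth quadrant, are assumptions made for this example.

```python
import numpy as np
import pandas as pd

INCOME_CUT = 60  # k$, position of the vertical dashed line
SCORE_CUT = 50   # position of the horizontal dashed line

# Toy rows for illustration only, one per quadrant
df = pd.DataFrame({
    "Annual Income (k$)": [20, 25, 90, 95],
    "Spending Score (1-100)": [80, 10, 85, 15],
})

# Assumption: values strictly above a threshold count as "high"
high_income = df["Annual Income (k$)"] > INCOME_CUT
high_spend = df["Spending Score (1-100)"] > SCORE_CUT

# Map each boolean combination to the label used on the dashboard
df["Segment"] = np.select(
    [~high_income & high_spend, high_income & high_spend, high_income & ~high_spend],
    ["IMPULSIVE", "TARGET GROUP", "HIGH POTENTIAL"],
    default="UNLABELED",  # hypothetical name for the low-income, low-spending quadrant
)

print(df)
```

With a `Segment` column in place, `df[df["Segment"] == "TARGET GROUP"]` isolates that group for separate profiling.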

5. Key Insights from the Current Analysis

  • The dataset is relatively balanced but slightly female-dominant.
  • Graduate-level customers form the largest education segment.
  • There is a mild negative correlation between age and spending score: younger shoppers tend to have higher spending scores.
  • Income follows a roughly normal distribution with most customers earning $40k–$80k.
  • The segmentation scatter reveals a clear cluster in the Target Group (high income + high spending) that warrants focused marketing.
  • The High Potential segment (high income, low spending) is the biggest untapped opportunity.

6. What Interns Can Improve

These are concrete improvements to the existing analysis:

Data Quality

  • Handle the "Uneducated" and "Not Specified" labels more carefully — consider grouping them or excluding them from percentage calculations.
  • Check for and handle duplicate CustomerIDs.
  • Validate the Spending Score column for out-of-range values (should be 1–100).
  • Add df.info() and df.describe() cells to document data types and basic statistics.
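The checks listed above might look like the following. This is a sketch on a toy DataFrame with one deliberate duplicate ID and one deliberate out-of-range score; the variable names are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only: ID 2 is duplicated and 120 is out of range
df = pd.DataFrame({
    "CustomerID": [1, 2, 2, 4],
    "Spending Score (1-100)": [39, 81, 120, 6],
})

# keep=False flags every row that shares its CustomerID with another row
duplicate_ids = df[df["CustomerID"].duplicated(keep=False)]

# between() is inclusive on both ends by default, so 1 and 100 are valid
out_of_range = df[~df["Spending Score (1-100)"].between(1, 100)]

print(f"Rows with a duplicated CustomerID: {len(duplicate_ids)}")
print(f"Rows with an out-of-range score:  {len(out_of_range)}")

# Document data types and basic statistics
df.info()
print(df.describe())
```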

Chart Improvements

  • Add value labels on top of the gender countplot bars (e.g., show exact counts).
  • Add a fourth quadrant label ("SAVERS" — Low Income, Low Spending) to complete the segmentation story.
  • Use consistent color palettes across all panels for a more professional look.
  • Add axis labels and units to every chart (e.g., "Annual Income (USD thousands)").
  • Increase font sizes for tick labels — currently hard to read at smaller display sizes.
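As one example of these chart improvements, value labels can be added with matplotlib's `Axes.bar_label` (available from matplotlib 3.4). To keep the sketch dependency-light it draws the bars with plain matplotlib from `value_counts()` rather than `seaborn.countplot`, and it uses made-up gender values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy values for illustration only
gender = pd.Series(["F", "M", "F", "F", "M"], name="Gender")
counts = gender.value_counts()  # sorted by count, largest first

fig, ax = plt.subplots()
bars = ax.bar(counts.index, counts.values)

# Write the exact count above each bar
labels = ax.bar_label(bars)

# Explicit axis labels and larger tick labels for readability
ax.set_xlabel("Gender")
ax.set_ylabel("Number of customers")
ax.tick_params(labelsize=12)
```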

Analysis Depth

  • Add a Marital Status panel — it's in the dataset but not visualized at all.
  • Break down Spending Score by Education using a boxplot to see which education group spends the most.
  • Add a correlation heatmap (sns.heatmap) for numeric columns: Age, Income, Spending Score.
  • Add a Gender × Income grouped bar chart to compare income levels between male and female customers.
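For the correlation heatmap, the matrix itself comes from `DataFrame.corr()`. The sketch below computes it on five made-up rows and draws it with matplotlib's `imshow`; in the notebook, `sns.heatmap(corr, annot=True)` would be the seaborn equivalent the bullet refers to.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only; they are not from the real dataset
df = pd.DataFrame({
    "Age": [19, 23, 35, 52, 64],
    "Annual Income (k$)": [15, 30, 60, 75, 90],
    "Spending Score (1-100)": [80, 70, 55, 30, 20],
})

# Pairwise Pearson correlation of the numeric columns
corr = df.corr()

fig, ax = plt.subplots()
# Fix the color scale to [-1, 1] so colors are comparable across runs
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="Pearson correlation")
```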

7. New Things to Practice for a Better Dashboard

These are skills and tools interns can learn to significantly level up the dashboard:

Intermediate Python / Data Analysis

| Skill | What to Practice |
| --- | --- |
| groupby + agg | Calculate average spending score per education level, per gender, per age group |
| pd.cut / pd.qcut | Create non-overlapping age buckets (18–25, 26–35, 36–50, 51+) and analyze each group separately |
| value_counts(normalize=True) | Convert counts to percentages for cleaner reporting |
| Boolean filtering | Isolate the "Target Group" customers and profile them separately |
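Several of these skills combine naturally in a few lines. The sketch below buckets ages with `pd.cut`, aggregates spending score per bucket with `groupby` + `agg`, and converts gender counts to shares with `value_counts(normalize=True)`. The rows, bin edges, and the upper age bound of 120 are assumptions chosen for illustration.

```python
import pandas as pd

# Toy rows for illustration only
df = pd.DataFrame({
    "Age": [18, 25, 26, 35, 36, 50, 51, 70],
    "Gender": ["F", "M", "F", "F", "M", "F", "M", "F"],
    "Spending Score (1-100)": [90, 70, 60, 40, 50, 30, 20, 10],
})

# pd.cut uses right-closed intervals by default: (17, 25], (25, 35], (35, 50], (50, 120]
bins = [17, 25, 35, 50, 120]
labels = ["18-25", "26-35", "36-50", "51+"]
df["Age Group"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Average spending score and group size per age bucket
avg_by_age = (
    df.groupby("Age Group", observed=True)["Spending Score (1-100)"]
      .agg(["mean", "count"])
)

# Share of each gender as a fraction of all rows
gender_share = df["Gender"].value_counts(normalize=True)

print(avg_by_age)
print(gender_share)
```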

Machine Learning (next step after EDA)

| Technique | Purpose |
| --- | --- |
| K-Means Clustering (sklearn.cluster.KMeans) | Automatically find customer segments from Income + Spending Score instead of using manual quadrant lines |
| Elbow Method | Determine the optimal number of clusters k |
| PCA (Principal Component Analysis) | Reduce dimensions for visualizing clusters when more features are added |
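A first pass at K-Means with the elbow method might look like this. The sketch assumes scikit-learn is installed and runs on synthetic points (three artificial blobs in income/score space), so the cluster count it finds says nothing about the real customers. The choice of k = 3 and the scaling step are assumptions for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: three blobs of (income k$, spending score)
rng = np.random.default_rng(0)
centers = np.array([[25, 80], [60, 50], [95, 20]])
X = np.vstack([c + rng.normal(scale=4, size=(30, 2)) for c in centers])

# Scale features so income and score contribute comparably to distances
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record inertia (within-cluster sum of squares) for each k
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")

# Fit the final model at the k where the inertia curve flattens (3 for this toy data)
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = final.labels_
```

Plotting `inertias` against k and looking for the bend in the curve is the usual way to read the elbow.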

Interactive Dashboards (replace static matplotlib)

| Tool | Use Case |
| --- | --- |
| Plotly Express (plotly.express) | High-level plotting API comparable to seaborn; produces interactive charts with hover tooltips |
| Dash by Plotly | Build a full web dashboard app in Python with dropdown filters and sliders |
| Streamlit | A quick way to turn a notebook into a shareable web app; try adding a gender/income filter sidebar |
| Power BI / Tableau | Industry-standard BI tools; practice connecting CSV data and building the same dashboard visually |

Code Quality & Reproducibility

| Practice | Why It Matters |
| --- | --- |
| Write reusable functions for each chart panel | Keeps code DRY and easier to maintain |
| Add Markdown cells between code cells | Documents your reasoning; essential for professional notebooks |
| Pin library versions in requirements.txt (e.g., pandas==2.2.0) | Ensures the notebook runs identically on any machine |
| Use pathlib.Path instead of raw strings for file paths | Cross-platform compatibility (Windows vs Mac/Linux) |
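Two of these practices can be illustrated together: wrapping the cleaning steps from Cells 4 and 5 in a reusable function, and building the data path with `pathlib.Path`. The function name `clean_customers` is hypothetical, and the sketch runs on a two-row toy frame so it works without the CSV present.

```python
from pathlib import Path

import pandas as pd


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps from Cells 4 and 5 and return a new DataFrame."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out.columns = out.columns.str.strip()
    for col in ("Education", "Marital Status"):
        out[col] = out[col].replace("Unknown", "Not Specified")
    return out


# Path built from parts; assumes the notebook runs from the project folder
DATA_PATH = Path.cwd() / "da_mall_customer.csv"

# Toy frame for illustration only, including the trailing-space column name
raw = pd.DataFrame({
    "Education ": ["Graduate", "Unknown"],
    "Marital Status": ["Unknown", "Single"],
})
cleaned = clean_customers(raw)
print(cleaned)

# With the real file available, loading would look like:
# df = clean_customers(pd.read_csv(DATA_PATH))
```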

Storytelling & Presentation

  • Structure the notebook as a story: Business Question → Data → Finding → Recommendation
  • Add a final Markdown cell with a written summary of the 3 most important findings and their business implications
  • Export the notebook as a PDF or HTML report (jupyter nbconvert) to share without requiring Python

8. How to Run

Prerequisites: Python 3.8+

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Open the notebook
jupyter notebook anailyze.ipynb
# or in VS Code: open anailyze.ipynb directly

# 3. Run all cells (Kernel > Restart & Run All)
# Output: final_mall_dashboard.png will be created in the same folder
```

Dataset: Mall Customer Segmentation — commonly used for beginner clustering and EDA practice.
