Customer Segmentation via Unsupervised Learning

Executive Summary

This project delivers an end-to-end unsupervised learning pipeline that derives customer segments from enterprise retail data itself rather than from hand-picked rules, reducing the influence of human cognitive bias. Using transactional data from Online Retail.xlsx, the pipeline processes, standardizes, compresses, and clusters customer behavior into distinct, strategically actionable market segments. By pairing Principal Component Analysis (PCA) with a validated K-Means model, it reduces over 20 dimensions of retail behavior to 3 principal components and groups customers into 3 major personas.

The Pipeline Architecture (IPO Framework)

The data moves through a fixed sequence of stages following the Input-Process-Output (IPO) blueprint:

SCALE (Standardization): Continuous features are standardized to a common scale ($z$-score scaling, so that each feature has $\mu = 0, \sigma = 1$). This removes magnitude-induced bias so that every feature contributes equally to distance calculations.
COMPRESS (PCA): The high-dimensional point cloud ($D > 20$) is projected onto orthogonal principal components to mitigate the curse of dimensionality while preserving as much variance as possible.
CLUSTER (K-Means): K-Means minimizes the Within-Cluster Sum of Squares (WCSS), assigning each customer to the nearest centroid to form cohesive clusters.
TRANSLATE (Inverse Projection): Centroid coordinates in PCA space are inverse-transformed back into real-world metrics to build marketing personas.
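The four stages above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the code in Project_3.py: the feature matrix is synthetic (300 hypothetical customers, 5 features), and the variable names are chosen for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a per-customer feature matrix (hypothetical data,
# not derived from Online Retail.xlsx): three groups of 100 customers each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=1.0, size=(100, 5))
    for center in (0.0, 10.0, 25.0)
])

# SCALE: z-score each feature to mean 0 and standard deviation 1.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# COMPRESS: project onto the first 3 orthogonal principal components.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# CLUSTER: K-Means minimizes the within-cluster sum of squares (WCSS).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_pca)

# TRANSLATE: undo PCA, then undo scaling, to express the centroids
# in the original feature units.
centroids_scaled = pca.inverse_transform(kmeans.cluster_centers_)
centroids_original = scaler.inverse_transform(centroids_scaled)

print(centroids_original.round(2))
```

The order of the two inverse transforms matters: the centroids live in PCA space, so the PCA projection is reversed first and the scaling second.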

Technical Results & Variance Preservation

Principal Component Analysis (PCA)

The script extracted 3 orthogonal components, each accounting for the following share of the variance in customer behavior:

PC1: $79.10\%$ explained variance ratio
PC2: $18.33\%$ explained variance ratio
PC3: $2.57\%$ explained variance ratio

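Total Preserved Variance: $100\%$. The three ratios sum to $100\%$, so these components account for all of the variance in the feature matrix passed to PCA. PCA reports exactly $100\%$ only when the number of retained components equals the rank of its input, so at this step no variance, and therefore no low-variance noise, was discarded.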
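To make the explained variance ratio concrete, the sketch below fits PCA on synthetic data (6 hypothetical features, not the project's dataset) and compares keeping every component with keeping only 3. It is an illustration of how scikit-learn reports these ratios, not a reproduction of the figures above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 samples, 6 features, with two correlated columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:, 1] += 2.0 * X[:, 0]

X_scaled = StandardScaler().fit_transform(X)

# Keeping every component: the ratios always sum to 1 (100%).
full = PCA().fit(X_scaled)
cumulative = np.cumsum(full.explained_variance_ratio_)

# Keeping 3 of 6 components: the ratios are still measured against the
# total variance, so their sum is the fraction actually retained.
kept = PCA(n_components=3).fit(X_scaled)
retained = kept.explained_variance_ratio_.sum()

print("cumulative ratios:", cumulative.round(4))
print("retained by 3 components:", round(float(retained), 4))
```

When the retained fraction is below 1, the remainder is the variance carried by the discarded components.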

Mathematical Cluster Proof

The target tier framework fixed the cluster count at $k = 3$. This choice was checked with two diagnostics outlined in Data Science Project 3.pdf:

The Elbow Method: Locates the value of $k$ beyond which adding more clusters yields diminishing reductions in WCSS.
The Silhouette Score: Confirmed strong cohesion within clusters relative to separation from neighboring clusters, suggesting that natural customer groups were not artificially split.
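Both diagnostics can be computed in a single loop over candidate values of $k$. The sketch below uses synthetic 2-D data with three well-separated groups (an assumption made for illustration); the dictionary names `wcss` and `silhouettes` are hypothetical and do not come from Project_3.py.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated blobs of 80 points each.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [4.0, 7.0]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(80, 2)) for c in centers])

wcss = {}         # elbow method: inertia (WCSS) for each k
silhouettes = {}  # mean silhouette score for each k

for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The elbow is read off where the WCSS curve flattens; the silhouette
# score gives a single number per k that can be maximized directly.
best_k = max(silhouettes, key=silhouettes.get)

for k in wcss:
    print(f"k={k}  WCSS={wcss[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("k with highest silhouette:", best_k)
```

The silhouette score is undefined for a single cluster, which is why the loop starts at $k = 2$.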

Strategic Corporate Persona Matrix

Inverse-transforming the cluster centroids maps the abstract coordinates back to the original, real-world units. The table below reports the median of each metric per cluster:

Cluster ID	Assigned Corporate Persona	Median Frequency	Median Total Spend	Median Quantity	Strategic Business Action
Cluster 1	CLUSTER A: HIGH-VALUE ENGAGERS	46.0	$143,825.06	74,215.0	Deploy high-touch VIP support, exclusive perks, and experiential marketing.
Cluster 2	CLUSTER B: MID-TIER EXPLORERS	34.0	$27,498.04	15,752.0	Push targeted loyalty programs, product bundles, and flash sales.
Cluster 0	CLUSTER C: LOW-ACTIVITY CHURN RISK	2.0	$660.00	370.0	Execute drop-off re-engagement, clear value options, and win-back campaigns.
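One way to obtain per-cluster medians like those in the table is a pandas `groupby` on the cluster label. The sketch below is an assumption about how such a table could be built, not the project's actual code; the column names and all values are toy data invented for the example.

```python
import pandas as pd

# Toy customer-level data with an assigned cluster label (hypothetical
# column names and values, for illustration only).
customers = pd.DataFrame({
    "cluster":     [0, 0, 0, 1, 1, 2, 2, 2],
    "frequency":   [1, 2, 3, 40, 50, 30, 34, 38],
    "total_spend": [500.0, 650.0, 700.0, 120000.0, 160000.0,
                    25000.0, 27000.0, 30000.0],
    "quantity":    [300, 350, 400, 70000, 78000, 15000, 16000, 17000],
})

# Median of each metric within each cluster, in original units.
persona_table = (
    customers
    .groupby("cluster")[["frequency", "total_spend", "quantity"]]
    .median()
)

print(persona_table)
```

Medians are less sensitive than means to a few extreme spenders, which can matter when a cluster contains outliers.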

