A comprehensive data analytics project analyzing customer shopping patterns and behaviors using Python, SQL, and Power BI.
This project performs an end-to-end analysis of customer shopping behavior, exploring purchasing patterns, demographic insights, and customer preferences. The analysis includes exploratory data analysis (EDA), data cleaning, database integration with PostgreSQL, and interactive visualizations through Power BI dashboards.
The dataset contains 3,900 customer transactions with 18 features:
- Customer Demographics: Customer ID, Age, Gender, Location
- Product Information: Item Purchased, Category, Size, Color
- Purchase Details: Purchase Amount (USD), Season, Previous Purchases
- Customer Behavior: Review Rating, Subscription Status, Frequency of Purchases
- Transaction Details: Shipping Type, Discount Applied, Promo Code Used, Payment Method
- Clothing: Blouse, Sweater, Jeans, Shirt, Shorts, Dress, Pants, T-shirt, Hoodie, Skirt, Socks
- Footwear: Sandals, Sneakers, Shoes, Boots
- Outerwear: Coat, Jacket
- Accessories: Handbag, Jewelry, Scarf, Hat, Sunglasses, Belt, Backpack, Gloves
- Python 3.12: Data processing and analysis
- Libraries:
pandas- Data manipulation and analysisnumpy- Numerical computationsmatplotlib&seaborn- Data visualizationsqlalchemy- Database connectivitypsycopg2-binary- PostgreSQL adapter
- PostgreSQL: Data storage and SQL querying
- Power BI: Interactive dashboard creation
- Jupyter Notebook: Analysis environment
customer_behavior_analysis_project/
β
βββ dashboard/
β βββ customer_behavior_dashboard.pbix # Power BI dashboard file
β
βββ data/
β βββ customer_shopping_behavior.xlsx # Raw dataset (Excel format)
β βββ images/ # Dashboard preview images
β βββ dashboard_preview.png
β
βββ notebook/
β βββ Customer_behaviour_analysis.ipynb # Jupyter notebook with EDA & analysis
β
βββ sql/
β βββ customer_behavior_analysis.sql # SQL queries for database analysis
β
βββ README.md
import pandas as pd
df = pd.read_csv('customer_shopping_behavior.csv')
# Alternative: df = pd.read_excel('customer_shopping_behavior.xlsx')- Dataset shape: 3,900 rows Γ 18 columns
- Data types inspection
- Summary statistics generation
- Missing value detection (37 null values in Review Rating column)
- Column standardization: Converted column names to lowercase with underscores
- Redundancy removal: Dropped
promo_code_usedcolumn (duplicate ofdiscount_applied) - Feature engineering:
- Created
age_groupcategories - Derived
purchase_frequency_daysfrom frequency labels
- Created
- Missing value handling: Addressed null values in review ratings
Loaded cleaned data into PostgreSQL database:
from sqlalchemy import create_engine
engine = create_engine(f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}")
df.to_sql('customer', engine, if_exists='replace', index=False)Database: customer_behavior_analysis
Table: customer
Performed various SQL queries on PostgreSQL/MySQL/SQL Server to extract insights:
- Customer segmentation analysis
- Purchase pattern identification
- Revenue analysis by category
- Geographic distribution analysis
- Subscription status impact
- Payment method preferences
Created interactive dashboards featuring:
- Sales trends over time
- Customer demographic breakdowns
- Product category performance
- Geographic sales distribution
- Customer lifetime value metrics
- Purchase frequency analysis
The Power BI dashboard provides an interactive visualization of key metrics and trends, enabling stakeholders to:
- Monitor real-time sales performance
- Identify top-performing products and categories
- Analyze customer demographics and behavior patterns
- Track subscription and payment method distributions
- Evaluate discount and promotional effectiveness
For the full interactive experience, open dashboard/customer_behavior_dashboard.pbix in Power BI Desktop.
- Gender Distribution: Analyzed purchasing behavior across genders
- Age Analysis: Customer base spans ages 18-70 with identified age group patterns
- Popular Categories: Clothing dominates purchases, followed by Footwear and Accessories
- Seasonal Trends: Purchase patterns vary by season
- Subscription Impact: High subscription rate (Yes/No analysis)
- Payment Preferences: Multiple payment methods tracked
- Shipping Choices: Express, Standard, Free Shipping, 2-Day, Next Day Air, Store Pickup
- Discount Effectiveness: All transactions included promotional discounts
pip install pandas numpy matplotlib seaborn sqlalchemy psycopg2-binary- Install PostgreSQL
- Create database:
CREATE DATABASE customer_behavior_analysis;
- Update connection credentials in the notebook
-
Clone the repository:
git clone https://github.com/yourusername/customer-behavior-analysis.git cd customer_behavior_analysis_project -
Launch Jupyter Notebook:
jupyter notebook
-
Open
notebook/Customer_behaviour_analysis.ipynband run cells sequentially -
For SQL analysis:
- Connect to PostgreSQL database
- Run queries from
sql/customer_behavior_analysis.sql
-
For Power BI dashboard:
- Open
dashboard/customer_behavior_dashboard.pbix - Refresh data connection if needed
- View dashboard preview in
data/images/dashboard_preview.png
- Open
-- Top 5 customers by purchase amount
SELECT customer_id, SUM(purchase_amount) as total_spent
FROM customer
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 5;
-- Purchase distribution by category
SELECT category, COUNT(*) as purchase_count, AVG(purchase_amount) as avg_amount
FROM customer
GROUP BY category;
-- Monthly revenue trend
SELECT EXTRACT(MONTH FROM purchase_date) as month, SUM(purchase_amount) as revenue
FROM customer
GROUP BY month
ORDER BY month;- Customer Segmentation: Identify high-value customer groups
- Inventory Management: Optimize stock based on popular items
- Marketing Strategy: Target campaigns based on demographics
- Pricing Optimization: Analyze discount effectiveness
- Seasonal Planning: Prepare for peak shopping seasons
- Implement machine learning models for customer churn prediction
- Add RFM (Recency, Frequency, Monetary) analysis
- Develop recommendation system based on purchase history
- Time series forecasting for sales prediction
- Customer lifetime value (CLV) modeling
This project is licensed under the MIT License - see the LICENSE file for details.
Note: Replace placeholder database credentials before running.
