gupta-aanshi/customer_behavior_analysis

🛍️ Customer Shopping Behavior Analysis

A comprehensive data analytics project analyzing customer shopping patterns and behaviors using Python, SQL, and Power BI.

📊 Project Overview

This project performs an end-to-end analysis of customer shopping behavior, exploring purchasing patterns, demographic insights, and customer preferences. The analysis includes exploratory data analysis (EDA), data cleaning, database integration with PostgreSQL, and interactive visualizations through Power BI dashboards.

📁 Dataset

The dataset contains 3,900 customer transactions with 18 features:

  • Customer Demographics: Customer ID, Age, Gender, Location
  • Product Information: Item Purchased, Category, Size, Color
  • Purchase Details: Purchase Amount (USD), Season, Previous Purchases
  • Customer Behavior: Review Rating, Subscription Status, Frequency of Purchases
  • Transaction Details: Shipping Type, Discount Applied, Promo Code Used, Payment Method

Data Categories

  • Clothing: Blouse, Sweater, Jeans, Shirt, Shorts, Dress, Pants, T-shirt, Hoodie, Skirt, Socks
  • Footwear: Sandals, Sneakers, Shoes, Boots
  • Outerwear: Coat, Jacket
  • Accessories: Handbag, Jewelry, Scarf, Hat, Sunglasses, Belt, Backpack, Gloves

🛠️ Tools & Technologies

  • Python 3.12: Data processing and analysis
  • Libraries:
    • pandas - Data manipulation and analysis
    • numpy - Numerical computations
    • matplotlib & seaborn - Data visualization
    • sqlalchemy - Database connectivity
    • psycopg2-binary - PostgreSQL adapter
  • PostgreSQL: Data storage and SQL querying
  • Power BI: Interactive dashboard creation
  • Jupyter Notebook: Analysis environment

📂 Project Structure

customer_behavior_analysis_project/
│
├── dashboard/
│   └── customer_behavior_dashboard.pbix      # Power BI dashboard file
│
├── data/
│   ├── customer_shopping_behavior.xlsx       # Raw dataset (Excel format)
│   └── images/                               # Dashboard preview images
│       └── dashboard_preview.png
│
├── notebook/
│   └── Customer_behaviour_analysis.ipynb     # Jupyter notebook with EDA & analysis
│
├── sql/
│   └── customer_behavior_analysis.sql        # SQL queries for database analysis
│
└── README.md

🔍 Analysis Steps

1. Data Loading

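import pandas as pd

# Load the raw dataset (adjust the path to your working directory)
df = pd.read_csv('customer_shopping_behavior.csv')
# The repository ships the data as Excel; to read that file instead (requires openpyxl):
# df = pd.read_excel('data/customer_shopping_behavior.xlsx')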

2. Exploratory Data Analysis (EDA)

  • Dataset shape: 3,900 rows × 18 columns
  • Data types inspection
  • Summary statistics generation
  • Missing value detection (37 null values in Review Rating column)
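The checks above map onto a handful of pandas calls. The following is a minimal sketch rather than the notebook's actual code: the helper name `summarize` and the three-row sample frame are hypothetical, and in practice `df` from step 1 would be passed in instead.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic EDA facts listed above for any DataFrame."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # column -> dtype name
        "stats": df.describe(include="all"),         # summary statistics
        "nulls": df.isnull().sum().to_dict(),        # missing values per column
    }


# Hypothetical three-row sample standing in for the real dataset.
sample = pd.DataFrame(
    {
        "Customer ID": [1, 2, 3],
        "Age": [55, 19, 50],
        "Review Rating": [3.1, None, 4.0],
    }
)

report = summarize(sample)
print(report["shape"])
print(report["nulls"])
```

Run against the full dataset, the `nulls` entry is where the 37 missing Review Rating values would show up.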

3. Data Cleaning

  • Column standardization: Converted column names to lowercase with underscores
  • Redundancy removal: Dropped promo_code_used column (duplicate of discount_applied)
  • Feature engineering:
    • Created age_group categories
    • Derived purchase_frequency_days from frequency labels
  • Missing value handling: Addressed null values in review ratings
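The cleaning steps above could look roughly like the sketch below. Several details are assumptions, since the README does not state them: the function name `clean`, the age bin edges and labels, the frequency-to-days mapping, and the choice of median imputation for missing ratings are all illustrative.

```python
import pandas as pd

# Assumed mapping from frequency labels to days; the README does not list
# the labels or values the notebook actually uses.
FREQUENCY_DAYS = {
    "Weekly": 7,
    "Fortnightly": 14,
    "Monthly": 30,
    "Quarterly": 90,
    "Annually": 365,
}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Column standardization: lowercase, spaces -> underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    # Redundancy removal (ignored if the column is absent).
    out = out.drop(columns=["promo_code_used"], errors="ignore")
    # Feature engineering: age bands (bin edges are an assumption).
    out["age_group"] = pd.cut(
        out["age"],
        bins=[17, 30, 45, 60, 120],
        labels=["18-30", "31-45", "46-60", "61+"],
    )
    out["purchase_frequency_days"] = out["frequency_of_purchases"].map(FREQUENCY_DAYS)
    # Missing ratings: median imputation is one option, assumed here.
    out["review_rating"] = out["review_rating"].fillna(out["review_rating"].median())
    return out


# Hypothetical sample rows with the raw column names.
sample = pd.DataFrame(
    {
        "Customer ID": [1, 2, 3],
        "Age": [18, 45, 70],
        "Review Rating": [3.0, None, 5.0],
        "Frequency of Purchases": ["Weekly", "Monthly", "Annually"],
        "Promo Code Used": ["Yes", "No", "Yes"],
    }
)
cleaned = clean(sample)
print(cleaned)
```

Working on a copy keeps the raw frame intact, so the cleaning can be rerun with different bins or imputation rules.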

4. Database Integration

Loaded cleaned data into PostgreSQL database:

from sqlalchemy import create_engine

engine = create_engine(f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}")
df.to_sql('customer', engine, if_exists='replace', index=False)

Database: customer_behavior_analysis
Table: customer
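One caveat with building the connection string by hand is that a password containing characters such as `@` or `/` produces an invalid URL unless it is percent-encoded. The sketch below shows one possible way to handle that while reading credentials from environment variables instead of hard-coding them. The helper name `build_url`, the variable names (`PGUSER`, `PGPASSWORD`, and so on), and the defaults are assumptions for illustration, not part of the project.

```python
import os
from urllib.parse import quote_plus


def build_url(env=None) -> str:
    """Build a postgresql+psycopg2 URL, escaping the username and password."""
    env = os.environ if env is None else env
    user = quote_plus(env.get("PGUSER", "postgres"))
    password = quote_plus(env.get("PGPASSWORD", ""))
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "customer_behavior_analysis")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"


# Hypothetical credentials; '@' and '/' would break an unescaped URL.
url = build_url({"PGUSER": "analyst", "PGPASSWORD": "p@ss/word"})
print(url)
```

The resulting string can be passed to `create_engine(...)` in place of the f-string shown above.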

5. SQL Querying

Ran SQL queries against the PostgreSQL database to extract insights:

  • Customer segmentation analysis
  • Purchase pattern identification
  • Revenue analysis by category
  • Geographic distribution analysis
  • Subscription status impact
  • Payment method preferences
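Some of the analyses listed above have direct pandas equivalents, which can be handy for cross-checking SQL results. The sketch below covers revenue by category and subscription status impact. It assumes the cleaned column names `category`, `subscription_status`, and `purchase_amount`, and uses a few made-up rows rather than the real data.

```python
import pandas as pd

# Hypothetical rows using the assumed cleaned column names.
df = pd.DataFrame(
    {
        "category": ["Clothing", "Clothing", "Footwear", "Accessories"],
        "subscription_status": ["Yes", "No", "No", "Yes"],
        "purchase_amount": [50, 70, 90, 30],
    }
)

# Revenue analysis by category (SQL: SUM(...) GROUP BY category).
revenue_by_category = (
    df.groupby("category")["purchase_amount"].sum().sort_values(ascending=False)
)

# Subscription status impact: order count and average spend per group.
subscription_impact = df.groupby("subscription_status")["purchase_amount"].agg(
    ["count", "mean"]
)

print(revenue_by_category)
print(subscription_impact)
```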

6. Power BI Dashboard

Created interactive dashboards featuring:

  • Sales trends over time
  • Customer demographic breakdowns
  • Product category performance
  • Geographic sales distribution
  • Customer lifetime value metrics
  • Purchase frequency analysis

📊 Dashboard Preview

Customer Behavior Dashboard

The Power BI dashboard provides an interactive visualization of key metrics and trends, enabling stakeholders to:

  • Monitor sales performance
  • Identify top-performing products and categories
  • Analyze customer demographics and behavior patterns
  • Track subscription and payment method distributions
  • Evaluate discount and promotional effectiveness

For the full interactive experience, open dashboard/customer_behavior_dashboard.pbix in Power BI Desktop.

📈 Key Insights

  • Gender Distribution: Analyzed purchasing behavior across genders
  • Age Analysis: Customer base spans ages 18-70 with identified age group patterns
  • Popular Categories: Clothing dominates purchases, followed by Footwear and Accessories
  • Seasonal Trends: Purchase patterns vary by season
  • Subscription Impact: High subscription rate (Yes/No analysis)
  • Payment Preferences: Multiple payment methods tracked
  • Shipping Choices: Express, Standard, Free Shipping, 2-Day, Next Day Air, Store Pickup
  • Discount Effectiveness: All transactions included promotional discounts

🚀 How to Run

Prerequisites

pip install pandas numpy matplotlib seaborn sqlalchemy psycopg2-binary openpyxl

Database Setup

  1. Install PostgreSQL
  2. Create database:
    CREATE DATABASE customer_behavior_analysis;
  3. Update connection credentials in the notebook

Running the Analysis

  1. Clone the repository:

    git clone https://github.com/gupta-aanshi/customer_behavior_analysis.git
    cd customer_behavior_analysis
  2. Launch Jupyter Notebook:

    jupyter notebook
  3. Open notebook/Customer_behaviour_analysis.ipynb and run cells sequentially

  4. For SQL analysis:

    • Connect to PostgreSQL database
    • Run queries from sql/customer_behavior_analysis.sql
  5. For Power BI dashboard:

    • Open dashboard/customer_behavior_dashboard.pbix
    • Refresh data connection if needed
    • View dashboard preview in data/images/dashboard_preview.png

📊 Sample SQL Queries

-- Top 5 customers by purchase amount
SELECT customer_id, SUM(purchase_amount) as total_spent
FROM customer
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 5;

-- Purchase distribution by category
SELECT category, COUNT(*) as purchase_count, AVG(purchase_amount) as avg_amount
FROM customer
GROUP BY category;

-- Revenue by season (the dataset records a season, not a purchase date)
SELECT season, SUM(purchase_amount) AS revenue
FROM customer
GROUP BY season
ORDER BY revenue DESC;
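To try queries of this shape without a running PostgreSQL server, they can be exercised against an in-memory SQLite database from Python's standard library. This is a stand-in for experimentation only, not the project's setup. The rows below are made up, and the table holds just the three columns the queries touch.

```python
import sqlite3

# Hypothetical rows; the real table is loaded from the cleaned DataFrame.
rows = [
    (1, "Clothing", 50.0),
    (2, "Clothing", 70.0),
    (3, "Footwear", 90.0),
    (4, "Accessories", 30.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, category TEXT, purchase_amount REAL)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

# Top 5 customers by purchase amount.
top_customers = conn.execute(
    """
    SELECT customer_id, SUM(purchase_amount) AS total_spent
    FROM customer
    GROUP BY customer_id
    ORDER BY total_spent DESC
    LIMIT 5
    """
).fetchall()

# Purchase distribution by category (sorted for a stable result order).
by_category = conn.execute(
    """
    SELECT category, COUNT(*) AS purchase_count, AVG(purchase_amount) AS avg_amount
    FROM customer
    GROUP BY category
    ORDER BY category
    """
).fetchall()

print(top_customers)
print(by_category)
conn.close()
```

Both statements use only standard SQL plus `LIMIT`, which SQLite and PostgreSQL both accept. PostgreSQL-specific functions would need checking before being run this way.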

🎯 Business Applications

  • Customer Segmentation: Identify high-value customer groups
  • Inventory Management: Optimize stock based on popular items
  • Marketing Strategy: Target campaigns based on demographics
  • Pricing Optimization: Analyze discount effectiveness
  • Seasonal Planning: Prepare for peak shopping seasons

📌 Future Enhancements

  • Implement machine learning models for customer churn prediction
  • Add RFM (Recency, Frequency, Monetary) analysis
  • Develop recommendation system based on purchase history
  • Add time series forecasting for sales prediction
  • Model customer lifetime value (CLV)

👀 Author

Aanshi Gupta
LinkedIn | Email

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Note: Replace placeholder database credentials before running.
