This project performs an end-to-end exploratory and analytical study of the US Superstore dataset using Python.
The goal is to extract actionable business insights related to sales performance, profitability, customer behavior, product performance, and operational efficiency.
It is designed as a portfolio project demonstrating Python, Pandas, and data visualization skills.
- Source: US Superstore Sales Dataset
- Format: CSV
- Key Features:
  - Order and shipping dates
  - Sales, profit, and discount
  - Customer, product, and regional data
The raw dataset is stored in the "data/" folder and remains unchanged for transparency.
- Python
- Pandas
- Matplotlib
- Seaborn
- Jupyter Notebook
- Loaded raw CSV data using Pandas
- Cleaned column names for consistency
- Converted date columns to datetime format
- Created additional time-based columns:
  - Order Year
  - Order Month
  - Order Year-Month
- Calculated shipping duration in days
- Checked for missing values and duplicates
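The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Order Date`, `Ship Date`, and so on), the snake_case naming scheme, the helper `prepare`, and the three sample rows are all assumptions made for the example.

```python
import io
import pandas as pd

# Tiny made-up sample standing in for the CSV in data/ (illustrative values only).
raw_csv = io.StringIO(
    "Order ID,Order Date,Ship Date,Sales,Profit\n"
    "A-1,2017-01-03,2017-01-07,100.0,20.0\n"
    "A-2,2017-02-10,2017-02-12,250.0,-15.0\n"
    "A-2,2017-02-10,2017-02-12,250.0,-15.0\n"
)

def prepare(df):
    """Hypothetical helper: clean column names, parse dates, add time columns."""
    df = df.copy()
    # Normalise column names: trim, lower-case, replace spaces/hyphens with "_".
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(r"[\s\-]+", "_", regex=True)
    )
    # Convert the date columns to datetime.
    for col in ("order_date", "ship_date"):
        df[col] = pd.to_datetime(df[col])
    # Time-based columns for trend analysis.
    df["order_year"] = df["order_date"].dt.year
    df["order_month"] = df["order_date"].dt.month
    df["order_year_month"] = df["order_date"].dt.to_period("M").astype(str)
    # Shipping duration in whole days.
    df["shipping_days"] = (df["ship_date"] - df["order_date"]).dt.days
    return df

orders = prepare(pd.read_csv(raw_csv))

# Data quality checks: missing values per column and fully duplicated rows.
missing_per_column = orders.isna().sum()
duplicate_rows = int(orders.duplicated().sum())
print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

The sketch only counts duplicates; whether to drop them depends on whether repeated rows are genuine order lines in the real data.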
- Time-based features for trend analysis
- Shipping duration for logistics insights
- Aggregated metrics for customer, product, and regional analysis
The analysis focuses on multiple business dimensions:
- Time-based performance
- Customer value
- Product profitability
- Regional and operational efficiency
- Discount and profitability relationships
Time-Based Analysis
- Monthly revenue and profit trends
- Sales and profit growth patterns over time
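One possible way to compute the monthly trends and growth figures is a period-based `groupby` followed by `pct_change`. The column names and the sample rows below are assumptions for illustration, not values from the dataset.

```python
import pandas as pd

# Made-up orders; column names are assumed, values are illustrative only.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-11", "2017-03-02"]
    ),
    "sales": [100.0, 100.0, 300.0, 150.0],
    "profit": [10.0, 30.0, 60.0, -15.0],
})

# Aggregate revenue and profit per calendar month.
monthly = (
    orders.groupby(orders["order_date"].dt.to_period("M"))[["sales", "profit"]]
    .sum()
    .rename_axis("order_year_month")
)

# Month-over-month sales growth in percent (first month has no baseline).
monthly["sales_growth_pct"] = monthly["sales"].pct_change() * 100
print(monthly)
```

Calling `monthly[["sales", "profit"]].plot()` on the result would give a line chart of the two trends, assuming Matplotlib is installed.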
Customer Analysis
- Top 10 customers by revenue
- Top 10 customers by profit
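A top-N customer ranking of this kind can be sketched with `groupby` and `nlargest`. The helper name `top_customers`, the column names, and the sample customers are hypothetical; `n=3` is used only because the sample is tiny.

```python
import pandas as pd

# Made-up orders; names and values are illustrative only.
orders = pd.DataFrame({
    "customer_name": ["Ann", "Bob", "Ann", "Cy", "Bob", "Dee"],
    "sales": [120.0, 80.0, 60.0, 300.0, 40.0, 10.0],
    "profit": [30.0, 25.0, 10.0, -20.0, 10.0, 2.0],
})

def top_customers(df, metric, n=10):
    """Hypothetical helper: total `metric` per customer, largest n first."""
    totals = df.groupby("customer_name")[metric].sum()
    return totals.nlargest(n)

top_by_revenue = top_customers(orders, "sales", n=3)
top_by_profit = top_customers(orders, "profit", n=3)
print(top_by_revenue)
print(top_by_profit)
```

In this sample the highest-revenue customer is not among the most profitable ones, which is why ranking by both metrics can be informative.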
Product Analysis
- Revenue and profit by category and sub-category
- Top 10 products by revenue and profit
- Identification of low-performing products
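As a rough sketch of the product analysis, the example below aggregates by category and sub-category and then flags products with negative total profit. Treating "low-performing" as "negative total profit" is an assumption made for this example, as are the column names and the sample rows.

```python
import pandas as pd

# Made-up orders; names and values are illustrative only.
orders = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Technology", "Technology",
                 "Office Supplies"],
    "sub_category": ["Tables", "Chairs", "Phones", "Phones", "Paper"],
    "product_name": ["Table X", "Chair Y", "Phone Z", "Phone Z", "Paper Q"],
    "sales": [500.0, 200.0, 300.0, 100.0, 20.0],
    "profit": [-80.0, 40.0, 60.0, 30.0, 8.0],
})

# Revenue and profit per category / sub-category, most profitable first.
by_sub_category = (
    orders.groupby(["category", "sub_category"])[["sales", "profit"]]
    .sum()
    .sort_values("profit", ascending=False)
)

# Product-level totals with a simple profit margin.
by_product = orders.groupby("product_name")[["sales", "profit"]].sum()
by_product["profit_margin"] = by_product["profit"] / by_product["sales"]

# Assumed definition: a product is low-performing if its total profit is negative.
low_performers = by_product[by_product["profit"] < 0].sort_values("profit")
print(by_sub_category)
print(low_performers)
```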
Geographical Analysis
- Revenue and profit by region
- Identification of loss-making regions
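The regional comparison could look something like the sketch below, which uses named aggregation and a simple filter for negative total profit. Column names and sample figures are assumptions for illustration.

```python
import pandas as pd

# Made-up orders; values are illustrative only.
orders = pd.DataFrame({
    "region": ["West", "East", "Central", "Central", "South"],
    "sales": [400.0, 300.0, 250.0, 50.0, 100.0],
    "profit": [80.0, 45.0, -30.0, 10.0, 5.0],
})

# Total revenue and profit per region.
by_region = orders.groupby("region").agg(
    revenue=("sales", "sum"),
    profit=("profit", "sum"),
)
by_region["margin_pct"] = 100 * by_region["profit"] / by_region["revenue"]

# Regions whose total profit is below zero.
loss_making = by_region.index[by_region["profit"] < 0].tolist()
print(by_region)
print("loss-making regions:", loss_making)
```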
Operational & Risk Analysis
- Shipping duration vs profit by region
- Discount vs profit relationship
- Impact of high discounts on profitability
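A simple way to examine these relationships is a correlation plus a banded comparison. In the sketch below the discount band edges (20% and 40%), the column names, and the sample rows are assumptions chosen for illustration, not thresholds taken from the analysis.

```python
import pandas as pd

# Made-up orders; values are illustrative only.
orders = pd.DataFrame({
    "region": ["West", "West", "East", "East"],
    "discount": [0.0, 0.2, 0.5, 0.7],
    "shipping_days": [2, 4, 5, 7],
    "profit": [50.0, 20.0, -10.0, -40.0],
})

# Linear (Pearson) correlation between discount and profit.
discount_profit_corr = orders["discount"].corr(orders["profit"])

# Average profit per discount band; band edges are an assumption.
bands = pd.cut(
    orders["discount"],
    bins=[-0.001, 0.2, 0.4, 1.0],
    labels=["up to 20%", "20-40%", "over 40%"],
)
profit_by_band = orders.groupby(bands, observed=True)["profit"].mean()

# Average shipping duration next to average profit, per region.
shipping_vs_profit = orders.groupby("region").agg(
    avg_shipping_days=("shipping_days", "mean"),
    avg_profit=("profit", "mean"),
)
print(round(discount_profit_corr, 3))
print(profit_by_band)
print(shipping_vs_profit)
```

A correlation on its own does not show that discounts cause losses; the banded view helps check whether the pattern holds across discount levels.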
Programming & Libraries
- Python
- Pandas – data manipulation, aggregation, feature engineering
- NumPy – numerical operations
- Matplotlib – data visualization
- Seaborn – advanced statistical visualizations
Data Cleaning & Preparation
- Handling missing values
- Data type conversions
- Removing inconsistencies
- Creating derived columns (e.g., order year, order month, shipping duration)
- Date-time processing
Feature Engineering
- Extracting year and month from order dates
- Creating time-based features for trend analysis
- Calculating shipping duration
- Preparing categorical features for grouping and aggregation
Exploratory Data Analysis (EDA)
- Sales and profit trend analysis
- Category and sub-category performance analysis
- Customer segmentation (top customers by revenue & profit)
- Product-level performance analysis
- Discount impact analysis
- Regional performance comparison
Data Aggregation & Analysis Techniques
- GroupBy operations
- Sorting and ranking
- Time-series aggregation
- Comparative analysis across categories and regions
Data Visualization
- Line charts (monthly revenue & profit trends)
- Bar charts (top customers and products)
- Heatmaps (category and sub-category profitability)
- Scatter plots (discount vs profit, shipping duration vs profit)
- Proper labeling, legends, and figure sizing
Business & Analytical Skills
- Translating data into actionable business insights
- Identifying profitability drivers and risks
- Detecting operational inefficiencies
- Insight summarization and storytelling
- Decision-oriented analysis
Project & Workflow Skills
- Structured notebook design (layer-based analysis)
- Reproducible analysis workflow
- Clear documentation using Markdown
- GitHub project organization
- Version control fundamentals