This project provides a comprehensive data analytics and machine learning solution to address the common paradox of "high sales, low profit" in retail businesses. By analyzing Superstore sales data, it identifies hidden drivers of unprofitability and offers actionable strategies to maximize profit.
The core objective is to move beyond simple reporting and leverage advanced analytics and machine learning to answer strategic business questions:
- Why is the profit margin decreasing despite increasing sales?
- Which customers are truly profitable, and which are primarily driven by discounts?
- How should the product portfolio be optimized for better profitability?
This project is implemented across several key modules, each addressing a specific aspect of profit optimization:
Problem: Identifying factors impacting profitability.
Methodology: Utilizes Linear Regression to determine variable weights. Specifically, it analyzes the relationship between discount rates and average profit to identify the break-even point.
Insight: Discount rate has the strongest negative coefficient in the regression. The analysis shows that discounts above 20% often lead to losses.
File: src/Break-even point.py
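The two steps in this module can be sketched as follows. This is a minimal illustration, not the contents of the script: the data is synthetic, the column names (`Sales`, `Quantity`, `Discount`, `Profit`) are assumptions, and "break-even" is taken here to mean the lowest discount level whose average profit is negative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Superstore orders table (column names are assumptions).
rng = np.random.default_rng(0)
n = 500
orders = pd.DataFrame({
    "Sales": rng.uniform(20, 500, n),
    "Quantity": rng.integers(1, 10, n),
    "Discount": rng.choice([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], n),
})
# Made-up profit rule: a 25% base margin eroded by the discount, plus noise.
orders["Profit"] = (
    orders["Sales"] * (0.25 - 1.1 * orders["Discount"]) + rng.normal(0, 5, n)
)

# Step 1: standardize the features so coefficients are comparable, then fit.
features = ["Sales", "Quantity", "Discount"]
X = StandardScaler().fit_transform(orders[features])
model = LinearRegression().fit(X, orders["Profit"])
weights = pd.Series(model.coef_, index=features).sort_values()

# Step 2: average profit per discount level; the first level with a
# negative average is treated as the point where discounting stops paying.
avg_profit = orders.groupby("Discount")["Profit"].mean()
loss_levels = avg_profit[avg_profit < 0]
break_even = loss_levels.index.min() if not loss_levels.empty else None

print(weights)
print(avg_profit)
print("First loss-making discount level:", break_even)
```

Because the profit rule is invented, the numbers only show the mechanics; on real data the coefficients and the loss threshold will differ.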
Problem: Generic marketing strategies for all customers.
Methodology: Combines RFM (Recency, Frequency, Monetary) analysis with K-Means Clustering to segment customers.
Insight: Identifies distinct customer groups, such as:
- High-Value Loyalists: High profit, low price sensitivity.
- Discount Seekers: Primarily purchase unprofitable items (require discount policy adjustment).
- Average Customers and At-Risk Customers.
File: src/clustring.py
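A minimal sketch of RFM followed by K-Means, assuming a transaction table with `Customer ID`, `Order Date`, and `Sales` columns. The data, the choice of two clusters, and the snapshot date are all illustrative; the script may use different settings.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Tiny hand-made transaction log; values and column names are assumptions.
tx = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B", "C", "C", "D"],
    "Order Date": pd.to_datetime([
        "2023-01-05", "2023-06-10", "2023-12-20",
        "2023-02-01", "2023-11-15", "2023-12-01", "2023-03-10",
    ]),
    "Sales": [500.0, 450.0, 600.0, 40.0, 300.0, 350.0, 55.0],
})

# Recency is measured back from the day after the last order in the data.
snapshot = tx["Order Date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("Customer ID").agg(
    recency=("Order Date", lambda d: (snapshot - d.max()).days),
    frequency=("Order Date", "count"),
    monetary=("Sales", "sum"),
)

# K-Means is distance based, so the three metrics are scaled first.
scaled = StandardScaler().fit_transform(rfm)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
rfm["segment"] = kmeans.fit_predict(scaled)

print(rfm)
```

The cluster labels are arbitrary integers; names such as "High-Value Loyalists" are assigned afterwards by inspecting each cluster's average recency, frequency, and monetary value.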
Problem: Lack of long-term customer value understanding.
Methodology: Implements Beta-Geometric/NBD (BG/NBD) and Gamma-Gamma models to predict customer lifetime value and churn probability.
Insight: Provides insights into future profitability of customer segments and identifies customers at risk of churning.
File: src/CLV.py
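To show how a churn signal comes out of a BG/NBD model, here is a sketch of its closed-form "probability alive" for one customer. The parameter values are hypothetical placeholders; in practice they come from fitting the model to transaction data, for instance with the Lifetimes package listed in the tech stack.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    frequency: number of repeat purchases
    recency:   time of the last purchase, measured from the first purchase
    T:         time since the first purchase (customer age)
    r, alpha, a, b: fitted BG/NBD parameters
    """
    if frequency == 0:
        # In BG/NBD a customer can only drop out right after a repeat
        # purchase, so one with no repeat purchases counts as alive.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Hypothetical parameters, for illustration only.
PARAMS = dict(r=0.25, alpha=4.4, a=0.8, b=2.5)

# Two customers with the same purchase history but different silence periods.
recent = probability_alive(frequency=2, recency=10, T=10, **PARAMS)
lapsed = probability_alive(frequency=2, recency=10, T=50, **PARAMS)
print(f"recent buyer: {recent:.3f}, long-silent buyer: {lapsed:.3f}")
```

A churn-risk score can then be read as one minus this probability: the longer a customer stays silent relative to their past purchase rhythm, the lower the probability that they are still active.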
Problem: Unoptimized product portfolio and lack of clear strategic insights.
Methodology: Develops advanced visualizations using Seaborn and Matplotlib for executive decision-making.
Output: Includes BCG Matrix for product categorization, a "Kill List" of low-margin products, and a market basket heatmap for cross-selling opportunities.
Files: src/BCG and heatmap.py, src/matrix portfilio products.py
Technology products act as "Stars" (high growth, high profit), while furniture tables consume financial resources.
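A BCG-style classification along these two axes (growth and profit) can be sketched as below. The sub-category names and figures are invented for illustration, and splitting each axis at its median is an assumption; the scripts may use different thresholds.

```python
import numpy as np
import pandas as pd

# Invented figures, for illustration only.
products = pd.DataFrame({
    "sub_category": ["Phones", "Copiers", "Tables", "Binders", "Chairs", "Bookcases"],
    "sales_growth": [0.30, 0.25, 0.05, 0.02, 0.20, -0.03],
    "profit": [44000, 55000, -17000, 30000, 2000, -3000],
})

# Split each axis at its median (an assumed threshold).
high_growth = products["sales_growth"] > products["sales_growth"].median()
high_profit = products["profit"] > products["profit"].median()

products["quadrant"] = np.select(
    [
        high_growth & high_profit,    # growing and profitable
        ~high_growth & high_profit,   # mature but profitable
        high_growth & ~high_profit,   # growing but not yet profitable
    ],
    ["Star", "Cash Cow", "Question Mark"],
    default="Dog",                    # neither growing nor profitable
)

print(products[["sub_category", "quadrant"]])
```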

Identifies the 10 products with the most negative profit margins despite generating sales. Recommendations include discontinuing these items or raising their prices.
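A minimal sketch of how such a list can be derived with pandas, assuming order lines with `Product Name`, `Sales`, and `Profit` columns. The rows are made up, and margin is taken as total profit divided by total sales per product.

```python
import pandas as pd

# Made-up order lines; column names are assumptions.
lines = pd.DataFrame({
    "Product Name": ["P1", "P1", "P2", "P3", "P3", "P4"],
    "Sales": [100.0, 200.0, 150.0, 80.0, 120.0, 50.0],
    "Profit": [-30.0, -60.0, 45.0, 10.0, -30.0, -2.5],
})

# Aggregate per product, then compute the overall profit margin.
totals = lines.groupby("Product Name")[["Sales", "Profit"]].sum()
totals["margin"] = totals["Profit"] / totals["Sales"]

# Keep loss-making products only, worst margin first, at most 10 rows.
kill_list = totals[totals["margin"] < 0].nsmallest(10, "margin")

print(kill_list)
```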

Distinguishes loyal customers (green) from unprofitable customers (red) so that advertising budgets can be targeted accordingly.

- Languages: Python
- Data Analysis: Pandas, NumPy
- Machine Learning: Scikit-Learn (K-Means, Linear Regression), Lifetimes (Beta-Geo/NBD, Gamma-Gamma)
- Statistics: SciPy (Hypothesis Testing)
- Visualization: Matplotlib, Seaborn
- Text Processing: Arabic-Reshaper, Python-Bidi (for Persian text rendering)
- Clone the repository:
git clone https://github.com/Mmadrb/Retail-Profit-Optimization.git
cd Retail-Profit-Optimization
- Install dependencies:
pip install -r requirements.txt
- Run the analysis scripts:
Each Python file in the src/ directory can be run independently to generate specific analyses and visualizations. For example:
python src/BCG\ and\ heatmap.py
python src/Break-even\ point.py
python src/CLV.py
python src/clustring.py
python src/matrix\ portfilio\ products.py
Output images will be saved in the Output/ directory under their respective subfolders (e.g., Output/executive_dashboard/).
This project is licensed under the MIT License - see the LICENSE file for details.
- Integrate all scripts into a single, cohesive main execution file.
- Develop a web-based dashboard (e.g., using Dash or Streamlit) for interactive exploration of insights.
- Implement more advanced machine learning models for demand forecasting and price optimization.
- Set up a robust CI/CD pipeline for automated testing and deployment.