This project demonstrates how to integrate Python and SQL for creating, storing, and analyzing sales data. It involves generating synthetic data, performing SQL queries for actionable insights, and conducting Exploratory Data Analysis (EDA) with Python to reveal patterns and trends through visualizations.
-
Task 1: Dataset Creation Generate a dataset with the following attributes:
- Customer ID: Unique IDs from 1001 to 1200.
- Customer Name: Random names generated using
Faker. - Product ID: IDs from 1 to 20.
- Purchase Date: Random dates from the last year.
- Quantity: Random values between 1 and 10.
- Price per Unit: Prices ranging from 10.00 to 1000.00.
- Region: Randomly assigned as "North," "South," "East," or "West."
-
Task 2: Insert Data into SQL
- Define an SQL table schema matching the dataset attributes.
- Populate the SQL database using Python.
Perform the following queries:
- Total Sales by Region: Calculate
quantity * price_per_unitper region. - Top Products: Retrieve the top 5 products by total sales.
- Monthly Sales: Calculate monthly total sales.
- Customer Analysis: Find the total amount spent by each customer.
- Regional Product Sales: Show product-wise sales for each region.
-
Retrieve Data to Python:
- Use libraries like
pandasandsqlite3/mysql-connectorto load data into a DataFrame.
- Use libraries like
-
EDA Tasks:
- Summary Statistics: Compute mean, median, max, min, and standard deviation for
quantityandprice_per_unit. - Sales by Region: Summarize total sales for each region.
- Top Customers: Identify the 5 highest-spending customers.
- Summary Statistics: Compute mean, median, max, min, and standard deviation for
-
Visualizations:
- Bar Plot: Total sales per region.
- Pie Chart: Proportions of sales by product.
- Line Plot: Monthly sales trends.
- Scatter Plot: Relationship between quantity and price per unit.
- Python: For data generation, analysis, and visualization.
pandasfor data manipulation.numpyfor numerical operations.Fakerfor synthetic data generation.matplotlibandseabornfor plotting.
- SQL: SQLite/MySQL for data storage and querying.
SQLAlchemyfor database interaction.
- Install necessary Python libraries:
pip install pandas numpy faker sqlalchemy pymysql matplotlib seaborn
- Run the dataset generation script to populate the SQL database.
- Execute the SQL script for queries.
- Load the data back into Python for analysis and visualization.
- Regional Trends: Certain regions dominate sales, providing guidance for resource allocation.
- Top Products: High-demand products drive revenue; focus marketing on these items.
- Seasonality: Monthly trends highlight peak sales periods.
- Pricing Patterns: Analyzing quantity vs. price helps understand customer purchasing behavior.
- Python scripts for dataset generation and SQL integration.
- SQL script for queries.
- Jupyter notebook or Python script for EDA and visualizations.
- A detailed README for project setup and usage instructions.
[Manas Jadhav]
For any questions or feedback, feel free to reach out!



