Skip to content

AdhamKhouly/sql-analytics-case-studies

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Analytics Case Studies

PostgreSQL SQL pgAdmin License

A polished collection of PostgreSQL analytics case studies built around realistic business and reporting requests.

This repository demonstrates how SQL can be used to translate stakeholder questions into clean, accurate, and business-ready analytical outputs. The project covers data quality checks, product analysis, customer segmentation, sales reporting, academic reporting, ranking, recursive CTEs, date logic, and window functions.

Each case study includes a business problem, a SQL solution, and a clear explanation of the logic behind the query.


Project Overview

This project is designed as a SQL analytics portfolio.

It shows the full analytical process:

  1. Understand the stakeholder request
  2. Identify the relevant tables
  3. Translate the business question into SQL logic
  4. Write a clean query
  5. Explain the approach clearly
  6. Produce an output that supports decision-making

The repository is structured to show not just SQL syntax, but practical business problem-solving.


Key Capabilities

📊 Business Analytics Queries
Solves realistic stakeholder requests involving sales, products, customers, orders, and student records.



🧩 Advanced SQL Logic
Uses joins, CTEs, recursive CTEs, window functions, ranking functions, date logic, and conditional aggregation.



🧹 Data Quality Checks
Identifies missing records, duplicate records, and mismatches between related tables.



👥 Customer Segmentation
Analyzes top customers, customer spending brackets, and customer order behavior over time.



📈 Sales Trend Reporting
Calculates monthly sales, cumulative sales, and six-month moving averages.



📝 Clear Documentation
Every case study includes the business problem, SQL solution, and step-by-step explanation.


Case Study Highlights

Data Quality Check: Missing Products

Identifies products that exist in one table but not another using a FULL JOIN.

This type of query helps detect data integrity issues before they affect dashboards, sales reports, or revenue calculations.

queries/01_data_quality_missing_products/

Similar Product Prices

Compares products against each other to find candy pairs within $0.25 of each other in unit price.

This demonstrates self joins and pairwise comparison logic.

queries/02_similar_product_prices/

Customer Transaction Number

Adds a sequential transaction number for each customer using ROW_NUMBER().

This is useful for customer journey analysis, repeat purchase tracking, and retention reporting.

queries/07_customer_transaction_number/

Customer Spending Brackets

Generates dynamic $10 spending brackets using a recursive CTE.

This helps segment customers based on total spending.

queries/12_customer_spending_brackets/

Monthly Sales Moving Average

Creates a monthly sales report with total sales, cumulative sales, and a six-month moving average.

This demonstrates time-series reporting using window functions.

queries/17_monthly_sales_moving_average/

Case Studies Included

# Case Study Main Concepts
01 Data Quality Check: Missing Products FULL JOIN, NULL filtering, data reconciliation
02 Similar Product Prices Self join, ABS(), pairwise comparison
03 Price Deviation From Average AVG() OVER, calculated columns
04 Products by Factory COUNT() OVER, PARTITION BY
05 Products Cheaper Than Wicked Choccy's CTE, MIN(), comparison filtering
06 Large Orders Over $200 Joins, aggregation, CTEs, COUNT() OVER
07 Customer Transaction Number ROW_NUMBER(), customer-level sequencing
08 Product Popularity Rank DENSE_RANK(), ranking within orders
09 Second Most Popular Product CTE, ranking, filtering ranked results
10 Customer Order Change Over Time LAG(), order-level aggregation
11 Top 1 Percent Customers NTILE(100), customer segmentation
12 Customer Spending Brackets Recursive CTE, range bucketing
13 Q2 2024 Orders With Ship Date Date filtering, INTERVAL
14 Student Duplicate Records ROW_NUMBER(), duplicate removal
15 Student Highest Grade DENSE_RANK(), top record per group
16 Department Grade Summary CASE, conditional aggregation, pivot-style summary
17 Monthly Sales Moving Average Time-series aggregation, cumulative sum, moving average

SQL Skills Demonstrated

Joins

The project uses multiple join types to connect and compare datasets.

  • INNER JOIN
  • LEFT JOIN
  • FULL JOIN
  • Self joins
  • CROSS JOIN

Examples include finding missing products, calculating order values, comparing product prices, and joining student records to grades.

Detailed notes are available in:

concepts/joins.md

Common Table Expressions

CTEs are used to break complex logic into clear steps.

They help make the queries easier to read, explain, and maintain.

Examples include calculating order totals, ranking customers, filtering ranked records, and preparing monthly sales data.

Detailed notes are available in:

concepts/ctes.md

Window Functions

Window functions are used throughout the project to calculate rankings, sequence numbers, previous values, group counts, cumulative totals, and moving averages.

Functions used include:

  • ROW_NUMBER()
  • DENSE_RANK()
  • NTILE()
  • LAG()
  • COUNT() OVER
  • AVG() OVER
  • SUM() OVER

Detailed notes are available in:

concepts/window_functions.md

Recursive CTEs

Recursive CTEs are used to generate dynamic customer spending brackets.

This avoids hard-coding ranges and makes the query flexible as the data changes.

Detailed notes are available in:

concepts/recursive_ctes.md

Date Functions

Date logic is used for quarterly filtering, calculated shipping dates, customer order sequencing, and monthly sales reporting.

Concepts include:

  • Date ranges
  • INTERVAL
  • EXTRACT()
  • TO_CHAR()
  • DATE_TRUNC()

Detailed notes are available in:

concepts/date_functions.md

Ranking Functions

Ranking functions are used to identify top records, second-ranked records, duplicate records, and customer percentile groups.

Functions covered include:

  • ROW_NUMBER()
  • RANK()
  • DENSE_RANK()
  • NTILE()

Detailed notes are available in:

concepts/ranking_functions.md

Business Problems Solved

Data Quality

The repository includes queries that identify missing and duplicate records.

Examples:

  • Products that exist in one table but not another
  • Duplicate student records

These queries demonstrate how SQL can improve data reliability before reporting.


Product Analysis

The project includes queries that help evaluate pricing, popularity, and production distribution.

Examples:

  • Products with similar prices
  • Products above or below average price
  • Products produced by each factory
  • Products ranked by popularity within each order

These queries support pricing reviews, inventory planning, product bundling, and product strategy.


Sales Analysis

The project includes sales-focused queries that calculate order values, customer spending, and sales trends.

Examples:

  • Orders over $200
  • Monthly sales totals
  • Cumulative sales
  • Six-month moving average of sales

These queries support sales reporting, executive dashboards, and business performance reviews.


Customer Analytics

The project includes customer-focused queries that analyze spending and order behavior.

Examples:

  • Customer transaction numbering
  • Customer order changes over time
  • Top 1% of customers by spending
  • Customer spending brackets

These queries support customer segmentation, retention analysis, and loyalty program targeting.


Academic Reporting

The project also applies SQL analytics patterns to student records.

Examples:

  • Removing duplicate student records
  • Finding each student's highest grade
  • Summarizing average grades by department and grade level

These examples show that the SQL patterns are transferable across different domains.


Repository Structure

sql-analytics-case-studies/
│
├── README.md
├── LICENSE
├── .gitignore
├── portfolio_summary.md
│
├── data/
│   ├── README.md
│   ├── data_dictionary.md
│   └── sample_data_note.md
│
├── schema/
│   ├── README.md
│   ├── create_tables.sql
│   └── entity_relationship_notes.md
│
├── queries/
│   │
│   ├── 01_data_quality_missing_products/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 02_similar_product_prices/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 03_price_deviation_from_average/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 04_products_by_factory/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 05_products_cheaper_than_wicked_choccy/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 06_large_orders_over_200/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 07_customer_transaction_number/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 08_product_popularity_rank/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 09_second_most_popular_product/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 10_customer_order_change_over_time/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 11_top_1_percent_customers/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 12_customer_spending_brackets/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 13_q2_2024_orders_with_ship_date/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 14_student_duplicate_records/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 15_student_highest_grade/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   ├── 16_department_grade_summary/
│   │   ├── problem.md
│   │   ├── solution.sql
│   │   └── explanation.md
│   │
│   └── 17_monthly_sales_moving_average/
│       ├── problem.md
│       ├── solution.sql
│       └── explanation.md
│
└── concepts/
    ├── joins.md
    ├── ctes.md
    ├── window_functions.md
    ├── recursive_ctes.md
    ├── date_functions.md
    └── ranking_functions.md

Data and Schema

The repository includes documentation for the tables used in the case studies.

The main subject areas are:

Candy Business Dataset

Used for product, order, customer, factory, pricing, and sales analysis.

Main tables:

  • products
  • orders

Student Records Dataset

Used for academic reporting and student performance analysis.

Main tables:

  • students
  • student_grades

Data documentation is available in:

data/data_dictionary.md

Schema documentation is available in:

schema/entity_relationship_notes.md

A simplified mock schema is available in:

schema/create_tables.sql

Important Data Note

The original learning exercises were completed using a PostgreSQL database in pgAdmin 4.

The original schema and dataset may come from a course-provided learning environment.

To respect data ownership, this repository focuses on:

  • Original SQL solutions
  • Query explanations
  • Business logic
  • Documentation
  • A simplified mock schema for practice

This repository should not redistribute proprietary course data unless permission is explicitly granted by the course provider.


How to Use This Repository

1. Review the Case Studies

Each case study is located in the queries/ folder.

Every case study includes:

problem.md
solution.sql
explanation.md

2. Read the Problem

Start with problem.md to understand the business request.

3. Review the SQL Solution

Open solution.sql to see the final query.

4. Read the Explanation

Use explanation.md to understand the logic step by step.

5. Explore the Concepts

Use the concepts/ folder to review the SQL techniques used across the project.


Example Case Study Structure

queries/09_second_most_popular_product/
├── problem.md
├── solution.sql
└── explanation.md

The problem.md file explains the stakeholder request.

The solution.sql file contains the SQL query.

The explanation.md file breaks down the logic step by step.


Technology Stack

  • PostgreSQL
  • pgAdmin 4
  • SQL
  • GitHub

Suitable For

This project is relevant for roles such as:

  • Data Analyst
  • Business Analyst
  • Business Intelligence Analyst
  • Analytics Engineer
  • Reporting Analyst
  • Data Science Intern
  • Business Analytics Graduate Student

Future Improvements

Possible future improvements include:

  • Creating a fully original mock dataset
  • Adding sample query outputs
  • Adding screenshots from pgAdmin
  • Adding an entity relationship diagram
  • Adding performance notes for selected queries
  • Creating a dashboard from selected outputs
  • Adding multi-year-safe versions of time-series queries
  • Adding SQL validation checks for the mock schema

License

This project is licensed under the MIT License.

See LICENSE file for details.

Author

Adham Elkhouly

  • Data Analytics and Reporting Specialist
  • Power Platform Functional Consultant Associate

Developed as a SQL analytics portfolio project to demonstrate practical data analysis, business reporting, and stakeholder-focused query writing using PostgreSQL.

About

PostgreSQL business analytics case studies using joins, CTEs, window functions, recursive queries, ranking, and moving averages.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors