Netflix Movies and TV Shows Data Analysis using SQL

Overview

This project presents an exploratory data analysis of Netflix Movies and TV Shows using SQL. The primary goal is to derive meaningful insights by solving real-world analytical business questions related to content distribution, ratings, geography, duration, and descriptive attributes.

The project demonstrates practical SQL skills including aggregation, string manipulation, window functions, filtering, and data transformation techniques.


Objectives

  • Analyze the distribution of content types (Movies vs TV Shows)
  • Identify the most common ratings for movies and TV shows
  • Examine content trends across release years and countries
  • Explore genre distribution and keyword-based categorization
  • Solve business-oriented analytical questions using SQL

Dataset

The dataset used in this analysis is publicly available on Kaggle:

Dataset Link: https://www.kaggle.com/datasets/shivamb/netflix-shows


Schema

DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
    show_id      VARCHAR(5),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   VARCHAR(55),
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(15),
    listed_in    VARCHAR(250),
    description  VARCHAR(550)
);

Business Problems and Solutions

1. Count the Number of Movies vs TV Shows

Determine the distribution of content types on Netflix.
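
A minimal sketch of one way to answer this, assuming a PostgreSQL-compatible database and the `netflix` table from the schema above. The alias `total_content` is illustrative.

```sql
-- Count how many rows fall under each content type.
SELECT type,
       COUNT(*) AS total_content
FROM netflix
GROUP BY type;
```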

2. Find the Most Common Rating for Movies and TV Shows

Identify the most frequently occurring rating for each type of content.
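
One possible approach uses a window function to rank ratings within each type. This is a sketch under the assumption of PostgreSQL; the CTE names `rating_counts` and `ranked` are illustrative. Because it uses `RANK()`, tied ratings are all returned.

```sql
-- Count each (type, rating) pair, then keep the top-ranked rating per type.
WITH rating_counts AS (
    SELECT type,
           rating,
           COUNT(*) AS rating_count
    FROM netflix
    WHERE rating IS NOT NULL
    GROUP BY type, rating
),
ranked AS (
    SELECT type,
           rating,
           rating_count,
           RANK() OVER (PARTITION BY type ORDER BY rating_count DESC) AS rnk
    FROM rating_counts
)
SELECT type,
       rating AS most_common_rating,
       rating_count
FROM ranked
WHERE rnk = 1;
```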

3. List All Movies Released in a Specific Year

Retrieve movies released in a chosen year.
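
A simple filter is likely enough here. The year `2020` is a placeholder, and the sketch assumes the `type` column stores the literal value `'Movie'`.

```sql
-- List movies released in the chosen year (2020 is a placeholder).
SELECT title,
       director,
       release_year
FROM netflix
WHERE type = 'Movie'
  AND release_year = 2020;
```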

4. Find the Top 5 Countries with the Most Content on Netflix

Analyze country-wise contribution to Netflix content.
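
The sketch below assumes PostgreSQL and assumes that `country` can hold several comma-separated names, so each value is split before counting. The aliases `c`, `country_name`, and `total_content` are illustrative.

```sql
-- Split multi-country values into one row per country, then count and rank.
SELECT TRIM(c.country_name) AS country,
       COUNT(*)             AS total_content
FROM netflix AS n
CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.country, ',')) AS c(country_name)
WHERE TRIM(c.country_name) <> ''
GROUP BY TRIM(c.country_name)
ORDER BY total_content DESC
LIMIT 5;
```

Splitting first means a co-production is credited to every country listed, which may or may not match the intended definition of "most content".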

5. Identify the Longest Movie

Detect the movie with the maximum duration.
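
Since `duration` is stored as text, the numeric part has to be extracted before sorting. A sketch, assuming PostgreSQL and assuming movie durations look like `'90 min'`:

```sql
-- Extract the leading digits from duration and sort by them as integers.
-- Rows without a leading number yield NULL and are pushed to the end.
SELECT title,
       duration
FROM netflix
WHERE type = 'Movie'
ORDER BY SUBSTRING(duration FROM '^[0-9]+')::INT DESC NULLS LAST
LIMIT 1;
```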

6. Find Content Added in the Last 5 Years

Analyze recently added Netflix content.
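
Because `date_added` is a `VARCHAR`, it needs to be parsed into a date first. The sketch assumes PostgreSQL and assumes values formatted like `'September 25, 2021'`; a different format would need a different `TO_DATE` pattern.

```sql
-- Parse the text date and keep rows added within five years of today.
-- NULLIF guards against empty strings left over from the CSV import.
SELECT title,
       type,
       date_added
FROM netflix
WHERE TO_DATE(NULLIF(TRIM(date_added), ''), 'Month DD, YYYY')
      >= CURRENT_DATE - INTERVAL '5 years';
```

The window is measured from the current date, so on a static snapshot of the dataset the result shrinks over time.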

7. Find All Movies/TV Shows by a Specific Director

Retrieve content directed by a given director.
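
A case-insensitive pattern match is one way to handle rows that list several directors in a single field. The name below is only a placeholder, and `ILIKE` assumes PostgreSQL.

```sql
-- Match the director anywhere in the field; 'Martin Scorsese' is a placeholder.
SELECT title,
       type,
       director
FROM netflix
WHERE director ILIKE '%Martin Scorsese%';
```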

8. List All TV Shows with More Than 5 Seasons

Identify long-running series.
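
As a sketch, assuming PostgreSQL and assuming TV show durations look like `'6 Seasons'`, the season count can be pulled from the text and compared numerically:

```sql
-- Extract the leading season count and keep shows with more than five.
-- Non-matching values become NULL and are filtered out by the comparison.
SELECT title,
       duration
FROM netflix
WHERE type = 'TV Show'
  AND SUBSTRING(duration FROM '^[0-9]+')::INT > 5;
```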

9. Count the Number of Content Items in Each Genre

Perform genre-wise content distribution analysis.
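
The sketch below assumes PostgreSQL and assumes `listed_in` holds a comma-separated list of genres, so a title with three genres is counted once under each. The aliases `g` and `genre` are illustrative.

```sql
-- Expand each title into one row per genre, then count per genre.
SELECT TRIM(g.genre) AS genre,
       COUNT(*)      AS total_content
FROM netflix AS n
CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.listed_in, ',')) AS g(genre)
GROUP BY TRIM(g.genre)
ORDER BY total_content DESC;
```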

10. Year-wise Indian Content Contribution

Analyze yearly content releases from India.
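
One reading of this question groups Indian titles by `release_year`. The sketch assumes PostgreSQL and treats any row whose `country` field contains "India" as Indian content, including co-productions.

```sql
-- Count titles per release year where India appears in the country field.
SELECT release_year,
       COUNT(*) AS indian_titles
FROM netflix
WHERE country ILIKE '%India%'
GROUP BY release_year
ORDER BY release_year;
```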

11. List All Movies that are Documentaries

Filter documentary content.
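
A possible filter, assuming PostgreSQL and assuming the genre label in `listed_in` contains the word "Documentaries":

```sql
-- Keep movies whose genre list mentions documentaries.
SELECT title,
       listed_in
FROM netflix
WHERE type = 'Movie'
  AND listed_in ILIKE '%Documentaries%';
```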

12. Find All Content Without a Director

Identify missing metadata records.
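
Depending on how the CSV was imported, a missing director may appear as `NULL` or as an empty string, so this sketch checks for both:

```sql
-- Find rows where the director is missing or blank.
SELECT show_id,
       title,
       type
FROM netflix
WHERE director IS NULL
   OR TRIM(director) = '';
```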

13. Actor Appearance Analysis

Analyze actor appearances over recent years.
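
The README does not name an actor or a time window, so the sketch below uses a placeholder name and an assumed 10-year window on `release_year`. It also assumes PostgreSQL.

```sql
-- Count titles per release year featuring the actor over the last 10 years.
-- 'Salman Khan' and the 10-year window are placeholders.
SELECT release_year,
       COUNT(*) AS appearances
FROM netflix
WHERE casts ILIKE '%Salman Khan%'
  AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10
GROUP BY release_year
ORDER BY release_year;
```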

14. Top Actors in Indian Productions

Identify the actors with the most appearances in Indian content.
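
A sketch that splits the `casts` field into individual names and counts them, assuming PostgreSQL and a comma-separated cast list. The cutoff of 10 and the aliases `a` and `actor` are illustrative choices.

```sql
-- Expand the cast list of Indian titles into one row per actor, then rank.
SELECT TRIM(a.actor) AS actor,
       COUNT(*)      AS appearances
FROM netflix AS n
CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.casts, ',')) AS a(actor)
WHERE n.country ILIKE '%India%'
GROUP BY TRIM(a.actor)
ORDER BY appearances DESC
LIMIT 10;
```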

15. Keyword-based Content Categorization

Categorize content based on description keywords.
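
The README does not list the keywords, so the ones below (`kill`, `violence`) and the labels `Flagged` and `Unflagged` are purely illustrative. The sketch assumes PostgreSQL.

```sql
-- Label each title by whether its description contains a keyword, then count.
SELECT category,
       COUNT(*) AS total_content
FROM (
    SELECT CASE
               WHEN description ILIKE '%kill%'
                 OR description ILIKE '%violence%' THEN 'Flagged'
               ELSE 'Unflagged'
           END AS category
    FROM netflix
) AS categorized
GROUP BY category;
```

Substring matching is coarse: `'%kill%'` also matches words such as "skill", so a stricter pattern may be preferable for real analysis.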


Key Insights

  • Netflix hosts a diverse mix of Movies and TV Shows across regions
  • Content ratings indicate varied audience targeting strategies
  • Certain countries dominate content production and availability
  • Genre analysis highlights platform content diversity
  • Keyword-based classification helps understand thematic patterns

Skills Demonstrated

  • SQL Query Writing
  • Aggregations & Grouping
  • Window Functions
  • String Manipulation
  • Data Cleaning & Transformation
  • Analytical Thinking
  • Business Problem Solving

Repository Contents

  • Dataset (CSV)
  • Schema creation script
  • Business problem statements
  • SQL solution queries
  • Project documentation

Note

This project was created for learning, practice, and portfolio demonstration purposes using a publicly available dataset.


Author

Tejas

If you found this project useful or interesting, feel free to explore more repositories and connect.
