This project involves a comprehensive Exploratory Data Analysis (EDA) of the Netflix titles dataset from Kaggle, utilizing Python and its core data science libraries (Pandas, NumPy, and Matplotlib). The primary goal is to understand Netflix's content strategy, analyze the distribution of movies vs. TV shows, identify trends in content addition, and explore the diversity of genres and country-specific content.
- Language: Python 3.x
- Data Analysis: Pandas (for data cleaning, processing, and manipulation)
- Numerical Computing: NumPy (for numerical operations)
- Data Visualization: Matplotlib (for plotting insights derived from the analysis)
- Environment: Jupyter Notebook
- Data Loading: The `netflix_titles.csv` file from Kaggle was loaded into a Pandas DataFrame.
- Data Cleaning:
  - Handled missing values (`NaN`) in columns like `director` and `cast`.
  - Converted the `date_added` column to the correct `datetime` format.
  - Dropped irrelevant columns and prepared the dataset for analysis.
- Exploratory Data Analysis (EDA): Used Pandas to ask and answer key business questions.
- Visualization: Used Matplotlib to create plots and charts to present the findings for each question.
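The loading and cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's exact code: the `"Unknown"` fill value, the helper name `clean_titles`, and the choice of `description` as a dropped column are assumptions, and the sample rows are made up to stand in for `pd.read_csv("netflix_titles.csv")`.

```python
import pandas as pd

def clean_titles(df: pd.DataFrame, drop_cols=("description",)) -> pd.DataFrame:
    """Return a cleaned copy. Fill value and dropped columns are assumptions."""
    out = df.copy()
    # Replace missing director/cast entries with a placeholder label.
    for col in ("director", "cast"):
        out[col] = out[col].fillna("Unknown")
    # Strip stray whitespace, then parse dates such as "September 25, 2021".
    # Values that cannot be parsed become NaT instead of raising.
    out["date_added"] = pd.to_datetime(out["date_added"].str.strip(), errors="coerce")
    # Drop columns not needed for the analysis, ignoring any that are absent.
    return out.drop(columns=list(drop_cols), errors="ignore")

# Made-up sample rows standing in for the real CSV.
sample = pd.DataFrame({
    "title": ["Example A", "Example B"],
    "director": [None, "Jane Doe"],
    "cast": ["Actor One", None],
    "date_added": ["September 25, 2021", " January 15, 2020"],
    "description": ["...", "..."],
})
cleaned = clean_titles(sample)
print(cleaned.dtypes)
```

Working on a copy keeps the raw DataFrame available for comparison, which is convenient when re-running notebook cells.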
This analysis yielded several key insights into the Netflix library:
The Netflix library is predominantly composed of Movies, which significantly outnumber TV Shows.
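One way to produce this comparison is to count the values of the dataset's `type` column and draw a bar chart. The sketch below uses a few made-up rows in place of the loaded DataFrame, and the chart title and axis label are illustrative choices, not necessarily those used in the notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows; in the notebook this would be the loaded dataset.
df = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"]})

# Count titles per type, most frequent first.
type_counts = df["type"].value_counts()

# Bar chart of the counts.
ax = type_counts.plot(kind="bar", title="Titles by type")
ax.set_ylabel("Number of titles")
plt.tight_layout()
```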
There has been a dramatic increase in content added to Netflix over the years, with a significant boom observed post-2016.
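A trend like this can be derived by counting titles per year of `date_added`. The following is a small sketch on made-up dates, assuming the column has already been converted to `datetime`; rows with a missing date are dropped before counting.

```python
import pandas as pd

# Made-up dates; None stands in for a title with no recorded date.
df = pd.DataFrame({
    "date_added": pd.to_datetime(
        ["2015-03-01", "2017-06-15", "2017-11-02", "2019-01-20", None]
    )
})

# Extract the year, count titles per year, and order chronologically.
added_per_year = (
    df["date_added"].dropna().dt.year.value_counts().sort_index()
)
print(added_per_year)
```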
The analysis ranks the top 10 countries that produce or host the most content on the platform (e.g., United States, India, UK).
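A ranking of this kind might be computed as below. In the Kaggle file, the `country` field can list several countries separated by commas for co-productions. Splitting those entries so that each country is counted once per title is an assumption here, and the notebook may have handled them differently. The rows are made up for illustration.

```python
import pandas as pd

# Made-up rows, including one co-production and one missing value.
df = pd.DataFrame({
    "country": ["United States", "India", "United States, India", None, "United Kingdom"]
})

# Split comma-separated entries into one row per country and trim whitespace.
countries = df["country"].dropna().str.split(",").explode().str.strip()

# Keep the ten most frequent countries.
top_countries = countries.value_counts().head(10)
print(top_countries)
```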
The analysis also shows the percentage breakdown of all content in the Netflix library by its official maturity rating.
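A percentage breakdown like this can be obtained with `value_counts(normalize=True)` on the `rating` column. The sketch below uses made-up rows, so the resulting shares are illustrative only and do not reflect the real dataset.

```python
import pandas as pd

# Made-up ratings; not representative of the actual library.
df = pd.DataFrame({"rating": ["TV-MA", "TV-MA", "TV-14", "PG-13"]})

# Share of titles per rating, expressed as a percentage.
rating_pct = df["rating"].value_counts(normalize=True) * 100
print(rating_pct.round(1))
```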
To run this analysis yourself:
- Clone this repository to your local machine.
- Install the required libraries: `pip install pandas numpy matplotlib`
- Open `netflix_analysis.ipynb` (or your notebook's file name) in Jupyter Notebook and run the cells.
- `netflix_analysis.ipynb`: The main Jupyter Notebook file containing all the Python code and analysis.
- `netflix_titles.csv`: The original raw dataset used for this project (sourced from Kaggle).
- `/images` (folder): Contains the `.png` chart files saved from the notebook (used to display visuals in this README).



