rakesh4407/IPL_exploratory-data-analysis

🏏 IPL Exploratory Data Analysis (EDA)

Deep-dive Exploratory Data Analysis on IPL Matches & Deliveries (2008–2022)
Personal Data Science Portfolio Project


📌 About the Project

This project is an exploratory data analysis (EDA) of 15 seasons of IPL data (2008–2022), covering 950+ matches and 200,000+ ball-by-ball deliveries. It uses Python's data science stack to surface patterns and trends in one of the world's biggest cricket leagues.


🎯 Objectives

  • Understand the structure and quality of IPL match & delivery data
  • Identify trends, patterns, and anomalies across 15 seasons
  • Analyze team performance, player stats, toss impact & venue insights
  • Visualize key findings with clear, informative charts

🔍 Analysis Performed

| # | Analysis | Key Insight |
|---|----------|-------------|
| 1 | Missing Value Analysis | Identified & handled missing fields in both datasets |
| 2 | Team Win Analysis | Top 10 most successful teams across all seasons |
| 3 | Season-wise Trends | Match count growth from 2008 to 2022 |
| 4 | Toss Impact Analysis | Does winning the toss help win the match? |
| 5 | Top Run Scorers | All-time leading batsmen by total runs |
| 6 | Top Wicket Takers | All-time leading bowlers by total wickets |
| 7 | Venue Analysis | Best-performing venues across IPL history |
| 8 | Player of the Match | Most-awarded players across seasons |
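Several of the analyses above (team wins, top run scorers, top wicket takers) come down to a group-and-count in Pandas. The following is a minimal sketch, not the notebook's actual code. The column names (`winner`, `batter`, `batsman_runs`, `bowler`, `is_wicket`, `dismissal_kind`) are assumptions based on the commonly used Kaggle IPL schema and may differ in your copy of the CSVs; the sample rows are made up for illustration.

```python
import pandas as pd

# Dismissals that are conventionally not credited to the bowler.
# This exclusion list is an assumption of this sketch.
NOT_BOWLER_WICKETS = {"run out", "retired hurt", "retired out", "obstructing the field"}


def top_teams_by_wins(matches: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count wins per team, ignoring matches without a recorded winner."""
    return matches["winner"].dropna().value_counts().head(n)


def top_run_scorers(deliveries: pd.DataFrame, n: int = 10) -> pd.Series:
    """Sum runs off the bat per batter, highest first."""
    totals = deliveries.groupby("batter")["batsman_runs"].sum()
    return totals.sort_values(ascending=False).head(n)


def top_wicket_takers(deliveries: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count wickets per bowler, skipping dismissals not credited to the bowler."""
    credited = deliveries[
        (deliveries["is_wicket"] == 1)
        & ~deliveries["dismissal_kind"].isin(NOT_BOWLER_WICKETS)
    ]
    return credited.groupby("bowler").size().sort_values(ascending=False).head(n)


# Made-up sample rows standing in for matches.csv and deliveries.csv
matches = pd.DataFrame(
    {"winner": ["Team A", "Team B", "Team A", None, "Team A", "Team B"]}
)
deliveries = pd.DataFrame(
    {
        "batter": ["P1", "P1", "P2", "P2", "P3", "P1"],
        "batsman_runs": [4, 6, 1, 0, 2, 1],
        "bowler": ["B1", "B1", "B2", "B2", "B1", "B2"],
        "is_wicket": [0, 1, 1, 0, 1, 1],
        "dismissal_kind": [None, "bowled", "run out", None, "caught", "lbw"],
    }
)

print(top_teams_by_wins(matches))
print(top_run_scorers(deliveries))
print(top_wicket_takers(deliveries))
```

On the real data, the same functions would be called on the DataFrames loaded from `matches.csv` and `deliveries.csv`.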

📊 Charts & Visualizations

  • 📊 Bar charts — Team wins, Top batsmen, Top bowlers
  • 📈 Line + Area charts — Season-wise match trends
  • 🥧 Pie charts — Toss decision distribution
  • 🔥 Heatmaps — Correlation analysis
  • 📉 Missing value visualizations
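As an illustration of the bar charts listed above, here is a minimal Matplotlib sketch of a horizontal "wins per team" chart. It is not taken from the notebook: the function name `plot_team_wins` is hypothetical and the win counts are made-up placeholders. The `Agg` backend line is only there so the script runs without a display; in a Jupyter or Colab notebook you would normally drop it.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside a notebook

import matplotlib.pyplot as plt
import pandas as pd


def plot_team_wins(wins: pd.Series, ax=None):
    """Draw a horizontal bar chart of wins per team, largest bar on top."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    ordered = wins.sort_values()  # ascending, so the largest ends up at the top
    ax.barh(list(ordered.index), list(ordered.values))
    ax.set_xlabel("Wins")
    ax.set_title("Matches won per team")
    return ax


# Placeholder counts, not real IPL results
wins = pd.Series({"Team A": 3, "Team B": 2, "Team C": 1})
ax = plot_team_wins(wins)
ax.figure.tight_layout()
# ax.figure.savefig("team_wins.png")  # uncomment to write the chart to disk
```

Sorting before plotting keeps the ranking readable, which matters more once all teams from the real dataset are shown.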

🛠️ Tech Stack

| Technology | Purpose |
|------------|---------|
| Python 3.8+ | Core programming language |
| Pandas | Data loading, cleaning & manipulation |
| NumPy | Numerical operations |
| Matplotlib | Base visualization library |
| Seaborn | Statistical visualizations |
| warnings (Python standard library) | Clean output management |

📁 Dataset

| File | Description | Size |
|------|-------------|------|
| matches.csv | Match-level data (2008–2022) | 950+ rows |
| deliveries.csv | Ball-by-ball delivery data | 200,000+ rows |

Dataset Source: Kaggle — IPL Complete Dataset
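A quick way to check the quality of either file after loading is a per-column missing-value summary. The sketch below is one possible approach, not the notebook's exact code. The helper name `missing_summary` is hypothetical, and the inline CSV is a tiny made-up stand-in for `matches.csv` so the example runs without the real dataset.

```python
import io

import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame(
        {
            "missing": counts,
            "percent": (counts / len(df) * 100).round(2),
        }
    )
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up rows; with the real data use pd.read_csv("matches.csv") instead
sample_csv = io.StringIO(
    "id,season,city,winner\n"
    "1,2008,Bangalore,Team A\n"
    "2,2008,,Team B\n"
    "3,2009,Mumbai,\n"
    "4,2009,,Team A\n"
)
matches = pd.read_csv(sample_csv)
report = missing_summary(matches)
print(report)
```

The same helper can be applied to the deliveries DataFrame, where sparse columns are expected to show high percentages because most balls do not produce a dismissal or an extra.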


🚀 How to Run

Option A — VS Code / Local

git clone https://github.com/rakesh4407/ipl-eda
cd ipl-eda
pip install pandas numpy matplotlib seaborn
# Update file paths in notebook
jupyter notebook IPL_EDA.ipynb

Option B — Google Colab

1. Open IPL_EDA.ipynb in Google Colab
2. Upload matches.csv and deliveries.csv
3. Run all cells
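For the "update file paths" step, one option is to resolve both CSVs from a single data directory at the top of the notebook and fail early if either is missing. This is a sketch under assumptions: `resolve_data_paths` and `DATA_FILES` are hypothetical names, and the notebook itself may hard-code paths differently.

```python
from pathlib import Path

# File names as listed in the Dataset section
DATA_FILES = ("matches.csv", "deliveries.csv")


def resolve_data_paths(data_dir="."):
    """Map each expected CSV name to its path under data_dir, or raise if absent."""
    base = Path(data_dir)
    paths = {name: base / name for name in DATA_FILES}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing in {base.resolve()}: {', '.join(missing)}"
        )
    return paths


# Example usage (assumes the CSVs sit next to the notebook):
# paths = resolve_data_paths(".")
# matches = pd.read_csv(paths["matches.csv"])
```

In Colab, uploaded files land in the current working directory by default, so the default `"."` should work there as well.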

💡 Key Insights Found

  • 🏆 Mumbai Indians lead the all-time table with the most IPL wins
  • 🎲 Toss winners win the match ~52% of the time — slight edge
  • 🏏 Virat Kohli is the all-time leading run scorer in IPL history
  • 📅 IPL has grown from 8 teams to 10 teams over the years
  • 🌍 Wankhede Stadium and M. Chinnaswamy Stadium are among the top-scoring venues
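A toss figure like the one above can be computed as the share of decided matches in which the toss winner is also the match winner. The sketch below shows one way to do that. It assumes `toss_winner`, `toss_decision`, and `winner` columns (as in the common Kaggle schema) and uses made-up rows, so its output does not reproduce the ~52% reported here.

```python
import pandas as pd


def toss_win_rate(matches: pd.DataFrame) -> float:
    """Fraction of decided matches won by the team that won the toss."""
    decided = matches.dropna(subset=["winner"])  # drop no-results
    return float((decided["toss_winner"] == decided["winner"]).mean())


def toss_win_rate_by_decision(matches: pd.DataFrame) -> pd.Series:
    """Same rate, split by whether the toss winner chose to bat or field."""
    decided = matches.dropna(subset=["winner"])
    won = decided["toss_winner"] == decided["winner"]
    return won.groupby(decided["toss_decision"]).mean()


# Made-up sample; the last match has no winner and is excluded
matches = pd.DataFrame(
    {
        "toss_winner": ["A", "B", "A", "B", "A"],
        "toss_decision": ["field", "bat", "field", "bat", "field"],
        "winner": ["A", "A", "A", "B", None],
    }
)

rate = toss_win_rate(matches)
by_decision = toss_win_rate_by_decision(matches)
print(f"Toss winner also won: {rate:.0%}")
print(by_decision)
```

Excluding matches without a winner is a design choice of this sketch; counting them in the denominator would lower the rate slightly.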

👨‍💻 Author

Rakesh G

BCA (H) — Artificial Intelligence & Data Science
K.R. Mangalam University, New Delhi | CGPA: 9.22/10
Dean's Award Recipient | IBM Certified Data Scientist

LinkedIn GitHub Email


🏷️ Topics

python pandas eda exploratory-data-analysis ipl cricket data-science matplotlib seaborn data-analysis sports-analytics


If you found this useful, please star this repository!
