Deep-dive Exploratory Data Analysis on IPL Matches & Deliveries (2008–2022)
Personal Data Science Portfolio Project
A comprehensive Exploratory Data Analysis (EDA) on 15 seasons of IPL data (2008–2022), covering 950+ matches and 200,000+ ball-by-ball deliveries. This project uncovers hidden patterns, trends, and insights from one of the world's biggest cricket leagues using Python's data science stack.
- Understand the structure and quality of IPL match & delivery data
- Identify trends, patterns, and anomalies across 15 seasons
- Analyze team performance, player stats, toss impact & venue insights
- Visualize key findings with clear, informative charts
| # | Analysis | Key Insight |
|---|---|---|
| 1 | Missing Value Analysis | Identified & handled missing fields in both datasets |
| 2 | Team Win Analysis | Top 10 most successful teams across all seasons |
| 3 | Season-wise Trends | Match count growth from 2008 to 2022 |
| 4 | Toss Impact Analysis | Does winning the toss help win the match? |
| 5 | Top Run Scorers | All-time leading batsmen by total runs |
| 6 | Top Wicket Takers | All-time leading bowlers by total wickets |
| 7 | Venue Analysis | Best performing venues across IPL history |
| 8 | Player of the Match | Most awarded players across seasons |
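A few of the analyses in the table above (toss impact, top run scorers, top wicket takers) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`toss_winner`, `winner`, `batter`, `batsman_runs`, `bowler`, `is_wicket`, `dismissal_kind`) are assumptions about the CSV layout and may differ in your copy, and the small DataFrames are made-up sample rows, not real IPL data.

```python
import pandas as pd

# Dismissals that are not credited to the bowler (assumed labels in the data).
NON_BOWLER_DISMISSALS = {"run out", "retired hurt", "retired out", "obstructing the field"}


def toss_win_rate(matches: pd.DataFrame) -> float:
    """Share of decided matches that were won by the toss winner."""
    decided = matches.dropna(subset=["winner"])  # skip no-result matches
    return float((decided["toss_winner"] == decided["winner"]).mean())


def top_run_scorers(deliveries: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total runs off the bat per batter, highest first."""
    return (
        deliveries.groupby("batter")["batsman_runs"]
        .sum()
        .sort_values(ascending=False)
        .head(n)
    )


def top_wicket_takers(deliveries: pd.DataFrame, n: int = 10) -> pd.Series:
    """Wickets per bowler, excluding dismissals not credited to the bowler."""
    is_bowler_wicket = (deliveries["is_wicket"] == 1) & ~deliveries[
        "dismissal_kind"
    ].isin(NON_BOWLER_DISMISSALS)
    return deliveries.loc[is_bowler_wicket, "bowler"].value_counts().head(n)


# Made-up sample rows, only to show the expected input shape.
matches = pd.DataFrame(
    {
        "toss_winner": ["A", "B", "A", "C"],
        "winner": ["A", "A", "B", None],  # last match had no result
    }
)
deliveries = pd.DataFrame(
    {
        "batter": ["X", "X", "Y", "Y", "Z"],
        "batsman_runs": [4, 6, 1, 2, 0],
        "bowler": ["P", "P", "Q", "Q", "Q"],
        "is_wicket": [0, 0, 1, 0, 1],
        "dismissal_kind": [None, None, "run out", None, "bowled"],
    }
)

print(f"Toss winner also won: {toss_win_rate(matches):.1%}")
print(top_run_scorers(deliveries, n=2))
print(top_wicket_takers(deliveries))
```

Dropping matches without a winner before comparing keeps abandoned games from counting against the toss winner, and filtering run outs keeps fielding dismissals out of the bowlers' tallies.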
- 📊 Bar charts — Team wins, Top batsmen, Top bowlers
- 📈 Line + Area charts — Season-wise match trends
- 🥧 Pie charts — Toss decision distribution
- 🔥 Heatmaps — Correlation analysis
- 📉 Missing value visualizations
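As one example of the chart types listed above, a team-wins bar chart could be drawn along these lines. This is a sketch under assumptions: the `winner` column name and the helper `plot_team_wins` are illustrative, and the sample rows are invented. The `Agg` backend line is only there so the snippet runs without a display; it is not needed inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in Jupyter or Colab
import matplotlib.pyplot as plt
import pandas as pd


def plot_team_wins(matches: pd.DataFrame, top_n: int = 10):
    """Horizontal bar chart of the teams with the most wins."""
    # value_counts() ignores missing winners and sorts in descending order.
    wins = matches["winner"].value_counts().head(top_n)

    fig, ax = plt.subplots(figsize=(8, 4))
    # Reverse the order so the team with the most wins is drawn at the top.
    ax.barh(list(wins.index[::-1]), list(wins.values[::-1]))
    ax.set_xlabel("Wins")
    ax.set_title(f"Top {len(wins)} teams by wins")
    fig.tight_layout()
    return fig, wins


# Made-up sample rows, not real match results.
matches = pd.DataFrame({"winner": ["A", "A", "A", "B", "B", "C", None]})
fig, wins = plot_team_wins(matches)
fig.savefig("team_wins.png")
```

Returning the figure together with the counts makes it easy to save the chart and reuse the same numbers in a summary table.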
| Technology | Purpose |
|---|---|
| Python 3.8+ | Core programming language |
| Pandas | Data loading, cleaning & manipulation |
| NumPy | Numerical operations |
| Matplotlib | Base visualization library |
| Seaborn | Statistical visualizations |
| warnings (standard library) | Suppressing noisy warnings for cleaner notebook output |
| File | Description | Size |
|---|---|---|
| matches.csv | Match-level data (2008–2022) | 950+ rows |
| deliveries.csv | Ball-by-ball delivery data | 200,000+ rows |
Dataset Source: Kaggle — IPL Complete Dataset
git clone https://github.com/rakesh4407/ipl-eda
cd ipl-eda
pip install pandas numpy matplotlib seaborn
# Update file paths in notebook
jupyter notebook IPL_EDA.ipynb

1. Open IPL_EDA.ipynb in Google Colab
2. Upload matches.csv and deliveries.csv
3. Run all cells
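For the "update file paths" step, one option is to keep the data location in a single place and fail early if a file is missing. The helper below is a hypothetical sketch, not part of the notebook; `resolve_data_paths` and the idea of a single data directory are assumptions.

```python
from pathlib import Path


def resolve_data_paths(data_dir="."):
    """Return paths to matches.csv and deliveries.csv inside data_dir.

    Raises FileNotFoundError listing any file that is not present.
    """
    base = Path(data_dir)
    paths = {name: base / f"{name}.csv" for name in ("matches", "deliveries")}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing data files: {', '.join(missing)}")
    return paths
```

In a notebook cell this could be used as `paths = resolve_data_paths("data")` followed by `pd.read_csv(paths["matches"])`, so that only one string changes when the files move.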
- 🏆 Mumbai Indians lead the all-time table with the most IPL wins
- 🎲 Toss winners win the match ~52% of the time, only a slight edge
- 🏏 Virat Kohli is the all-time leading run scorer in IPL history
- 📅 The IPL has grown from 8 teams in 2008 to 10 teams in 2022
- 🌍 Wankhede Stadium and M. Chinnaswamy Stadium are among the top-scoring venues
Rakesh G
BCA (H) — Artificial Intelligence & Data Science
K.R. Mangalam University, New Delhi | CGPA: 9.22/10
Dean's Award Recipient | IBM Certified Data Scientist
python pandas eda exploratory-data-analysis ipl cricket data-science matplotlib seaborn data-analysis sports-analytics
⭐ If you found this useful, please star this repository!