NLP-based Sentiment Analysis on IPL Match Commentary using VADER & TextBlob
Personal Data Science Portfolio Project
This project applies Natural Language Processing (NLP) techniques to analyze the sentiment of Indian Premier League (IPL) match commentary, classifying each commentary line as Positive, Negative, or Neutral. Using both VADER and TextBlob, the project examines how sentiment varies across teams, overs, and match events.
- Clean and preprocess real IPL commentary text data
- Classify commentary sentiment as Positive / Negative / Neutral
- Analyze sentiment trends by over, team, and match events
- Visualize insights with WordClouds and sentiment charts
| Model | Type | Best For |
|---|---|---|
| VADER | Lexicon- and rule-based NLP | Social media & sports text; handles exclamation marks and capitalization |
| TextBlob | Lexicon-based NLP | Polarity (-1 to +1) & Subjectivity (0 to 1) scores |
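VADER builds its compound score in several stages. It looks up a valence for each lexicon word and adjusts it with rules for capitalization, degree modifiers, negation and exclamation marks. It then sums the adjusted valences and squashes the total into [-1, 1]. As we read the vaderSentiment source, the squashing function is x / sqrt(x² + α) with α = 15. The stdlib sketch below shows only that final normalization step. The name `vader_normalize` and the sample inputs are hypothetical and are not part of the library's public API.

```python
import math


def vader_normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of word valences into [-1, 1].

    Sketch of the normalization VADER applies to build its compound score
    (alpha = 15 is the library default as we understand it).
    """
    norm = score / math.sqrt(score * score + alpha)
    # Clamp for safety; |norm| < 1 already holds mathematically when alpha > 0.
    return max(-1.0, min(1.0, norm))


# Hypothetical summed valences for three commentary lines (not real VADER output).
for raw in (0.0, 2.5, -4.1):
    print(f"raw={raw:+.1f} -> compound={vader_normalize(raw):+.4f}")
```

The squashing function is why large valence sums approach but never reach ±1. A line stacked with strong words therefore scores only slightly higher than one with a single strong word.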
| # | Analysis | Description |
|---|---|---|
| 1 | Text Cleaning | Lowercasing, special-character removal, whitespace normalization |
| 2 | VADER Sentiment | Compound score → Positive/Negative/Neutral classification |
| 3 | TextBlob Analysis | Polarity & Subjectivity scoring per commentary |
| 4 | Sentiment Distribution | Overall % breakdown across all commentary |
| 5 | Team-wise Sentiment | Which teams generate the most positive commentary |
| 6 | Over-wise Trends | Sentiment patterns across different match phases |
| 7 | WordCloud | Most frequent words in Positive vs Negative commentary |
| 8 | Model Comparison | VADER vs TextBlob agreement analysis |
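Steps 2, 4 and 8 do three things: turn continuous scores into labels, tally the labels, and compare the two models. A minimal stdlib sketch of that logic is shown below. It assumes the ±0.05 cut-off that the VADER authors recommend for the compound score. As our own assumption, it applies the same cut-off to TextBlob polarity, since the project's actual thresholds are not stated here. All function names and sample scores are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def label_from_score(score: float, threshold: float = 0.05) -> str:
    """Map a score in [-1, 1] to Positive / Negative / Neutral.

    +/-0.05 is the usual VADER compound cut-off; reusing it for TextBlob
    polarity is an assumption of this sketch.
    """
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


def distribution(labels: List[str]) -> Dict[str, float]:
    """Percentage of commentary lines in each sentiment class."""
    total = len(labels)
    if total == 0:
        return {"Positive": 0.0, "Negative": 0.0, "Neutral": 0.0}
    counts = Counter(labels)
    return {label: 100.0 * counts[label] / total
            for label in ("Positive", "Negative", "Neutral")}


def agreement_rate(labels_a: List[str], labels_b: List[str]) -> float:
    """Fraction of lines on which two models assign the same label."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical scores for four commentary lines (illustrative, not project data).
vader_compound = [0.82, -0.61, 0.02, 0.44]
textblob_polarity = [0.50, -0.10, 0.00, -0.20]

vader_labels = [label_from_score(s) for s in vader_compound]
textblob_labels = [label_from_score(s) for s in textblob_polarity]

print(distribution(vader_labels))
print(f"agreement: {agreement_rate(vader_labels, textblob_labels):.0%}")
```

Agreement measures consistency between the two models rather than accuracy. Scoring either model against ground truth would require a small hand-labelled sample of commentary.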
- 🥧 Pie Chart — Sentiment distribution (Positive/Negative/Neutral %)
- 📊 Bar Charts — Team-wise and over-wise sentiment breakdown
- ☁️ WordClouds — Positive words vs Negative words
- 📈 Line Charts — Sentiment trend across match overs
- 🔥 Heatmap — Sentiment correlation matrix
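A WordCloud sizes each word by its frequency, so the positive-versus-negative clouds depend on per-class word counts. The stdlib sketch below computes those counts with `collections.Counter`. The stop-word list and the sample lines are hypothetical. The resulting counts, converted with `dict(...)`, could then be passed to the wordcloud library's `WordCloud.generate_from_frequencies`.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A tiny, hypothetical stop-word list; a real run would use a fuller one.
# "over" is deliberately kept, since it is meaningful in cricket commentary.
STOPWORDS = {"the", "a", "and", "is", "for", "to", "of", "it", "that", "on"}


def top_words(lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count word frequencies across commentary lines, ignoring stop words."""
    counts = Counter()
    for line in lines:
        words = re.findall(r"[a-z]+", line.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical commentary lines already grouped by sentiment label.
positive_lines = [
    "Massive six over long-on, what a shot!",
    "Superb shot, that races away for four",
]
negative_lines = [
    "Bowled him! The stumps are shattered",
    "Poor shot, edged and caught behind",
]

print("positive:", top_words(positive_lines))
print("negative:", top_words(negative_lines))
```

Words such as "shot" can appear in both classes. A larger stop-word list, or subtracting shared words, would make the two clouds easier to tell apart.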
| Technology | Purpose |
|---|---|
| Python 3.8+ | Core programming language |
| Pandas | Data loading & manipulation |
| VADER Sentiment | Primary sentiment classifier |
| TextBlob | Secondary NLP analysis |
| WordCloud | Word frequency visualization |
| Matplotlib | Base visualization library |
| Seaborn | Statistical visualizations |
| re (regex) | Text cleaning & preprocessing |
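Step 1 (Text Cleaning) relies on the `re` module. A minimal sketch of that cleaning pass follows. The function name and sample sentence are hypothetical, and the exact character set the notebook keeps may differ. This version keeps only ASCII letters, digits and whitespace.

```python
import re


def clean_commentary(text: str) -> str:
    """Lowercase, strip special characters, and collapse whitespace.

    Keeps ASCII letters, digits, and whitespace; everything else
    (punctuation, emoji, accented letters) becomes a space.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace special characters
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


print(clean_commentary("  SIX!!! Kohli launches it over mid-wicket...  "))
```

VADER uses capitalization and exclamation marks as intensity cues, so this cleaning removes signal it would otherwise use. One option to consider is to score the raw text with VADER and reserve the cleaned text for word counts and WordClouds.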
pip install vaderSentiment textblob wordcloud pandas matplotlib seaborn
python -m textblob.download_corpora
| File | Description |
|---|---|
| IPL_Match_Highlights_Commentary.csv | IPL commentary with Team, Over, Score columns |
Columns used:
- Commentary: raw match commentary text
- Team: team name
- score: ball outcome (4, 6, W, dot, etc.)
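Steps 5 and 6 average sentiment by team and by match phase. A minimal stdlib sketch follows. It assumes an over number like the Over column in the dataset description, stored as a 1-based integer from 1 to 20. If the column uses over.ball notation, it would need converting first. The phase boundaries (powerplay 1-6, middle 7-15, death 16-20) are our own assumption. The names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def match_phase(over: int) -> str:
    """Bucket a 1-based over number (1-20) into a T20 match phase.

    Assumed boundaries: overs 1-6 powerplay, 7-15 middle, 16-20 death.
    """
    if over <= 6:
        return "Powerplay"
    if over <= 15:
        return "Middle"
    return "Death"


def mean_by_key(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average compound score per group key (e.g. team or phase)."""
    groups = defaultdict(list)
    for key, score in rows:
        groups[key].append(score)
    return {key: mean(scores) for key, scores in groups.items()}


# Hypothetical (Team, Over, compound) rows, not taken from the dataset.
rows = [
    ("Team A", 2, 0.6), ("Team A", 10, 0.1), ("Team A", 18, 0.8),
    ("Team B", 4, -0.4), ("Team B", 12, 0.0), ("Team B", 19, 0.5),
]

by_team = mean_by_key((team, score) for team, _, score in rows)
by_phase = mean_by_key((match_phase(over), score) for _, over, score in rows)
print(by_team)
print(by_phase)
```

With pandas, the same aggregation could be written as `df.groupby("Team")["compound"].mean()`, where `compound` is a hypothetical score column. Phases could be produced with `pd.cut(df["Over"], bins=[0, 6, 15, 20], labels=["Powerplay", "Middle", "Death"])`.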
1. Open IPL_SentimentAnalysis.ipynb in Google Colab
2. Upload IPL_Match_Highlights_Commentary.csv to Google Drive
3. Update the CSV file path in Step 3 of the notebook
4. Run all cells
git clone https://github.com/rakesh4407/ipl-sentiment-analysis
cd ipl-sentiment-analysis
pip install -r requirements.txt
jupyter notebook IPL_SentimentAnalysis.ipynb
- 🟢 IPL commentary is predominantly Positive, reflecting exciting gameplay
- 🔴 Wicket deliveries generate the highest Negative sentiment scores
- 🟡 Dot balls trend toward Neutral sentiment
- 🏏 Boundary (4s & 6s) commentary has the highest Positive compound scores
- 🎯 VADER outperforms TextBlob for sports commentary analysis
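Several of these findings compare sentiment by ball outcome. The stdlib sketch below shows one way to compute that comparison from the score column: map each raw outcome to an event category, then average the compound score per category. The accepted spellings ("4", "6", "W", "0" or "dot") are assumptions about how the column is encoded. The function names and sample pairs are hypothetical and do not reproduce the project's results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def ball_event(score: str) -> str:
    """Map a raw ball-outcome value to a coarse event category.

    Accepted spellings ("4", "6", "W", "0"/"dot") are assumptions
    about how the score column is encoded.
    """
    value = score.strip().upper()
    if value in {"4", "6"}:
        return "Boundary"
    if value == "W":
        return "Wicket"
    if value in {"0", "DOT"}:
        return "Dot"
    return "Other"


def mean_compound_by_event(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average VADER compound score for each ball-event category."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for score, compound in rows:
        groups[ball_event(score)].append(compound)
    return {event: mean(values) for event, values in groups.items()}


# Hypothetical (score, compound) pairs for illustration only.
rows = [("6", 0.9), ("4", 0.7), ("W", -0.6), ("dot", 0.0), ("1", 0.2)]
print(mean_compound_by_event(rows))
```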
Raw Commentary Text
↓
Text Cleaning (lowercase, remove special chars)
↓
VADER Analysis → Compound Score → Sentiment Label
↓
TextBlob Analysis → Polarity + Subjectivity
↓
Visualization (Charts, WordClouds, Heatmaps)
↓
Insights & Conclusions
Rakesh G
BCA (H) — Artificial Intelligence & Data Science
K.R. Mangalam University, New Delhi | CGPA: 9.22/10
Dean's Award Recipient | IBM Certified Data Scientist
python nlp sentiment-analysis vader textblob ipl cricket wordcloud text-analysis data-science machine-learning sports-analytics
⭐ If you found this useful, please star this repository!