SidoJain/Web-Scraping-Sentiment-Analysis


Best Buy Review Scraper & Sentiment Analyzer

This project is a web scraping and data analysis tool designed to extract customer reviews from Best Buy product pages, analyze their sentiment using a hybrid approach, and visualize key customer insights.

Features

  • Advanced Web Scraping: Uses undetected-chromedriver and Selenium to bypass anti-bot measures and handle dynamic content loading (infinite scrolling/pagination).
  • Data Extraction: Parses HTML using BeautifulSoup to extract:
    • Review Title & Body
    • Star Rating (1-5)
    • Date of Review
    • Reviewer Name & "Verified Buyer" Status
    • "Recommendation" Status (Yes/No)
  • Hybrid Sentiment Analysis: Calculates sentiment scores using a custom weighted algorithm:
    • NLTK VADER: Base sentiment scoring.
    • TextBlob: Noun phrase extraction for topic modeling.
    • Contextual Weighting: Adjusts sentiment scores based on the Star Rating and the user's "Recommended" flag.
  • Visualizations: Generates insightful charts using Matplotlib and Seaborn:
    • Overall Sentiment Distribution (Pie Chart).
    • Top Drivers of Sentiment (Bar Chart of key topics).
    • Average Rating comparison (Verified vs. Unverified Buyers).

Prerequisites

  • Python 3.11
  • Google Chrome Browser (Must be installed on the system for the webdriver to work).

Installation

  1. Clone the repository:

    git clone https://github.com/SidoJain/Web-Scraping-Sentiment-Analysis.git
  2. Install the required Python packages (the command below uses uv):

    uv pip install -r requirements.txt
  3. NLTK Data: The script automatically downloads the necessary NLTK lexicon (vader_lexicon) upon first run.

Usage

  1. Open the Jupyter Notebook (main.ipynb).

  2. Locate the main() function in the Driver Code cell.

  3. Update the target_url variable with the link to the Reviews Page of the Best Buy product you wish to analyze.

    • Note: Ensure the URL ends with /review or points specifically to the review section.
  4. Set the Chrome version number as follows, replacing {version_num} with the major version of your installed Chrome:

    driver = uc.Chrome(options=options, version_main={version_num})
  5. Run all cells in the notebook.

def main():
    # Example URL
    target_url = "https://www.bestbuy.ca/en-ca/product/apple-macbook-air-13-6-w-touch-id-2025-midnight-apple-m4-16gb-ram-256gb-ssd-english/19205139/review"
    # ... rest of the code
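
Because the scraper expects the reviews page rather than the product page, it can help to check the URL before launching Chrome. The following is a minimal sketch, assuming the reviews URL path ends with /review as in the example above. is_review_url is a hypothetical helper and is not part of the notebook.

```python
from urllib.parse import urlparse


def is_review_url(url: str) -> bool:
    """Return True if the URL path ends with /review (hypothetical helper)."""
    # Ignore a trailing slash and any query string or fragment.
    path = urlparse(url).path.rstrip("/")
    return path.endswith("/review")


target_url = (
    "https://www.bestbuy.ca/en-ca/product/"
    "apple-macbook-air-13-6-w-touch-id-2025-midnight-apple-m4-16gb-ram-256gb-ssd-english/"
    "19205139/review"
)

if not is_review_url(target_url):
    raise ValueError("target_url should point to the product's reviews page")
```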

Methodology

  1. The Scraper: The script launches a visible (non-headless) Chrome instance, since a visible browser is harder for anti-bot systems to detect. It:

    • Loads the page and removes cookie/privacy banners.
    • Applies the "Relevancy" filter.
    • Repeatedly clicks the "Load More" button with random time delays to mimic human behavior until all reviews are loaded.
  2. Sentiment Logic: The analyze_sentiment function does not rely on the VADER score alone. It combines three signals into a single compound score:

    • Text Analysis: VADER polarity score.
    • Rating Bias: If the rating is >= 4, the score gets a bonus. If <= 2, it gets a penalty.
    • Recommendation Bias: If the user clicked "No" on "Would you recommend this?", the score is heavily penalized.
  3. Topic Extraction: TextBlob extracts noun phrases (e.g., "battery life", "screen quality") to identify what each review is about, and the review's sentiment score is assigned to each extracted topic.
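
The "Load More" loop from step 1 can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's actual code: load_all_reviews, its parameters, and the delay range are hypothetical. The button lookup is passed in as a callable so the loop itself does not depend on a particular selector.

```python
import random
import time


def load_all_reviews(find_load_more, min_delay=1.0, max_delay=3.0,
                     max_clicks=500, sleep=time.sleep):
    """Click "Load More" until it is gone; return the number of clicks.

    find_load_more: callable returning the button element, or None when
    no button is left (hypothetical interface).
    """
    clicks = 0
    while clicks < max_clicks:  # safety cap so the loop always terminates
        button = find_load_more()
        if button is None:
            break  # all reviews are loaded
        button.click()
        clicks += 1
        # Random pause between clicks to mimic human behavior.
        sleep(random.uniform(min_delay, max_delay))
    return clicks
```

With Selenium, find_load_more could wrap driver.find_elements(By.CSS_SELECTOR, ...) and return the first match or None. The selector itself depends on the page markup and is not shown here.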
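
The weighting described in step 2 might look like the sketch below. The bonus and penalty values, the clamping to [-1, 1], and the name adjust_score are assumptions for illustration; the notebook's analyze_sentiment may use different weights. The text score is taken as an input here; with NLTK it would typically come from SentimentIntensityAnalyzer().polarity_scores(text)["compound"].

```python
def adjust_score(text_score, rating, recommended,
                 rating_bonus=0.2, rating_penalty=0.2,
                 no_recommend_penalty=0.5):
    """Combine a VADER-style text score with rating and recommendation.

    All weights are illustrative assumptions, not the notebook's values.
    recommended: True, False, or None when the reviewer did not answer.
    """
    score = text_score

    # Rating bias: high ratings get a bonus, low ratings a penalty.
    if rating >= 4:
        score += rating_bonus
    elif rating <= 2:
        score -= rating_penalty

    # Recommendation bias: an explicit "No" is penalized heavily.
    if recommended is False:
        score -= no_recommend_penalty

    # Keep the result in VADER's compound range (an assumption).
    return max(-1.0, min(1.0, score))
```

Under these assumed weights, a mildly positive text attached to a one-star, not-recommended review ends up clearly negative, which is the intent of the contextual weighting.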
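
For step 3, one way to turn per-review scores into per-topic scores is to average the scores of all reviews that mention a topic. This is a sketch under that assumption; the notebook may aggregate differently, and sentiment_by_topic is a hypothetical name. The noun phrases are passed in as plain lists; with TextBlob they would come from TextBlob(text).noun_phrases.

```python
from collections import defaultdict


def sentiment_by_topic(reviews):
    """Average sentiment per topic.

    reviews: iterable of (noun_phrases, score) pairs, one per review.
    Returns a dict mapping each lowercased topic to its mean score.
    """
    totals = defaultdict(lambda: [0.0, 0])  # topic -> [score sum, count]
    for phrases, score in reviews:
        # Count each topic at most once per review.
        for topic in {p.lower() for p in phrases}:
            totals[topic][0] += score
            totals[topic][1] += 1
    return {topic: total / count for topic, (total, count) in totals.items()}


example = [
    (["battery life", "screen quality"], 0.8),
    (["Battery Life"], -0.4),
]
topic_scores = sentiment_by_topic(example)
```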

Disclaimer

This tool is for educational and research purposes only. Web scraping may violate the Terms of Service of specific websites. Please respect robots.txt files and scrape responsibly. Do not use this tool to overwhelm servers.
