
harshareddy832/movie_recommender

Movie Recommender System

A scalable movie recommendation system built on Apache Spark (PySpark) and the Alternating Least Squares (ALS) collaborative-filtering algorithm.
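
ALS factorizes the sparse user-by-movie rating matrix into user factors X and movie factors Y. It alternately holds one side fixed and solves a small regularized least-squares problem for each row of the other. The following is a minimal single-machine NumPy sketch of that loop on a toy in-memory list of ratings. It illustrates the technique only and is not this project's Spark code; Spark's implementation is blocked and distributed. The regularization term is scaled by each user's or movie's rating count, which is how Spark's documentation describes its own ALS. The function name `als` and its defaults are hypothetical.

```python
import numpy as np


def als(ratings, n_users, n_items, rank=2, reg=0.05, iters=15, seed=0):
    """Explicit-feedback ALS on (user, item, rating) triples with 0-based ids.

    Returns user factors X (n_users x rank) and item factors Y (n_items x rank)
    such that X @ Y.T approximates the observed ratings.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, rank))
    Y = rng.normal(scale=0.1, size=(n_items, rank))

    # Group observed ratings by user and by item so each solve sees only its own rows.
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def solve(fixed, observed):
        # Ridge regression for one row: (F^T F + reg * n * I) x = F^T r,
        # where n is the number of ratings for this user or item.
        idx = [j for j, _ in observed]
        r = np.array([value for _, value in observed], dtype=float)
        F = fixed[idx]
        A = F.T @ F + reg * len(observed) * np.eye(rank)
        return np.linalg.solve(A, F.T @ r)

    for _ in range(iters):
        for u, observed in enumerate(by_user):  # hold Y fixed, update each user
            if observed:
                X[u] = solve(Y, observed)
        for i, observed in enumerate(by_item):  # hold X fixed, update each item
            if observed:
                Y[i] = solve(X, observed)
    return X, Y
```

Users or movies with no ratings keep their random initial factors in this sketch. A real pipeline needs a policy for them.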

Quick Start

Local Mode

# Run with 5% sample size
python run_local_test.py --sample_size small

# Run with 10% sample size
python run_local_test.py --sample_size medium

# Run with 25% sample size
python run_local_test.py --sample_size large

GCP Cluster Mode

# Run with 25% sample size
python run_on_gcp.py --sample_size small

# Run with 50% sample size
python run_on_gcp.py --sample_size medium

# Run with 100% sample size
python run_on_gcp.py --sample_size large
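
Both modes accept the same `--sample_size` labels but map them to different fractions of the ratings. Below is a minimal sketch of that lookup, assuming the scripts resolve the flag through a table keyed by mode. The names `SAMPLE_FRACTIONS`, `sample_fraction` and `parse_args` are hypothetical and are not taken from `run_local_test.py` or `run_on_gcp.py`.

```python
import argparse

# Fractions of the MovieLens 25M ratings per mode, as listed in Quick Start.
# Hypothetical structure; the real scripts may store this differently.
SAMPLE_FRACTIONS = {
    "local": {"small": 0.05, "medium": 0.10, "large": 0.25},
    "gcp": {"small": 0.25, "medium": 0.50, "large": 1.00},
}


def sample_fraction(mode: str, size: str) -> float:
    """Return the fraction of ratings to sample for a run mode and size label."""
    try:
        return SAMPLE_FRACTIONS[mode][size]
    except KeyError:
        raise ValueError(f"unknown mode/size: {mode!r}/{size!r}") from None


def parse_args(argv=None):
    """Parse the --sample_size flag, restricted to the three documented labels."""
    parser = argparse.ArgumentParser(description="Hypothetical sample-size parser")
    parser.add_argument("--sample_size", choices=["small", "medium", "large"], default="small")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(sample_fraction("local", args.sample_size))
```

In PySpark such a fraction would typically be passed to `DataFrame.sample(fraction=...)`. That call draws an approximate random sample, not an exact row count.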

Requirements

  • Python 3.8+
  • Apache Spark 3.2+
  • PySpark
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
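
A quick way to confirm the requirements above before a run is sketched below. It assumes PySpark was installed from PyPI, where the `pyspark` package version tracks the Spark release it bundles. On a cluster with a standalone Spark install, `spark-submit --version` is the more direct check. The helper names are hypothetical, and the script itself needs Python 3.8+ for `importlib.metadata`.

```python
import sys
from importlib import metadata

# Minimum versions taken from the Requirements list.
MIN_PYTHON = (3, 8)
MIN_SPARK = (3, 2)
OTHER_PACKAGES = ("numpy", "pandas", "matplotlib", "seaborn")


def version_tuple(version: str) -> tuple:
    """Turn '3.5.1' or '3.5.1.dev0' into (3, 5, 1), keeping only leading numeric parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple) -> bool:
    """True if the installed version is at least the required (major, minor)."""
    return version_tuple(installed)[: len(minimum)] >= minimum


def check_environment() -> list:
    """Return human-readable problems; an empty list means the requirements look satisfied."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.8+ is required")
    try:
        spark_version = metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        problems.append("PySpark is not installed")
    else:
        if not meets_minimum(spark_version, MIN_SPARK):
            problems.append(f"PySpark {spark_version} found, 3.2+ required")
    for name in OTHER_PACKAGES:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "All requirements found")
```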

Data

Place the MovieLens 25M dataset files in data/ml-25m/:

  • ratings.csv
  • movies.csv
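
To check that the files are in place and readable, the stdlib sketch below previews a few ratings joined to their titles. It relies on the standard MovieLens 25M headers (`userId,movieId,rating,timestamp` in ratings.csv and `movieId,title,genres` in movies.csv). The project presumably loads these files with Spark, and this helper does not try to reproduce that. The function names are hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path

DATA_DIR = Path("data/ml-25m")  # directory layout described above


def read_ratings(lines):
    """Yield (userId, movieId, rating, timestamp) tuples from ratings.csv lines."""
    for row in csv.DictReader(lines):
        yield int(row["userId"]), int(row["movieId"]), float(row["rating"]), int(row["timestamp"])


def read_titles(lines):
    """Map movieId -> title from movies.csv lines (csv handles quoted titles with commas)."""
    return {int(row["movieId"]): row["title"] for row in csv.DictReader(lines)}


def preview(data_dir: Path = DATA_DIR, n: int = 5):
    """Return the first n ratings as (userId, title, rating) without loading the whole file."""
    with open(data_dir / "movies.csv", newline="", encoding="utf-8") as f:
        titles = read_titles(f)
    with open(data_dir / "ratings.csv", newline="", encoding="utf-8") as f:
        return [(u, titles.get(m, "?"), r) for u, m, r, _ in islice(read_ratings(f), n)]


if __name__ == "__main__":
    for row in preview():
        print(row)
```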

Configuration

  • Local mode: 4 GB driver memory, 6 GB executor memory
  • Cluster mode: 4 GB driver memory, 8 GB executor memory
  • ALS hyperparameters are set in config.py
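
One possible layout for config.py is sketched below. It is hypothetical: the variable names and the ALS values are placeholders, not the project's settings. The parts that are real are the memory figures listed above, the Spark property names `spark.driver.memory` and `spark.executor.memory`, and the keyword names accepted by `pyspark.ml.recommendation.ALS`.

```python
# Hypothetical config.py layout. Memory sizes come from the Configuration list;
# the ALS values are placeholders.
SPARK_MEMORY = {
    "local": {"spark.driver.memory": "4g", "spark.executor.memory": "6g"},
    "cluster": {"spark.driver.memory": "4g", "spark.executor.memory": "8g"},
}

ALS_PARAMS = {
    "rank": 10,          # number of latent factors (placeholder)
    "maxIter": 10,       # ALS sweeps (placeholder)
    "regParam": 0.1,     # regularization strength (placeholder)
    "userCol": "userId",
    "itemCol": "movieId",
    "ratingCol": "rating",
    "coldStartStrategy": "drop",
}


def spark_conf(mode: str) -> list:
    """Return sorted (key, value) pairs for SparkSession.builder.config(key, value)."""
    return sorted(SPARK_MEMORY[mode].items())
```

With PySpark installed, the estimator could then be built as `ALS(**ALS_PARAMS)`. Setting `coldStartStrategy="drop"` discards the NaN predictions Spark returns for users or movies absent from the training split; without it, RMSE evaluates to NaN. Spark's configuration docs also note that `spark.driver.memory` has no effect when set from inside an already running client-mode driver. On a cluster it is usually passed with `--driver-memory` or the properties file.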

Output

  • Recommendations are saved to output/recommendations/
  • Performance metrics are written to output/metrics/
  • Visualizations are written to output/plots/
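
The two outputs above rest on two small computations: ranking movies by the dot product of user and movie factors, and scoring held-out predictions with RMSE, which Spark's `RegressionEvaluator` reports under `metricName="rmse"`. The NumPy sketch below shows both. Skipping movies the user has already rated is an added assumption; the README does not say whether the project filters them. `top_n` and `rmse` are hypothetical names.

```python
import numpy as np


def rmse(actual, predicted) -> float:
    """Root-mean-square error between observed and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def top_n(user_factors, item_factors, user, seen, n=10):
    """Rank items for one user by dot-product score, skipping already-rated items."""
    # Float copy so -inf can be assigned even when the factors are integers.
    scores = (item_factors @ user_factors[user]).astype(float)
    scores[list(seen)] = -np.inf
    order = np.argsort(-scores)  # highest score first; -inf entries sort last
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]
```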

