AnishTatke/MovieRecommenderSystem

Movie Recommender System

A movie recommendation system that provides personalized recommendations using collaborative filtering and content-based filtering, with hybrid approaches planned.

🚀 Features

Implemented Features

  • Collaborative Filtering:
    • ✅ ALS (Alternating Least Squares) with implicit library
    • ✅ SVD (Singular Value Decomposition) with Surprise library
    • ✅ Basic popularity-based recommendations
  • Content-Based Filtering:
    • ✅ TF-IDF features for movie descriptions
    • ✅ Text embeddings using sentence-transformers
  • Data Processing:
    • ✅ MovieLens dataset loading and preprocessing
    • ✅ Train/test splitting with time-based validation
    • ✅ ID mapping and data cleaning utilities
  • Evaluation Framework:
    • ✅ Regression metrics (RMSE, MAE)
    • ✅ Ranking metrics (Precision@K, Recall@K, NDCG@K, Hit Rate@K)
    • ✅ Top-K recommendation evaluation
  • API Framework:
    • ✅ FastAPI application structure
    • ✅ Basic endpoints for recommendations
    • ✅ Model loading and serving infrastructure
  • MLOps Integration:
    • ✅ MLflow experiment tracking setup
    • ✅ Hydra configuration management
  • Development Tools:
    • ✅ Jupyter notebooks for exploration and experimentation
    • ✅ Comprehensive evaluation notebooks

🔄 Upcoming Changes

  • Additional Collaborative Filtering Models:
    • BPR (Bayesian Personalized Ranking)
    • LightFM hybrid models
    • Neural collaborative filtering
  • Advanced Hybrid Approaches:
    • Weighted blending of multiple models
    • LightGBM-based hybrid models
    • Ensemble methods
  • Real-time API Features:
    • Redis caching implementation
    • User rating submission endpoints
    • Movie detail endpoints
  • Advanced Evaluation:
    • Cross-validation pipelines
    • A/B testing framework
    • Online evaluation metrics
  • Production Features:
    • Docker containerization
    • Celery task queue
    • Database integration
    • Monitoring and logging

🏗️ Architecture

src/
├── data/           # ✅ Data loaders and preprocessing
│   ├── loaders.py      # MovieDataLoader class
│   └── id_mapping.py   # ID mapping utilities
├── features/       # ✅ Feature extraction
│   ├── tfidf.py        # TF-IDF feature extraction
│   └── text_embeddings.py # Text embedding generation
├── cf/            # ✅ Collaborative filtering models
│   └── train_als.py    # ALS model training
├── hybrid/        # 🔄 Hybrid recommendation approaches
├── eval/          # ✅ Evaluation metrics and validation
│   └── metrics.py      # Regression and ranking metrics
├── serve/         # ✅ FastAPI application structure
│   ├── app.py         # Main API application
│   └── schemas.py     # Pydantic schemas
└── pipelines/     # 🔄 Training and inference pipelines

📋 Requirements

  • Python 3.9+
  • 8GB+ RAM recommended
  • GPU support (optional, for neural models)

🛠️ Installation

Option 1: Using Conda (Recommended)

# Clone the repository
git clone <your-repo-url>
cd MovieRecommenderSystem

# Create conda environment
conda env create -f environment.yml
conda activate movie-recommender

# Install in development mode
pip install -e .

Option 2: Manual Installation

# Clone the repository
git clone <your-repo-url>
cd MovieRecommenderSystem

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🚀 Quick Start

1. Data Preparation

Place your movie data files in the data/raw/ directory:

  • movies.csv - Movie metadata (id, title, genres, description)
  • ratings.csv - User ratings (user_id, movie_id, rating, timestamp)
  • genome-scores.csv - Movie-tag relevance scores (optional)
  • genome-tags.csv - Tag definitions (optional)
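
Since ratings.csv carries a timestamp column and the feature list mentions time-based validation, a time-based train/test split can be sketched as follows. This is an illustration under assumptions, not the project's loader: the function name `time_based_split`, the tuple layout, and the 20% default are all hypothetical.

```python
from typing import List, Tuple

# (user_id, movie_id, rating, timestamp), mirroring the ratings.csv columns
Rating = Tuple[int, int, float, int]


def time_based_split(
    ratings: List[Rating], test_fraction: float = 0.2
) -> Tuple[List[Rating], List[Rating]]:
    """Put the most recent `test_fraction` of ratings into the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Sort by timestamp so the test set only contains the latest interactions.
    ordered = sorted(ratings, key=lambda r: r[3])
    n_test = int(round(len(ordered) * test_fraction))
    cutoff = len(ordered) - n_test
    return ordered[:cutoff], ordered[cutoff:]


# Toy ratings with out-of-order timestamps.
toy_ratings: List[Rating] = [
    (1, 10, 4.0, 50),
    (1, 20, 3.5, 10),
    (2, 10, 5.0, 40),
    (2, 30, 2.0, 20),
    (3, 20, 4.5, 30),
]
train, test = time_based_split(toy_ratings, test_fraction=0.2)
```

Splitting on time rather than at random avoids training on interactions that happened after the ones being predicted.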

2. Feature Extraction

# Extract text features
python src/features/text_embeddings.py --config conf/config.yaml

# Generate TF-IDF features
python src/features/tfidf.py --config conf/config.yaml

3. Train Models

# Train ALS collaborative filtering model
python src/cf/train_als.py --config conf/config.yaml

4. Start API Server

# Start the FastAPI server
uvicorn src.serve.app:app --host 0.0.0.0 --port 8000 --reload

5. Get Recommendations

# Example API call
curl -X POST "http://localhost:8000/recommend" \
     -H "Content-Type: application/json" \
     -d '{"user_id": 1, "n_recommendations": 10}'
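
The same request can be built from Python using only the standard library. This sketch mirrors the curl call above. The helper name `build_recommend_request` is hypothetical, and the response format is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_recommend_request(
    user_id: int,
    n_recommendations: int = 10,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST request for the /recommend endpoint (same payload as the curl call)."""
    payload = json.dumps({"user_id": user_id, "n_recommendations": n_recommendations})
    return urllib.request.Request(
        url=f"{base_url}/recommend",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_recommend_request(user_id=1, n_recommendations=10)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read()))
```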

📊 Model Training

Collaborative Filtering

from src.cf.train_als import ALSTrainer
from omegaconf import OmegaConf

# Load configuration
config = OmegaConf.load('conf/config.yaml')

# Train ALS model
trainer = ALSTrainer(config)
ratings_matrix, movies_df, id_mapper = trainer.load_data()
als_model = trainer.train(ratings_matrix)

Content-Based Features

from src.features.tfidf import TFIDFExtractor
from src.features.text_embeddings import TextEmbeddingExtractor

# Extract TF-IDF features
# (`config` and `movies_df` come from the Collaborative Filtering example above)
tfidf_extractor = TFIDFExtractor(config)
tfidf_matrix = tfidf_extractor.fit_transform(movies_df)

# Extract text embeddings
embedding_extractor = TextEmbeddingExtractor(config)
embeddings = embedding_extractor.fit_transform(movies_df)
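
To show what TF-IDF features capture, here is a from-scratch sketch that weights terms in a few toy descriptions and compares movies by cosine similarity. It is not the project's `TFIDFExtractor`, whose internals are not shown here. The function names and descriptions are made up for illustration, and the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs; a real pipeline would also drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy descriptions (illustrative only).
descriptions = [
    "space crew fights alien",
    "alien invades space station",
    "romantic comedy in paris",
]
vectors = tfidf_vectors(descriptions)
similar = cosine(vectors[0], vectors[1])    # the two descriptions share "space" and "alien"
unrelated = cosine(vectors[0], vectors[2])  # the two descriptions share no terms
```

Content-based recommendation then amounts to ranking other movies by their similarity to movies a user already rated highly.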

🔧 Configuration

The system uses Hydra for configuration management. Key configuration files:

  • conf/config.yaml - Main configuration
  • conf/models.yaml - Model hyperparameters
  • conf/paths.yaml - File paths

Example configuration override:

python src/cf/train_als.py \
    --config conf/config.yaml \
    models.collaborative_filtering.als.factors=200 \
    models.collaborative_filtering.als.regularization=0.05
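
Conceptually, each dotted override replaces one leaf in the nested configuration. The sketch below illustrates that with plain dictionaries. It is not Hydra's implementation, and the helper names and base values (100 factors, 0.01 regularization) are hypothetical placeholders rather than the project's defaults.

```python
import copy
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    # Try int, then float, and fall back to the raw string.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with `a.b.c=value` style overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical base configuration.
base = {"models": {"collaborative_filtering": {"als": {"factors": 100, "regularization": 0.01}}}}
overridden = apply_overrides(
    base,
    [
        "models.collaborative_filtering.als.factors=200",
        "models.collaborative_filtering.als.regularization=0.05",
    ],
)
```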

📈 Evaluation

Available Metrics

  • Regression Metrics: RMSE, MAE
  • Ranking Metrics: Precision@K, Recall@K, NDCG@K, Hit Rate@K
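
For reference, the two regression metrics can be computed directly from their definitions. This is a standalone sketch with a hypothetical function name, `rmse_mae`; it is not the project's `RegressionMetrics` class.

```python
import math
from typing import Sequence, Tuple


def rmse_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Return (RMSE, MAE) for paired true and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    # RMSE squares the errors, so it penalizes large misses more than MAE does.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy ratings: the errors are -1, 0, -1 and 2.
rmse, mae = rmse_mae([4.0, 3.0, 5.0, 2.0], [3.0, 3.0, 4.0, 4.0])
```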

Usage

from src.eval.metrics import TopKMetrics, RegressionMetrics

# For rating prediction evaluation
# (y_true: observed ratings; y_pred: predicted ratings for the same user-movie pairs)
reg_metrics = RegressionMetrics(y_true)
rmse, mae = reg_metrics.predict(y_pred)

# For top-K recommendation evaluation
topk_metrics = TopKMetrics(train, test)
# Use with recommendation DataFrame
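
The ranking metrics listed above can be sketched for a single user as follows, assuming binary relevance (an item is either in the user's test set or not). The function names are hypothetical and independent of the project's `TopKMetrics` class. In practice each metric is averaged over all evaluated users.

```python
import math
from typing import Sequence, Set


def _hits(recommended: Sequence[int], relevant: Set[int], k: int) -> int:
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in recommended[:k] if item in relevant)


def precision_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # Fraction of the top-k slots filled with relevant items.
    return _hits(recommended, relevant, k) / k


def recall_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # Fraction of the relevant items that appear in the top k.
    return _hits(recommended, relevant, k) / len(relevant) if relevant else 0.0


def hit_rate_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # 1.0 if at least one relevant item appears in the top k, else 0.0.
    return 1.0 if _hits(recommended, relevant, k) > 0 else 0.0


def ndcg_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    if k <= 0:
        raise ValueError("k must be positive")
    # Hits are discounted by log2(position + 1), with positions starting at 1.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # Ideal DCG places all relevant items (up to k) at the top of the list.
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


# Toy example: relevant items sit at positions 2 and 5; item 60 is never recommended.
recommended = [10, 20, 30, 40, 50]
relevant = {20, 50, 60}
precision = precision_at_k(recommended, relevant, k=5)
recall = recall_at_k(recommended, relevant, k=5)
hit_rate = hit_rate_at_k(recommended, relevant, k=5)
ndcg = ndcg_at_k(recommended, relevant, k=5)
```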

📚 Jupyter Notebooks

The project includes notebooks for exploration and experimentation:

  • notebooks/01_eda.ipynb - Exploratory data analysis of MovieLens dataset
  • notebooks/02_simple_baselines.ipynb - Implementation of baseline recommendation methods
  • notebooks/03_svd.ipynb - SVD model training and evaluation with Surprise library

📝 Development

Code Quality

# Install pre-commit hooks
pre-commit install

# Run all hooks
pre-commit run --all-files

# Format code
black src/
isort src/

Type Checking

# Run mypy
mypy src/

# Run with strict mode
mypy --strict src/

📚 API Documentation

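Once the server is running, FastAPI serves interactive API documentation, by default at /docs (Swagger UI) and /redoc (ReDoc) unless the application overrides those paths.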

Available Endpoints

  • POST /recommend - Get personalized recommendations
  • GET /health - Health check endpoint

🔍 Monitoring

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

For questions and support:

  • Create an issue on GitHub
  • Check the documentation
  • Review the example notebooks in the notebooks/ directory
