Movie Recommender System

A movie recommendation system that combines collaborative filtering and content-based filtering to provide personalized movie recommendations. Hybrid approaches that blend the two are planned (see Upcoming Changes).

🚀 Features

✅ Implemented Features

Collaborative Filtering:
- ✅ ALS (Alternating Least Squares) with implicit library
- ✅ SVD (Singular Value Decomposition) with Surprise library
- ✅ Basic popularity-based recommendations
Content-Based Filtering:
- ✅ TF-IDF features for movie descriptions
- ✅ Text embeddings using sentence-transformers
Data Processing:
- ✅ MovieLens dataset loading and preprocessing
- ✅ Train/test splitting with time-based validation
- ✅ ID mapping and data cleaning utilities
Evaluation Framework:
- ✅ Regression metrics (RMSE, MAE)
- ✅ Ranking metrics (Precision@K, Recall@K, NDCG@K, Hit Rate@K)
- ✅ Top-K recommendation evaluation
API Framework:
- ✅ FastAPI application structure
- ✅ Basic endpoints for recommendations
- ✅ Model loading and serving infrastructure
MLOps Integration:
- ✅ MLflow experiment tracking setup
- ✅ Hydra configuration management
Development Tools:
- ✅ Jupyter notebooks for exploration and experimentation
- ✅ Comprehensive evaluation notebooks

🔄 Upcoming Changes

Additional Collaborative Filtering Models:
- BPR (Bayesian Personalized Ranking)
- LightFM hybrid models
- Neural collaborative filtering
Advanced Hybrid Approaches:
- Weighted blending of multiple models
- LightGBM-based hybrid models
- Ensemble methods
Real-time API Features:
- Redis caching implementation
- User rating submission endpoints
- Movie detail endpoints
Advanced Evaluation:
- Cross-validation pipelines
- A/B testing framework
- Online evaluation metrics
Production Features:
- Docker containerization
- Celery task queue
- Database integration
- Monitoring and logging

🏗️ Architecture

src/
├── data/           # ✅ Data loaders and preprocessing
│   ├── loaders.py      # MovieDataLoader class
│   └── id_mapping.py   # ID mapping utilities
├── features/       # ✅ Feature extraction
│   ├── tfidf.py        # TF-IDF feature extraction
│   └── text_embeddings.py # Text embedding generation
├── cf/            # ✅ Collaborative filtering models
│   └── train_als.py    # ALS model training
├── hybrid/        # 🔄 Hybrid recommendation approaches
├── eval/          # ✅ Evaluation metrics and validation
│   └── metrics.py      # Regression and ranking metrics
├── serve/         # ✅ FastAPI application structure
│   ├── app.py         # Main API application
│   └── schemas.py     # Pydantic schemas
└── pipelines/     # 🔄 Training and inference pipelines

📋 Requirements

- Python 3.9+
- 8GB+ RAM recommended
- GPU support (optional, for neural models)

🛠️ Installation

Option 1: Using Conda (Recommended)

# Clone the repository
git clone <your-repo-url>
cd MovieRecommenderSystem

# Create conda environment
conda env create -f environment.yml
conda activate movie-recommender

# Install in development mode
pip install -e .

Option 2: Manual Installation

# Clone the repository
git clone <your-repo-url>
cd MovieRecommenderSystem

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🚀 Quick Start

1. Data Preparation

Place your movie data files in the data/raw/ directory:

- movies.csv - Movie metadata (id, title, genres, description)
- ratings.csv - User ratings (user_id, movie_id, rating, timestamp)
- genome-scores.csv - Movie-tag relevance scores (optional)
- genome-tags.csv - Tag definitions (optional)
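
As an illustration of the time-based train/test splitting listed under Implemented Features, here is a minimal standard-library sketch over the ratings.csv columns. The time_based_split function and the inline sample are assumptions made for this example; the project's MovieDataLoader may split differently.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def time_based_split(rows: List[Row], test_fraction: float = 0.2) -> Tuple[List[Row], List[Row]]:
    """Hold out the most recent `test_fraction` of ratings as the test set."""
    ordered = sorted(rows, key=lambda r: int(r["timestamp"]))
    cutoff = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cutoff], ordered[cutoff:]


# Tiny inline sample using the ratings.csv columns (values are made up).
sample = io.StringIO(
    "user_id,movie_id,rating,timestamp\n"
    "1,10,4.0,100\n"
    "1,20,3.5,400\n"
    "2,10,5.0,200\n"
    "2,30,2.0,500\n"
    "3,20,4.5,300\n"
)
rows = list(csv.DictReader(sample))
train, test = time_based_split(rows, test_fraction=0.4)
```

Splitting on timestamp rather than at random keeps every test rating later than every training rating, which avoids evaluating on information from the future.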

2. Feature Extraction

# Generate text embeddings
python src/features/text_embeddings.py --config conf/config.yaml

# Generate TF-IDF features
python src/features/tfidf.py --config conf/config.yaml
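
To show what TF-IDF features capture, here is a minimal sketch of the weighting in plain Python. It uses the textbook form, term frequency times log(N / document frequency); library implementations typically add smoothing and normalization, and the project's TFIDFExtractor may differ. The tfidf function and the sample descriptions are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up descriptions; real input would come from movies.csv.
descriptions = [
    "space adventure with robots",
    "romantic comedy in paris",
    "space comedy with aliens",
]
vectors = tfidf(descriptions)
```

Terms that appear in fewer descriptions, such as "robots", receive a higher weight than terms shared across descriptions, such as "space".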

3. Train Models

# Train ALS collaborative filtering model
python src/cf/train_als.py --config conf/config.yaml

4. Start API Server

# Start the FastAPI server
uvicorn src.serve.app:app --host 0.0.0.0 --port 8000 --reload

5. Get Recommendations

# Example API call
curl -X POST "http://localhost:8000/recommend" \
     -H "Content-Type: application/json" \
     -d '{"user_id": 1, "n_recommendations": 10}'
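
The same call can be made from Python. The sketch below uses only the standard library and mirrors the curl example; the helper names are hypothetical, and the shape of the response is not documented here, so it is returned as decoded JSON without further assumptions.

```python
import json
import urllib.request


def build_recommend_request(user_id: int, n_recommendations: int = 10,
                            base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /recommend endpoint."""
    payload = json.dumps({"user_id": user_id, "n_recommendations": n_recommendations})
    return urllib.request.Request(
        url=f"{base_url}/recommend",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_recommendations(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_recommend_request(user_id=1, n_recommendations=10)
# With the server from step 4 running:
# recommendations = fetch_recommendations(request)
```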

📊 Model Training

Collaborative Filtering

from src.cf.train_als import ALSTrainer
from omegaconf import OmegaConf

# Load configuration
config = OmegaConf.load('conf/config.yaml')

# Train ALS model
trainer = ALSTrainer(config)
ratings_matrix, movies_df, id_mapper = trainer.load_data()
als_model = trainer.train(ratings_matrix)

Content-Based Features

from src.features.tfidf import TFIDFExtractor
from src.features.text_embeddings import TextEmbeddingExtractor

# Reuses config and movies_df from the Collaborative Filtering example above

# Extract TF-IDF features
tfidf_extractor = TFIDFExtractor(config)
tfidf_matrix = tfidf_extractor.fit_transform(movies_df)

# Extract text embeddings
embedding_extractor = TextEmbeddingExtractor(config)
embeddings = embedding_extractor.fit_transform(movies_df)

🔧 Configuration

The system uses Hydra for configuration management. Key configuration files:

- conf/config.yaml - Main configuration
- conf/models.yaml - Model hyperparameters
- conf/paths.yaml - File paths

Example configuration override:

python src/cf/train_als.py \
    --config conf/config.yaml \
    models.collaborative_filtering.als.factors=200 \
    models.collaborative_filtering.als.regularization=0.05
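
Conceptually, each key=value override replaces one leaf in the nested configuration. The sketch below illustrates that idea with plain dictionaries; it is not how Hydra is implemented, the apply_overrides helper is hypothetical, and the starting values are placeholders rather than the project's defaults.

```python
from typing import Any, Dict, List


def _parse(raw: str) -> Any:
    # Try int, then float; otherwise keep the value as a string.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply `a.b.c=value` overrides to a nested dict, in place."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the tree, creating intermediate levels if missing.
            node = node.setdefault(name, {})
        node[leaf] = _parse(raw_value)
    return config


# Placeholder starting values, not the project's real defaults.
config = {"models": {"collaborative_filtering": {"als": {"factors": 100, "regularization": 0.01}}}}
apply_overrides(config, [
    "models.collaborative_filtering.als.factors=200",
    "models.collaborative_filtering.als.regularization=0.05",
])
```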

📈 Evaluation

Available Metrics

- Regression Metrics: RMSE, MAE
- Ranking Metrics: Precision@K, Recall@K, NDCG@K, Hit Rate@K
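
For reference, the metrics listed above can be written out in a few lines each. This is a minimal sketch for a single user with binary relevance, using standard definitions; the function names are chosen for this example and are not the interface of src/eval/metrics.py.

```python
import math
from typing import Sequence, Set


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # Fraction of the top-k recommendations that are relevant.
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # Fraction of the relevant items that appear in the top-k.
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def hit_rate_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # 1.0 if at least one relevant item is in the top-k, else 0.0.
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


def ndcg_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    # Binary relevance: gain 1 for a relevant item, discounted by log2(position + 1).
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up example: 2 of the 3 relevant items appear in the top 5.
recommended = [10, 20, 30, 40, 50]
relevant = {20, 50, 60}
```

In practice these per-user values are averaged over all users in the test set.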

Usage

from src.eval.metrics import TopKMetrics, RegressionMetrics

# For rating prediction evaluation (y_true: actual ratings, y_pred: predicted ratings)
reg_metrics = RegressionMetrics(y_true)
rmse, mae = reg_metrics.predict(y_pred)

# For top-K recommendation evaluation (train, test: the two halves of the rating split)
topk_metrics = TopKMetrics(train, test)
# Use with recommendation DataFrame

📚 Jupyter Notebooks

The project includes comprehensive notebooks for exploration and experimentation:

- notebooks/01_eda.ipynb - Exploratory data analysis of MovieLens dataset
- notebooks/02_simple_baselines.ipynb - Implementation of baseline recommendation methods
- notebooks/03_svd.ipynb - SVD model training and evaluation with Surprise library
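
As an example of the kind of baseline such notebooks typically cover, here is a minimal popularity-based recommender. It is a sketch under stated assumptions: the function names, the minimum-rating threshold, and the sample ratings are all invented for illustration and are not taken from the notebooks.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def popularity_ranking(ratings: Iterable[Tuple[int, int, float]], min_ratings: int = 2) -> List[int]:
    """Rank movies by mean rating, skipping movies with fewer than `min_ratings` ratings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    eligible = [movie for movie, count in counts.items() if count >= min_ratings]
    # Sort by mean rating (descending), then rating count (descending), then ID for stability.
    return sorted(eligible, key=lambda m: (-totals[m] / counts[m], -counts[m], m))


def recommend_popular(ranking: List[int], seen: Set[int], k: int) -> List[int]:
    """Return the top-k popular movies the user has not rated yet."""
    return [movie for movie in ranking if movie not in seen][:k]


# (user_id, movie_id, rating) triples; values are made up.
ratings = [
    (1, 10, 5.0), (2, 10, 4.0),
    (1, 20, 3.0), (2, 20, 3.5), (3, 20, 4.0),
    (3, 30, 5.0),
]
ranking = popularity_ranking(ratings)
```

The minimum-rating threshold keeps a movie with a single five-star rating (movie 30 here) from outranking movies with broader support.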

📝 Development

Code Quality

# Install pre-commit hooks
pre-commit install

# Run all hooks
pre-commit run --all-files

# Format code
black src/
isort src/

Type Checking

# Run mypy
mypy src/

# Run with strict mode
mypy --strict src/

📚 API Documentation

Once the server is running, visit:

- Interactive API docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI schema: http://localhost:8000/openapi.json

Available Endpoints

- POST /recommend - Get personalized recommendations
- GET /health - Health check endpoint

🔍 Monitoring

MLflow UI (experiment tracking): http://localhost:5000 (MLflow's default port; requires the MLflow UI or tracking server to be running)

🤝 Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

- MovieLens for the dataset
- Implicit for ALS implementation
- Surprise for SVD and other algorithms
- FastAPI for the web framework
- MLflow for ML lifecycle management

📞 Support

For questions and support:

- Create an issue on GitHub
- Check the documentation
- Review the example notebooks in the notebooks/ directory
