A comprehensive movie recommendation system that combines collaborative filtering, content-based filtering, and hybrid approaches to provide personalized movie recommendations.
- Collaborative Filtering:
- ✅ ALS (Alternating Least Squares) with implicit library
- ✅ SVD (Singular Value Decomposition) with Surprise library
- ✅ Basic popularity-based recommendations
- Content-Based Filtering:
- ✅ TF-IDF features for movie descriptions
- ✅ Text embeddings using sentence-transformers
- Data Processing:
- ✅ MovieLens dataset loading and preprocessing
- ✅ Train/test splitting with time-based validation
- ✅ ID mapping and data cleaning utilities
- Evaluation Framework:
- ✅ Regression metrics (RMSE, MAE)
- ✅ Ranking metrics (Precision@K, Recall@K, NDCG@K, Hit Rate@K)
- ✅ Top-K recommendation evaluation
- API Framework:
- ✅ FastAPI application structure
- ✅ Basic endpoints for recommendations
- ✅ Model loading and serving infrastructure
- MLOps Integration:
- ✅ MLflow experiment tracking setup
- ✅ Hydra configuration management
- Development Tools:
- ✅ Jupyter notebooks for exploration and experimentation
- ✅ Comprehensive evaluation notebooks
- Additional Collaborative Filtering Models:
- BPR (Bayesian Personalized Ranking)
- LightFM hybrid models
- Neural collaborative filtering
- Advanced Hybrid Approaches:
- Weighted blending of multiple models
- LightGBM-based hybrid models
- Ensemble methods
- Real-time API Features:
- Redis caching implementation
- User rating submission endpoints
- Movie detail endpoints
- Advanced Evaluation:
- Cross-validation pipelines
- A/B testing framework
- Online evaluation metrics
- Production Features:
- Docker containerization
- Celery task queue
- Database integration
- Monitoring and logging
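Weighted blending is listed above as planned, so the repository does not yet define it. As a rough illustration of the idea only, the sketch below min-max normalizes each model's scores and combines them with a weighted sum. The function names, the model names, and the normalization choice are all assumptions, not part of the project.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale one model's scores to [0, 1] so different models are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking signal, so map everything to 0.0
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def blend_scores(
    model_scores: Dict[str, Dict[int, float]],
    weights: Dict[str, float],
    top_k: int = 10,
) -> List[Tuple[int, float]]:
    """Weighted sum of normalized per-model scores, highest score first."""
    total_weight = sum(weights[name] for name in model_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    blended: Dict[int, float] = {}
    for name, scores in model_scores.items():
        w = weights[name] / total_weight
        for item, s in min_max_normalize(scores).items():
            # An item a model did not score contributes 0 from that model
            blended[item] = blended.get(item, 0.0) + w * s
    # Sort by descending score; break ties by item ID for determinism
    ranked = sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Hypothetical usage: `blend_scores({"als": als_scores, "content": content_scores}, {"als": 0.7, "content": 0.3})`, where each scores dict maps movie IDs to that model's raw score for one user.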
src/
├── data/ # ✅ Data loaders and preprocessing
│ ├── loaders.py # MovieDataLoader class
│ └── id_mapping.py # ID mapping utilities
├── features/ # ✅ Feature extraction
│ ├── tfidf.py # TF-IDF feature extraction
│ └── text_embeddings.py # Text embedding generation
├── cf/ # ✅ Collaborative filtering models
│ └── train_als.py # ALS model training
├── hybrid/ # 🔄 Hybrid recommendation approaches
├── eval/ # ✅ Evaluation metrics and validation
│ └── metrics.py # Regression and ranking metrics
├── serve/ # ✅ FastAPI application structure
│ ├── app.py # Main API application
│ └── schemas.py # Pydantic schemas
└── pipelines/ # 🔄 Training and inference pipelines
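The tree lists `src/data/id_mapping.py` as ID mapping utilities without showing its interface. Matrix-factorization models generally need contiguous 0-based row and column indices, so a mapper of roughly the following shape is common. `SimpleIdMapper` and its methods are hypothetical names for illustration and may not match the project's actual class.

```python
from typing import Dict, Hashable, Iterable, List


class SimpleIdMapper:
    """Hypothetical two-way map between raw IDs and 0-based indices."""

    def __init__(self) -> None:
        self._to_index: Dict[Hashable, int] = {}
        self._to_raw: List[Hashable] = []

    def fit(self, raw_ids: Iterable[Hashable]) -> "SimpleIdMapper":
        # Assign indices in order of first appearance; duplicates are skipped
        for raw in raw_ids:
            if raw not in self._to_index:
                self._to_index[raw] = len(self._to_raw)
                self._to_raw.append(raw)
        return self

    def to_index(self, raw_id: Hashable) -> int:
        # Raises KeyError for IDs that were not seen during fit
        return self._to_index[raw_id]

    def to_raw(self, index: int) -> Hashable:
        return self._to_raw[index]

    def __len__(self) -> int:
        return len(self._to_raw)
```

One mapper per entity type (users, movies) keeps the two index spaces independent, which matches how a user-item matrix is laid out.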
- Python 3.9+
- 8GB+ RAM recommended
- GPU support (optional, for neural models)
# Clone the repository
git clone <your-repo-url>
cd MovieRecommenderSystem
# Create conda environment
conda env create -f environment.yml
conda activate movie-recommender
# Install in development mode
pip install -e .
# Clone the repository
git clone <your-repo-url>
cd MovieRecommenderSystem
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Place your movie data files in the data/raw/ directory:
- movies.csv: Movie metadata (id, title, genres, description)
- ratings.csv: User ratings (user_id, movie_id, rating, timestamp)
- genome-scores.csv: Movie-tag relevance scores (optional)
- genome-tags.csv: Tag definitions (optional)
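The feature list mentions train/test splitting with time-based validation, and ratings.csv carries a timestamp column. The sketch below shows one common form of that split: sort by timestamp and hold out the most recent fraction. It uses plain Python so it runs without the project's dependencies. The `Rating` tuple and `time_based_split` are illustrative names, and the project's own loader may split differently (for example, per user).

```python
from typing import List, NamedTuple, Tuple


class Rating(NamedTuple):
    # Field names follow the ratings.csv columns described above
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def time_based_split(
    ratings: List[Rating], test_fraction: float = 0.2
) -> Tuple[List[Rating], List[Rating]]:
    """Hold out the most recent `test_fraction` of ratings as the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(ratings, key=lambda r: r.timestamp)
    cutoff = int(len(ordered) * (1.0 - test_fraction))
    # Everything before the cutoff is older than everything after it
    return ordered[:cutoff], ordered[cutoff:]
```

Splitting on time rather than at random avoids training on ratings that were made after the ones being predicted.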
# Extract text features
python src/features/text_embeddings.py --config conf/config.yaml
# Generate TF-IDF features
python src/features/tfidf.py --config conf/config.yaml
# Train ALS collaborative filtering model
python src/cf/train_als.py --config conf/config.yaml
# Start the FastAPI server
uvicorn src.serve.app:app --host 0.0.0.0 --port 8000 --reload
# Example API call
curl -X POST "http://localhost:8000/recommend" \
-H "Content-Type: application/json" \
-d '{"user_id": 1, "n_recommendations": 10}'
from src.cf.train_als import ALSTrainer
from omegaconf import OmegaConf
# Load configuration
config = OmegaConf.load('conf/config.yaml')
# Train ALS model
trainer = ALSTrainer(config)
ratings_matrix, movies_df, id_mapper = trainer.load_data()
als_model = trainer.train(ratings_matrix)
from src.features.tfidf import TFIDFExtractor
from src.features.text_embeddings import TextEmbeddingExtractor
# Extract TF-IDF features
tfidf_extractor = TFIDFExtractor(config)
tfidf_matrix = tfidf_extractor.fit_transform(movies_df)
# Extract text embeddings
embedding_extractor = TextEmbeddingExtractor(config)
embeddings = embedding_extractor.fit_transform(movies_df)
The system uses Hydra for configuration management. Key configuration files:
- conf/config.yaml: Main configuration
- conf/models.yaml: Model hyperparameters
- conf/paths.yaml: File paths
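Command-line overrides in this style address nested configuration keys by dotted path, such as `models.collaborative_filtering.als.factors=200`. To make that addressing concrete, here is a simplified stand-in that applies such overrides to a nested dict. It is not Hydra's implementation (Hydra has its own override grammar), and `apply_overrides` is a hypothetical helper.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b.c=value' overrides to a nested dict, modifying it in place."""
    for override in overrides:
        dotted_key, _, raw_value = override.partition("=")
        try:
            # Parse Python literals: "200" -> int, "0.05" -> float
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value  # anything else stays a string
        node = config
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Walk down the tree, creating missing levels as empty dicts
            node = node.setdefault(part, {})
        node[leaf] = value
    return config
```

Each dotted segment selects one nesting level, so an override only replaces the single leaf it names and leaves sibling keys untouched.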
Example configuration override:
python src/cf/train_als.py \
--config conf/config.yaml \
models.collaborative_filtering.als.factors=200 \
models.collaborative_filtering.als.regularization=0.05
- Regression Metrics: RMSE, MAE
- Ranking Metrics: Precision@K, Recall@K, NDCG@K, Hit Rate@K
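For readers unfamiliar with the ranking metrics named above, the following sketch gives their standard definitions for a single user with binary relevance. It is a reference illustration, not the code in `src/eval/metrics.py`, and conventions vary (some implementations divide Precision@K by the number of items actually returned rather than by K). It assumes the recommendation list contains no duplicates.

```python
import math
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of the top-k slots filled with relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def hit_rate_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    """1.0 if at least one relevant item is in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


def ndcg_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> float:
    """DCG with binary gains, divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Dataset-level scores are typically the mean of these per-user values over all users with at least one test item.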
from src.eval.metrics import TopKMetrics, RegressionMetrics
# For rating prediction evaluation
reg_metrics = RegressionMetrics(y_true)
rmse, mae = reg_metrics.predict(y_pred)
# For top-K recommendation evaluation
topk_metrics = TopKMetrics(train, test)
# Use with recommendation DataFrame
The project includes notebooks for exploration and experimentation:
- notebooks/01_eda.ipynb: Exploratory data analysis of the MovieLens dataset
- notebooks/02_simple_baselines.ipynb: Implementation of baseline recommendation methods
- notebooks/03_svd.ipynb: SVD model training and evaluation with the Surprise library
# Install pre-commit hooks
pre-commit install
# Run all hooks
pre-commit run --all-files
# Format code
black src/
isort src/
# Run mypy
mypy src/
# Run with strict mode
mypy --strict src/
Once the server is running, visit:
- Interactive API docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI schema: http://localhost:8000/openapi.json
- POST /recommend: Get personalized recommendations
- GET /health: Health check endpoint
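The POST /recommend endpoint can also be called from Python. The sketch below mirrors the curl example using only the standard library. The helper names are hypothetical, the base URL assumes the local server started with uvicorn on port 8000, and the response is returned as parsed JSON because its schema is not documented here.

```python
import json
import urllib.request
from typing import Any


def build_recommend_request(
    user_id: int,
    n_recommendations: int = 10,
    base_url: str = "http://localhost:8000",  # assumed local dev server
) -> urllib.request.Request:
    """Build (but do not send) the POST /recommend request."""
    payload = json.dumps(
        {"user_id": user_id, "n_recommendations": n_recommendations}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/recommend",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_recommendations(
    user_id: int, n_recommendations: int = 10, timeout: float = 10.0
) -> Any:
    """Send the request and return the decoded JSON body as-is."""
    request = build_recommend_request(user_id, n_recommendations)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `fetch_recommendations(1, 10)` sends the same body as the curl call. Separating request construction from sending makes the payload easy to inspect without a live server.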
- MLflow: http://localhost:5000 (experiment tracking)
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MovieLens for the dataset
- Implicit for ALS implementation
- Surprise for SVD and other algorithms
- FastAPI for the web framework
- MLflow for ML lifecycle management
For questions and support:
- Create an issue on GitHub
- Check the documentation
- Review the example notebooks in the notebooks/ directory