This is a Content-Based Movie Recommendation System that suggests similar movies using only movie metadata — without needing user history or ratings.
This system helps users discover similar content by analyzing:
- Genres
- Keywords
- Cast
- Director
- Overview (Plot)
Example: Enter `Inception` → get back the 5 most similar movies based on metadata.
- Python
- Pandas, NumPy
- Scikit-learn (CountVectorizer, Cosine Similarity)
- NLTK (Stemming)
- Jupyter Notebook
- TMDb 5000 Movie Dataset
- Combines key text-based features into a single `tags` column
- Applies stemming for noise reduction
- Vectorizes text using CountVectorizer
- Calculates similarity using cosine similarity
- Provides top-5 movie recommendations based on similarity scores
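
The first two features above (building the `tags` column and stemming it) can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame, its column names, and the helper `stem_text` are assumptions, and NLTK's `PorterStemmer` is used as one common choice of stemmer.

```python
import pandas as pd
from nltk.stem.porter import PorterStemmer

# Toy rows standing in for the preprocessed metadata (hypothetical values).
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "overview": [["dreams", "within", "dreams"], ["loving", "robots"]],
    "genres": [["Action", "Thriller"], ["Romance"]],
    "keywords": [["heist"], ["future"]],
    "cast": [["ActorOne"], ["ActorTwo"]],
    "director": [["DirectorOne"], ["DirectorTwo"]],
})

feature_cols = ["overview", "genres", "keywords", "cast", "director"]

# Concatenate the token lists, then join them into one lowercase string per movie.
movies["tags"] = movies[feature_cols].apply(
    lambda row: " ".join(token for col in feature_cols for token in row[col]).lower(),
    axis=1,
)

stemmer = PorterStemmer()

def stem_text(text: str) -> str:
    """Reduce each whitespace-separated word to its stem."""
    return " ".join(stemmer.stem(word) for word in text.split())

movies["tags"] = movies["tags"].apply(stem_text)
print(movies[["title", "tags"]])
```

Stemming maps inflected forms such as "dreams" and "dream" to the same token, so they count as a match when the tags are later vectorized.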
- Preprocess movie metadata
- Merge features (cast, crew, genres, keywords)
- Stem text to normalize vocabulary
- Convert tags into vectors using `CountVectorizer`
- Compute similarity using `cosine_similarity`
- Recommend top similar movies
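
The last three steps can be sketched end to end as below. This is an illustrative sketch under stated assumptions rather than the notebook's code: the toy `movies` DataFrame, the `max_features` and `stop_words` settings, and the body of `recommend` are assumptions. Only the function name `recommend` and the top-5 behaviour come from this README.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data standing in for the stemmed tags column (hypothetical values).
movies = pd.DataFrame({
    "title": ["Alpha", "Beta", "Gamma", "Delta"],
    "tags": [
        "space crew alien ship",
        "space alien invas earth",
        "romanc pari love letter",
        "love letter wed pari",
    ],
})

# Bag-of-words counts; max_features and stop_words are assumed settings.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(movies["tags"])

# similarity[i, j] is the cosine similarity between the count vectors of movies i and j.
similarity = cosine_similarity(vectors)

def recommend(title: str, top_n: int = 5) -> list:
    """Return up to top_n titles most similar to the given title."""
    positions = np.flatnonzero((movies["title"] == title).to_numpy())
    if positions.size == 0:
        return []  # unknown title
    idx = int(positions[0])
    # Rank every other movie by its similarity score, highest first.
    ranked = sorted(
        ((i, score) for i, score in enumerate(similarity[idx]) if i != idx),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [movies.iloc[i]["title"] for i, _ in ranked[:top_n]]

print(recommend("Alpha"))
```

The query movie itself is skipped because its similarity with itself is always the maximum and would otherwise take the first slot.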
- Clone this repo
- Open `recommendation_system.ipynb` in Jupyter
- Run all cells
- Call `recommend("Movie Name")` to get suggestions!
TMDb 5000 Movie Dataset from Kaggle.
You must download it and place it in the same directory as the notebook.
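
In the Kaggle release of this dataset, columns such as genres, keywords, cast, and crew are stored as JSON-encoded lists of objects, so they need parsing before the features can be merged. The sketch below shows one way to do that. The helper names `extract_names` and `extract_director` and the inline sample strings are illustrative assumptions, and the file name in the closing comment should be checked against the files you actually download.

```python
import json
from typing import List, Optional

def extract_names(json_text: str, limit: Optional[int] = None) -> List[str]:
    """Parse a JSON list of objects and return their "name" values."""
    names = [item["name"] for item in json.loads(json_text)]
    return names if limit is None else names[:limit]

def extract_director(crew_json: str) -> List[str]:
    """Return the names of crew members whose job is "Director"."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]

# Inline samples mimicking the dataset's column format (hypothetical values).
genres_cell = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
crew_cell = '[{"job": "Editor", "name": "Person A"}, {"job": "Director", "name": "Person B"}]'

print(extract_names(genres_cell))
print(extract_director(crew_cell))

# With the real files (file name assumed), the helpers would be applied per column:
#   movies = pd.read_csv("tmdb_5000_movies.csv")
#   movies["genres"] = movies["genres"].apply(extract_names)
```

The `limit` argument is there for columns like cast, where keeping only the first few names may be enough to characterize a movie.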
This project is licensed under the MIT License. Feel free to use or modify it for educational purposes.
Shreya Rao
Data Science Undergraduate | Python & ML Enthusiast
GitHub Profile