MovieMatch is a movie discovery platform that provides personalized movie suggestions. This repository contains the complete data processing and model training pipeline, which generates the files needed by the final Streamlit web application.
Visit here: https://moviematch11.streamlit.app/
MovieMatch is a content-based recommendation system that predicts user preferences and suggests movies with similar themes, genres, or contributors.
The recommendation engine applies cosine similarity to an engineered set of metadata, so movies whose metadata and synopsis overlap the most are ranked as the closest matches.
The system follows a classic NLP-based pipeline:
- Feature Engineering
  - Data from the TMDB 5000 dataset is cleaned and merged.
  - Metadata and synopsis are combined into a single “tag” for each movie.
- Text Vectorization
  - Tags are normalized (lowercasing + stemming).
  - Tags are converted to numerical vectors using CountVectorizer.
- Similarity Modeling
  - A cosine similarity matrix captures the pairwise relationships between all movies.
  - The model is saved as .pkl files for use in the Streamlit app.
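The pipeline above can be illustrated with a small, dependency-free sketch. It is not the notebook's code: `collections.Counter` stands in for scikit-learn's CountVectorizer, a hand-written `cosine` function stands in for `cosine_similarity`, stemming is omitted, and the movie records, field names, and helper names (`build_tag`, `vectorize`, `recommend`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy records; the real data comes from the TMDB 5000 dataset.
movies = [
    {"title": "Space Quest", "overview": "space adventure crew", "genres": "scifi", "cast": "alien"},
    {"title": "Galaxy Run", "overview": "space pilot chase", "genres": "scifi", "cast": "alien"},
    {"title": "Love in Paris", "overview": "paris love story", "genres": "romance", "cast": "drama"},
]

def build_tag(movie):
    """Combine synopsis and metadata into one lowercased tag string."""
    parts = (movie["overview"], movie["genres"], movie["cast"])
    return " ".join(parts).lower()

def vectorize(tag):
    """Bag-of-words term counts (a stand-in for CountVectorizer)."""
    return Counter(tag.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

titles = [m["title"] for m in movies]
vectors = [vectorize(build_tag(m)) for m in movies]

# Pairwise similarity matrix: similarity[i][j] compares movie i with movie j.
similarity = [[cosine(u, v) for v in vectors] for u in vectors]

def recommend(title, top_n=5):
    """Return the top_n titles most similar to the given title."""
    i = titles.index(title)
    others = [j for j in range(len(titles)) if j != i]
    ranked = sorted(others, key=lambda j: similarity[i][j], reverse=True)
    return [titles[j] for j in ranked[:top_n]]

print(recommend("Space Quest", top_n=1))
```

In this toy data, "Space Quest" and "Galaxy Run" share three of their five tag words, so they score highest against each other, while "Love in Paris" shares none and scores zero.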
- 🧠 Personalized Recommendations – Get instant, relevant movie suggestions.
- 🔥 Top Picks Section – Discover trending and popular films.
- 🔎 Search Functionality – Find movies by title.
- ℹ Detailed Movie Pages – Access poster, cast, overview, and more.
- 📄 Paginated Results – Browse large lists of films easily.
- 🔒 Secure API Handling – TMDB API key is stored in secrets.toml.
- 🐍 Python
- 📓 Jupyter Notebook – for data exploration & model building
- 🐼 Pandas – data manipulation
- 🤖 Scikit-learn – CountVectorizer, cosine_similarity
- 📚 NLTK – text preprocessing
- 🎈 Streamlit – web application
- 🌐 Requests – TMDB API calls
This project has two main parts: generating the model and running the app.
- Place the dataset files (tmdb_5000_movies.csv, tmdb_5000_credits.csv) in the root folder.
- Run the Jupyter Notebook Movie Recommendation System.ipynb.
- This will generate the .pkl files (saved in your project directory).
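As a rough sketch of what saving and reloading the generated files could look like, the example below round-trips two objects through Python's standard `pickle` module. The file names `movies.pkl` and `similarity.pkl`, the helper names, and the toy data are assumptions for illustration; check the notebook for the names it actually writes.

```python
import pickle
import tempfile
from pathlib import Path

def save_artifact(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_artifact(path):
    """Load a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Toy stand-ins for the notebook's outputs.
titles = ["Movie A", "Movie B"]
similarity = [[1.0, 0.25], [0.25, 1.0]]

# A temporary directory keeps the example from touching the project folder.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    save_artifact(titles, out_dir / "movies.pkl")          # hypothetical file name
    save_artifact(similarity, out_dir / "similarity.pkl")  # hypothetical file name
    loaded_titles = load_artifact(out_dir / "movies.pkl")
    loaded_similarity = load_artifact(out_dir / "similarity.pkl")
```

Pickle files can execute code when loaded, so only load .pkl files that you generated yourself.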
- API Key Setup
  - Create a folder named .streamlit in the project root.
  - Inside it, create a secrets.toml file:

    ```toml
    TMDB_API_KEY = "your_actual_api_key_here"
    ```

- Run the App

  ```bash
  streamlit run app.py
  ```
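Once the key is in secrets.toml, the app can read it through Streamlit's `st.secrets` and pass it to TMDB. The sketch below shows one way this might look. The helper names are hypothetical, and the choice of TMDB's v3 movie details endpoint is an assumption, since this README does not say which endpoints app.py calls.

```python
from urllib.parse import urlencode

TMDB_API_BASE = "https://api.themoviedb.org/3"

def load_api_key():
    """Read the key defined in .streamlit/secrets.toml."""
    # Imported lazily so the URL helper below works without Streamlit installed.
    import streamlit as st
    return st.secrets["TMDB_API_KEY"]

def movie_details_url(movie_id, api_key):
    """Build the TMDB v3 movie details URL (assumed endpoint)."""
    query = urlencode({"api_key": api_key})
    return f"{TMDB_API_BASE}/movie/{movie_id}?{query}"

def fetch_movie(movie_id, api_key, timeout=10):
    """Fetch movie details as a dict; raises on HTTP errors."""
    import requests
    response = requests.get(movie_details_url(movie_id, api_key), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Inside the app, a call such as `fetch_movie(movie_id, load_api_key())` keeps the key out of the source code, provided secrets.toml itself is not committed to version control.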