Movie Recommender System: Overview

This project implements a Movie Recommender System using both Content-Based Filtering and Collaborative Filtering techniques. The system uses multiple datasets from TMDB and MovieLens to provide personalized movie recommendations and to analyze genre-based profitability. The project is built with Python and the Pandas, NumPy, Matplotlib and Scikit-Learn libraries.

Primary goals:

Implement Content-Based Filtering, which recommends movies similar to a given one, and Collaborative Filtering, which is based on user preferences. In addition, I preprocessed the data and analyzed movie profitability and the return on investment (ROI) per genre. Historical trends in movie profits are shown with animated bar charts.

Recommendation Systems

Content-Based Filtering
Content-based filtering is an approach in which recommendations are generated by analyzing the items themselves and suggesting those that are similar to items the user has previously interacted with. In movie recommender systems, this involves metadata such as genres, cast, crew, keywords or plot descriptions. These features are represented as high-dimensional vectors, constructed using NLP techniques such as TF-IDF (Term Frequency-Inverse Document Frequency). The similarity between two movies is measured with cosine similarity (or Euclidean distance), and the system ranks movies by their proximity in the feature space.
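A minimal sketch of the TF-IDF plus cosine-similarity pipeline described above, using scikit-learn. It assumes a hypothetical `soup` column that concatenates each movie's metadata (genres, keywords, cast, director) into one string; the titles and text below are toy values, not the project's data, and `recommend` is an illustrative helper rather than a function from recom_sys.py.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the merged metadata; "soup" is a hypothetical column name.
movies = pd.DataFrame({
    "title": ["Toy Story", "Toy Story 2", "Heat", "Casino"],
    "soup": [
        "animation comedy family toys pixar",
        "animation comedy family toys sequel pixar",
        "action crime thriller heist robbery",
        "crime drama casino mafia",
    ],
})

# Each movie becomes a TF-IDF weighted term vector (one row per movie).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(movies["soup"])

# Dense n x n matrix of pairwise cosine similarities between movies.
similarity = cosine_similarity(tfidf_matrix)


def recommend(title, top_n=5):
    """Return the top_n titles most similar to `title`, excluding the movie itself."""
    idx = movies.index[movies["title"] == title][0]
    scores = [(i, s) for i, s in enumerate(similarity[idx]) if i != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [movies["title"].iloc[i] for i, _ in scores[:top_n]]


print(recommend("Toy Story", top_n=1))  # ['Toy Story 2']
```

Cosine similarity is a common choice here because it ignores vector length, so a movie with a long description is not penalized relative to one with a short description.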

Collaborative Filtering
Collaborative filtering is based on the idea that users who shared preferences in the past will have similar preferences in the future, so recommendations come from the interactions of the whole user community. A user-item rating matrix is constructed; it is typically sparse because the number of items is vast relative to the number of ratings per user. Similarity can be computed either between users or between items, using measures such as cosine similarity. More advanced implementations involve dimensionality reduction, which can be achieved with Singular Value Decomposition (SVD). The main disadvantage is the cold-start problem: new users or new items have few or no ratings, so there is little data on which to base recommendations.
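The sketch below illustrates item-based collaborative filtering and a truncated SVD on a tiny rating matrix, using only pandas and NumPy. The ratings are hypothetical values in the same long format as ratings_small.csv (`userId`, `movieId`, `rating`); `predict` and the rank `k` are illustrative choices, not the project's actual code. User-based filtering works the same way on the transposed matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings in long format (one row per user-movie rating).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 4],
    "movieId": [10, 20, 30, 10, 20, 20, 30, 10],
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 5.0, 5.0],
})

# User-item matrix; missing ratings are filled with 0 (the matrix is sparse in practice).
R = ratings.pivot_table(index="userId", columns="movieId", values="rating").fillna(0.0)
M = R.to_numpy()

# Item-item cosine similarity: dot products of rating columns divided by their norms.
norms = np.linalg.norm(M, axis=0)
item_sim = (M.T @ M) / np.outer(norms, norms)


def predict(user, movie):
    """Similarity-weighted average of the user's ratings on the items they rated."""
    u = R.index.get_loc(user)
    j = R.columns.get_loc(movie)
    rated = M[u] > 0
    weights = item_sim[j, rated]
    if weights.sum() == 0:
        return 0.0  # no overlap: a cold-start case
    return float(weights @ M[u, rated] / weights.sum())


# Dimensionality reduction: keep the k largest singular values (k=2 is arbitrary here).
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
R_hat = U[:, :k] * s[:k] @ Vt[:k]  # low-rank approximation of the rating matrix

print(round(predict(4, 20), 2))
```

Note that filling missing ratings with 0 treats "unrated" as "disliked" in the SVD step; dedicated libraries (e.g. matrix factorization trained only on observed ratings) avoid this, so this sketch should be read as an illustration of the idea only.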

Dataset Description

For this project I used four main datasets: Movies Metadata (movies_metadata.csv), Credits (credits.csv), Keywords (keywords.csv) and Ratings (ratings_small.csv).

Data Preprocessing

Parsing Columns
JSON-like columns such as genres, production_companies, and spoken_languages are converted to Python lists using the parseColumn function. Missing values and any invalid entries are replaced with empty lists.
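The project's `parseColumn` function is not reproduced here; the following is a hypothetical stand-in, `parse_list_column`, showing one way such parsing can work. It assumes the cells hold Python-literal lists of dicts (single-quoted, e.g. `[{'id': 16, 'name': 'Animation'}]`), which is why `ast.literal_eval` is used instead of `json.loads`, and that only the `name` field is kept.

```python
import ast
import math


def parse_list_column(value, key="name"):
    """Hypothetical sketch: turn a JSON-like cell into a list of `key` values.

    Missing values (None/NaN) and anything that fails to parse become [].
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    try:
        items = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(items, list):
        return []
    return [item[key] for item in items if isinstance(item, dict) and key in item]


print(parse_list_column("[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"))
```

With pandas this would typically be applied per column, e.g. `df["genres"] = df["genres"].apply(parse_list_column)`.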

Poster URLs
All poster paths are converted to full URLs. If an image URL is invalid or missing, it is replaced with a standard image (question-mark.jpg).
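A minimal sketch of the poster-URL step. The base URL `https://image.tmdb.org/t/p/w500` (TMDB image host with a `w500` size) is an assumption, not taken from the project; the sketch also only checks that the path looks well formed, whereas verifying that an image actually exists would require an HTTP request.

```python
# Assumed TMDB image base URL and size; the project may use a different one.
BASE_URL = "https://image.tmdb.org/t/p/w500"
PLACEHOLDER = "question-mark.jpg"


def poster_url(path):
    """Build a full poster URL, or fall back to the placeholder image."""
    # Missing values arrive as NaN (a float) from pandas; malformed paths lack the leading "/".
    if not isinstance(path, str) or not path.startswith("/"):
        return PLACEHOLDER
    return BASE_URL + path


print(poster_url("/abc.jpg"))
```

Applied to a DataFrame, this would look like `movies["poster_path"].apply(poster_url)`.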

Profit and ROI
Profit is calculated as revenue - budget.
ROI is calculated as (revenue - budget) / budget * 100.
Only movies with valid revenue and budget are included in further analysis.
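The profit and ROI formulas above translate directly into pandas. In this sketch "valid" is assumed to mean that both values parse as numbers and are strictly positive (a zero budget would make ROI undefined); the rows are toy data, and the string budgets mimic the case where the column is read as text.

```python
import pandas as pd

# Toy rows; budget is given as text to show the numeric conversion.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": ["100", "0", "50", "abc"],
    "revenue": [250.0, 80.0, 0.0, 10.0],
})

# Non-numeric entries become NaN instead of raising.
movies["budget"] = pd.to_numeric(movies["budget"], errors="coerce")
movies["revenue"] = pd.to_numeric(movies["revenue"], errors="coerce")

# Assumption: keep only rows where both values are positive (NaN comparisons are False).
valid = movies[(movies["budget"] > 0) & (movies["revenue"] > 0)].copy()
valid["profit"] = valid["revenue"] - valid["budget"]
valid["roi"] = (valid["revenue"] - valid["budget"]) / valid["budget"] * 100

print(valid[["title", "profit", "roi"]])
```

For movie A this gives a profit of 250 - 100 = 150 and an ROI of 150 / 100 * 100 = 150%.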

Genre Filtering
Movies are grouped by genre for profit and ROI analysis. Each genre is separately sorted by year for the animated visualization.
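One way to implement this grouping is sketched below, assuming `genres` already holds lists of names (as produced by the parsing step) and that the year is taken from a `release_date` column. `explode` gives each movie one row per genre, so a movie tagged Action and Comedy counts toward both. The data is illustrative only.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": [["Action", "Comedy"], ["Comedy"], ["Action"]],
    "release_date": ["2001-05-01", "1999-07-10", "1995-03-03"],
    "profit": [150.0, 30.0, 80.0],
})
movies["year"] = pd.to_datetime(movies["release_date"], errors="coerce").dt.year

# One row per (movie, genre); movies with an empty genre list become NaN and are
# dropped by groupby below.
by_genre = movies.explode("genres").rename(columns={"genres": "genre"})

# Separate frame per genre, sorted chronologically for the animation.
genre_frames = {
    genre: group.sort_values("year")
    for genre, group in by_genre.groupby("genre")
}

print(genre_frames["Action"][["title", "year", "profit"]])
```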

Data Analysis

Top Movies by Profit
The system identifies movies with the highest profit and displays their title, budget, profit, and ROI.
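A short sketch of this ranking with `DataFrame.nlargest`, on toy data. The cutoff of 2 rows is arbitrary; the project's own cutoff is not stated here.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100.0, 200.0, 10.0],
    "revenue": [250.0, 300.0, 15.0],
})
movies["profit"] = movies["revenue"] - movies["budget"]
movies["roi"] = movies["profit"] / movies["budget"] * 100

# Highest-profit movies with the columns listed in the text.
top = movies.nlargest(2, "profit")[["title", "budget", "profit", "roi"]]
print(top)
```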

Genre-Based Profit Analysis
A bar plot visualizes total profit per genre. ROI per genre is also visualized to understand efficiency in terms of investment return.
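A sketch of the per-genre aggregation and the two bar plots. Genre-level ROI is assumed here to be total profit divided by total budget (an alternative would be the mean of per-movie ROI); the data is illustrative, and the `Agg` backend is set only so the sketch runs without a display (use `plt.show()` interactively instead).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# One row per (movie, genre), e.g. after exploding the genre lists.
by_genre = pd.DataFrame({
    "genre": ["Action", "Comedy", "Action", "Drama"],
    "budget": [100.0, 50.0, 200.0, 20.0],
    "profit": [150.0, 25.0, 100.0, 40.0],
})

summary = by_genre.groupby("genre").agg(
    total_profit=("profit", "sum"),
    total_budget=("budget", "sum"),
)
# Assumed definition of genre ROI: aggregate profit relative to aggregate budget.
summary["roi"] = summary["total_profit"] / summary["total_budget"] * 100
summary = summary.sort_values("total_profit", ascending=False)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(summary.index, summary["total_profit"])
ax1.set_title("Total profit per genre")
ax2.bar(summary.index, summary["roi"])
ax2.set_title("ROI per genre (%)")
fig.tight_layout()
```

Plotting both charts side by side makes the contrast visible: a genre can rank low on total profit (Drama here) while having the highest return per unit of budget.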

Animated Profit Trends
Animated bar charts show profit trends per genre over years. The animation uses matplotlib.animation.FuncAnimation.
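A minimal `FuncAnimation` sketch of such a chart. It assumes each frame shows cumulative profit per genre up to that year (dropping `.cumsum()` would show per-year profit instead); the yearly figures are toy values and `update` is an illustrative frame function, not the project's.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation

yearly = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002, 2002],
    "genre": ["Action", "Comedy"] * 3,
    "profit": [100.0, 40.0, 120.0, 70.0, 90.0, 110.0],
})

# Rows: years, columns: genres, values: cumulative profit up to that year.
pivot = (
    yearly.pivot_table(index="year", columns="genre", values="profit", aggfunc="sum")
    .fillna(0.0)
    .cumsum()
)

fig, ax = plt.subplots()


def update(frame):
    """Redraw the horizontal bar chart for one year (one frame per year)."""
    year = pivot.index[frame]
    values = pivot.loc[year].sort_values()
    ax.clear()
    ax.barh(values.index, values.to_numpy())
    ax.set_title(f"Cumulative profit by genre up to {year}")
    ax.set_xlabel("Profit")


anim = FuncAnimation(fig, update, frames=len(pivot), interval=800, repeat=False)
# To export, e.g.: anim.save("profit.gif", writer="pillow")
```

Keeping a reference to the returned animation object (`anim`) matters: if it is garbage-collected, the animation stops.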

Author
Karina Antoniu – UNSTPB, Faculty of Automatic Control and Computers, Bucharest
26.07.2024

Datasets were downloaded from Kaggle: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset/data
