Leisure Time

1. Summary

Aren’t you tired of choosing a random movie or book to enjoy?

This project gives you daily movie and book recommendations tailored to the day in question. Recommendations are tied to celebrities' birthdays, international days and anniversaries of certain events, such as famous battles.

Leisure Time, a movie and book recommendation system, is based on an NLP model that was chosen specifically for matching one text description with another.

2. Python files

Books.ipynb
Days.ipynb
model2.ipynb
Movies_IMDB.ipynb
Movies_TMDB_API.ipynb
appimdb.py
appimdb2.py
apptmdb.py
apptmdb2.py

3. Datasets

"01 Queries" folder
df_birthdays_movies.csv
df_birthdays_books.csv
days.csv
matches'%d%m%Y'_TMDB.csv
matches'%d%m%Y'_IMDB.csv
goodreads.csv (downloadable - refer to chapter 5)
best_books.csv
TMDB_movies_final.csv
imdb_movie_fetch.csv

4. Interface

5. Books

The books dataframe is prepared in the Books.ipynb notebook. The source data comes from Kaggle, in the following file:

goodreads.csv

The dataset was filtered to keep only Latin and English titles, using the langid library. It was then reduced to books with a rating of at least 3.5 and at least 1000 votes, saved as best_books.csv. Finally, the authors' birthdays were scraped from Wikipedia and saved as df_birthdays_books.csv, to be added to the Days dataframe.
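
The rating and vote thresholds described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code; the column names average_rating and ratings_count are assumptions about the CSV layout, and the sample rows are made up.

```python
import csv
import io

MIN_RATING = 3.5   # minimum average rating, as stated above
MIN_VOTES = 1000   # minimum number of votes, as stated above


def filter_best_books(rows, min_rating=MIN_RATING, min_votes=MIN_VOTES):
    """Keep only the rows that meet both thresholds.

    `rows` is an iterable of dicts; the keys "average_rating" and
    "ratings_count" are hypothetical column names.
    """
    kept = []
    for row in rows:
        try:
            rating = float(row["average_rating"])
            votes = int(row["ratings_count"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed values
        if rating >= min_rating and votes >= min_votes:
            kept.append(row)
    return kept


# Tiny in-memory sample standing in for goodreads.csv (made-up rows).
sample_csv = io.StringIO(
    "title,average_rating,ratings_count\n"
    "Book A,4.2,15000\n"
    "Book B,3.4,20000\n"
    "Book C,4.8,12\n"
)
best_books = filter_best_books(csv.DictReader(sample_csv))
print([book["title"] for book in best_books])
```

Only "Book A" passes both thresholds here: "Book B" has too low a rating and "Book C" has too few votes.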

6. Movies

Movies are the main focus of this project, because nowadays people spend more of their leisure time in front of a screen. Two approaches were used to get movie data:

From the TMDB API – using the API from: https://www.themoviedb.org/
From the IMDB website – using web scraping on the IMDB advanced search system

Each of these processes takes more than 12 hours to run. Web scraping can be time-consuming, especially when dealing with large amounts of data.

6.1. TMDB API

To use the TMDB API in Movies_TMDB_API.ipynb, the steps described here were followed to get the bearer token and API key: https://developer.themoviedb.org/reference/intro/getting-started/*. To get more data, such as actors, budgets, revenues, IMDb IDs and streams, the tmdbsimple library was used: https://github.com/celiao/tmdbsimple/blob/master/README.md.

import requests
import tmdbsimple as tmdb

# Endpoint used to list movies
base_url = "https://api.themoviedb.org/3/discover/movie"
headers = {
    "accept": "application/json",
    "Authorization": "Bearer YOUR_BEARER",  # replace YOUR_BEARER with your TMDB bearer token
}

# tmdbsimple settings
tmdb.API_KEY = "YOUR_API_KEY"  # replace with your TMDB API key
tmdb.REQUESTS_SESSION = requests.Session()  # reuse one HTTP session across calls
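
To page through the discover endpoint shown above, each request needs a page number and, optionally, some filters. The sketch below only builds the request URL and headers; it does not call the API and is not taken from the notebook. The helper name build_discover_request and the filter values are hypothetical. As far as I know, page, vote_average.gte and vote_count.gte are query parameters of the TMDB discover endpoint, but check them against the TMDB reference before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.themoviedb.org/3/discover/movie"


def build_discover_request(bearer, page, min_rating=None, min_votes=None):
    """Return (url, headers) for one page of discover results.

    Hypothetical helper: the filters are only added when a value is given.
    """
    params = {"page": page}
    if min_rating is not None:
        params["vote_average.gte"] = min_rating
    if min_votes is not None:
        params["vote_count.gte"] = min_votes
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {bearer}",
    }
    return f"{BASE_URL}?{urlencode(params)}", headers


url, headers = build_discover_request("YOUR_BEARER", page=2, min_votes=100)
print(url)
```

The returned pair can then be passed to requests.get(url, headers=headers), looping over page until no more results come back.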

6.2. IMDB Web Scraping

In this case, I looped through all the genres, keeping movies with a minimum rating of 5.0 and at least 3000 votes, and used web scraping in the following order:

Collect the result page URLs.
Collect the movie URLs from each page.
Scrape every movie URL to build the dataframe imdb_movie_fetch.csv with:
- URL
- Movie title
- Movie image
- IMDb rating
- Number of votes
- Movie description
- Movie genres
- Published date
- Content rating
- Actors, writers, and directors
- Movie popularity
Scrape Wikipedia to get the actors' birthdates, saved as df_birthdays_movies.csv.
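
Because the search runs once per genre, a movie listed under several genres can be collected more than once. The sketch below shows one way to keep a single URL per movie before scraping the movie pages. Whether the notebook deduplicates this way is an assumption; the function name unique_movie_urls is hypothetical and the sample URLs are made up.

```python
import re

# IMDb movie pages carry an ID of the form "tt" followed by digits.
TITLE_ID = re.compile(r"/title/(tt\d+)")


def unique_movie_urls(urls):
    """Return one canonical URL per title ID, preserving first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        match = TITLE_ID.search(url)
        if match is None:
            continue  # not a movie page URL
        title_id = match.group(1)
        if title_id not in seen:
            seen.add(title_id)
            unique.append(f"https://www.imdb.com/title/{title_id}/")
    return unique


# Made-up sample: the first movie appears under two genres.
collected = [
    "https://www.imdb.com/title/tt0000001/?ref_=example",
    "https://www.imdb.com/title/tt0000002/",
    "https://www.imdb.com/title/tt0000001/",
    "https://www.imdb.com/search/title/",
]
print(unique_movie_urls(collected))
```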

7. Days

The days dataframe is built in Days.ipynb. International days were collected first, followed by the anniversaries of certain events, from several sources. Lastly, I added the birthdays of the authors and actors found in the books and movies dataframes.

Sources:

https://date.nager.at/api/ - Web scraping
https://www.un.org/en/observances/list-days-weeks/ - Web scraping
https://www.unesco.org/en/days/ - Web scraping
https://en.wikipedia.org/wiki/ - Web scraping for days’ descriptions and images
http://w.wiki/6Zx/ - Manual download of data queries
Authors' and actors' birthdays - taken from the movies and books dataframes

In the end, all types of days are combined in the dataframe days.csv.
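
Once all day types sit in one table, picking the entries for a given date is a simple filter on month and day. The sketch below illustrates this with plain dictionaries; the keys month, day, type and name are hypothetical and may not match the actual columns of days.csv, and the sample rows are placeholders.

```python
from datetime import date


def entries_for(day, rows):
    """Group the rows that fall on `day` (month and day only) by type."""
    by_type = {}
    for row in rows:
        if (row["month"], row["day"]) == (day.month, day.day):
            by_type.setdefault(row["type"], []).append(row["name"])
    return by_type


# Placeholder rows, one per type of day.
rows = [
    {"month": 4, "day": 23, "type": "international day", "name": "Example Day"},
    {"month": 4, "day": 23, "type": "birthday", "name": "Example Author"},
    {"month": 10, "day": 14, "type": "event anniversary", "name": "Example Battle"},
]
print(entries_for(date(2023, 4, 23), rows))
```

The year is ignored on purpose, since birthdays and anniversaries recur every year.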

8. Model

In model2.ipynb, the days, books and movies datasets are loaded and their descriptions are passed through the chosen model, universal-sentence-encoder.

import tensorflow_hub as hub
model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

Each type of day can be matched with up to 3 movies and 3 books, based on the highest similarity scores calculated by the model. Two matches datasets are created for today’s events, one from the TMDB API data and the other from the IMDB web scraping data.
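
The top-3 selection can be illustrated with a small, dependency-free sketch. It assumes each description has already been turned into an embedding vector by the model, and it ranks candidates by cosine similarity. The use of cosine similarity is an assumption, since the notebook's exact similarity measure is not stated here; the short 3-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(day_vector, candidates, k=3):
    """Return the k candidate titles most similar to the day's vector.

    `candidates` maps a title to its embedding vector.
    """
    scored = [
        (cosine_similarity(day_vector, vec), title)
        for title, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Toy vectors standing in for real sentence embeddings.
day = [1.0, 0.0, 0.0]
movies = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.0, 1.0, 0.0],
    "Movie C": [0.7, 0.7, 0.0],
    "Movie D": [0.2, 0.0, 0.9],
}
print(top_matches(day, movies))
```

With these toy values, "Movie B" is orthogonal to the day vector and is the one left out of the top 3.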

9. Flask App

The Flask app reads the matches for the current day and displays them in the web interface, along with snack and drink recommendations for the movies. There are 4 files that can be used to run the Flask app:

appimdb.py
apptmdb.py
appimdb2.py
apptmdb2.py

Only the 1st versions, appimdb.py and apptmdb.py, include the snack and drink recommendations for the movies, which use the OpenAI API. To use them properly, your OpenAI key needs to be set inside those files in:

openai.api_key = "YOUR_API_KEY"  # input your OpenAI API key

The 2nd versions can be run without any API key input.
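
If you prefer not to paste the key into the source files, one common alternative is to read it from an environment variable. This is a suggestion, not how the app files are written; the helper name load_openai_key is hypothetical and the variable name OPENAI_API_KEY is a convention assumed here.

```python
import os


def load_openai_key(env_var="OPENAI_API_KEY"):
    """Read the key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The app file would then use openai.api_key = load_openai_key() in place of the hard-coded string.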

The resulting Leisure Time page, rendered from the .html code in the “templates” folder, gives an overview of one day of each type: one international day, one celebrity birthday, and one event anniversary.
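
Since the matches files are named after the date (see the '%d%m%Y' pattern in chapter 3), the app has to work out which file belongs to the current day. A minimal sketch of that lookup is shown below. The function name matches_filename is hypothetical, and the exact file name layout, for example whether quotes are part of the name, is an assumption.

```python
from datetime import date


def matches_filename(day, source="TMDB"):
    """Build the matches file name for `day` and the given movie source."""
    if source not in ("TMDB", "IMDB"):
        raise ValueError("source must be 'TMDB' or 'IMDB'")
    return f"matches{day.strftime('%d%m%Y')}_{source}.csv"


print(matches_filename(date(2023, 7, 15)))          # matches15072023_TMDB.csv
print(matches_filename(date(2023, 7, 15), "IMDB"))  # matches15072023_IMDB.csv
```

Passing date.today() gives the file for the current day.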

10. How-to-Run Guide

If you want, you can skip to Step 6 and use the already created dataframes for days, movies, books and respective matches (up to date as of mid-July 2023):
- days.csv
- best_books.csv
- TMDB_movies_final.csv
- imdb_movie_fetch.csv
Step 1: Run Books.ipynb in full (first download goodreads.csv as per chapter 5) to get best_books.csv and df_birthdays_books.csv.
Step 2: Run Movies_TMDB_API.ipynb to get TMDB_movies_final.csv.
- This process takes several hours, so be patient.
- Remember to input your Bearer and API Key in the respective code lines (check chapter 6.1).
Step 3: Run Movies_IMDB.ipynb in full to get imdb_movie_fetch.csv and df_birthdays_movies.csv.
- This process takes several hours, so be patient.
Step 4: Run Days.ipynb to get the days.csv dataframe.
Step 5: Run the model, model2.ipynb, to get today's matches.
- The model takes roughly 2-3 hours to run for both the TMDB and IMDB movies dataframes.
Step 6: Choose which Flask app .py file you want to use:
- appimdb.py – to run with IMDB matched movies (remember to input your OpenAI key)
- apptmdb.py – to run with TMDB matched movies (remember to input your OpenAI key)
- appimdb2.py – to run with IMDB matched movies, without snack and drink recommendations for movies (best if you don’t want to use APIs)
- apptmdb2.py – to run with TMDB matched movies, without snack and drink recommendations for movies (best if you don’t want to use APIs)
Step 7: In GitBash, or another command-line tool, go to the location of your forked repository and type: python selected_app.py (replacing selected_app.py with the file chosen in Step 6). The terminal should then show the address where the app is being served.
Step 8: Open your internet browser and go to that address, e.g. http://127.0.0.1:5000.
Voilà.
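
If you skip straight to Step 6, it can help to confirm that the ready-made dataframes are actually in place before starting the app. The sketch below is an optional check and not part of the repository; the function name missing_files is hypothetical, and the file list is taken from the start of this guide.

```python
from pathlib import Path

# Files listed at the start of this guide as already created.
REQUIRED_FILES = [
    "days.csv",
    "best_books.csv",
    "TMDB_movies_final.csv",
    "imdb_movie_fetch.csv",
]


def missing_files(folder=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in required if not (base / name).is_file()]


missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All dataframes found, ready to start the Flask app.")
```

Run it from the repository folder; any file it reports as missing has to be regenerated with the corresponding notebook from Steps 1 to 4.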

11. Web Display

The Flask app is also deployed at the following website, using appimdb3.py:

https://leisure-time-5b9ed2cf23e8.herokuapp.com/
