This project aims to empirically investigate the relationship between the number of photos attached on Yelp and the restaurant’s average rating, as well as the extent to which the type of photo (food & drink, environment, menu) moderates this relationship. The purpose of this study is to provide practical insights for managers and restaurant owners on how photos in reviews may enhance ratings, certainty and increase potential visitors. To address this, a multiple linear regression model is applied with data obtained from Yelp open dataset.
The influence of online reviews and electronic word of mouth (eWOM) on consumer perceptions and decision-making is evident in today’s digital society (Wang et al., 2021). As such, this phenomenon has been widely researched in marketing. The influence of reviews is especially pronounced in the case of “experience goods". For example, restaurant services are perceived as riskier to evaluate before purchase (Weisskopf, 2018). Previous findings suggest that by reading restaurant reviews, customers are able to reduce this perceived risk (Parikh et al., 2014). Further supporting this claim, Luca (2016, p.3) demonstrates that “a one-star increase in Yelp rating leads to a 5–9 percent increase in revenue”.
Photos have become a key component of online reviews as they add a visual element that allows users to better communicate their opinions (Li et al., 2021). However, while text-based reviews have been extensively studied, the association between the number of photos included in reviews and a restaurant’s average rating remains underexplored. Additionally, current literature is limited with regard to how different types of photos may impact this relationship. To address this gap in the literature, the following research question has been formulated:
"How does the total number of photos included in Yelp reviews influence a restaurant’s average rating, and to what extent does the type of photo (food, environment, menu) moderate this relationship?"
From a managerial and business perspective, the stakeholders, namely restaurant owners and managers, may gain a deeper understanding from the findings of this project on how to potentially raise ratings per restaurant, reduce average rating volatility and increase visibility and engagement. Luca (2016) concluded that the average rating of a restaurant guides customers to make informed decisions. Ultimately, this may lead to an improved reputation and subsequently increased customer demand.
Furthermore, managers can enhance their marketing strategy by strategically encouraging customers to publish reviews with photos on platforms such as "Yelp" or "Tripadvisor" as it reduces customer uncertainty. To encourage reviews that include photos, managers may add QR codes on the menu or receipts that could incentivize customers with rewards for the uploaded photos. Additional innovative approaches are table toppers and Wi-Fi login prompts inviting the customers to share a photo of the food or environment. By prompting customers to share additional photos, managers may be more inclined to detect recurring issues, creating a feedback loop.
Firstly, to collect the required data, the datasets “dataset_business" and "dataset_photos" have been downloaded from Yelp Open Dataset. Accordingly, we have built an R pipeline (tidyr and dyplr) that merged and cleaned the data.
The steps executed after were:
- Calculated the amount of photos that exist for “food & drink”, “environment” and “menu” for each business_id.
- Transformed the data in order to separate columns for each category.
- Handled missing businesses by assigning them to value 0.
- Inserted a new column “total_photos” per business.
- Filtered for restaurants.
- Performed a safety check.
As follows, the “final_dataset.csv” with 29.374 observations and 10 variables was obtained. The study will utilize and analyze the subsequent variables:
| Variable | Description | Data Class |
|---|---|---|
| business_id | The unique Yelp ID of the business | Character |
| name | The business name as shown on Yelp | Character |
| attributes | The map on Yelp of a restaurant’s amenities, services, and policies | List |
| categories | The list of Yelp categories of cuisines for the business | Character |
| stars | The average Yelp scale star rating (1–5) | Numeric |
| review_count | The total number of Yelp reviews | Numeric |
| environment | The number of environment photos | Numeric |
| food & drink | The number of food & drink photos | Numeric |
| menu | The number of menu photos | Numeric |
| total_photos | The total number of photos for the business | Numeric |
The project combines a series of linear regression models that were estimated in R. In order to create a baseline relationship between the average star rating and the total number of photos, the Simple Linear Regression (SLR) is firstly executed, which provides an initial measure of association. Subsequently, a Multiple Linear Regression (MLR) is estimated as the main model of the analysis. The MLR includes multiple predictors and an interaction (moderation) term, allowing for the assessment of how different photo types contribute to ratings and whether the effect of photo quantity varies depending on photo type. This approach provides a more detailed understanding of the combined and conditional effects of the photo content uploaded by customers on restaurant ratings.
The main model is specified as:
Y=β0+β1(Photos)+β2(Photo Category)+β3(Photos×Photo Category)+ε
The average restaurant rating is denoted by Y, while the total number of photos is represented by Photos. The dominant type of photo is indicated by Photo Category (food and drink, menu, or environment), and the interaction term determines whether the effect of photo quantity varies across categories.
All models were estimated using the lm() function in R, and results were visualized to support interpretation of both main and interaction effects.
Adding photos is associated with higher star ratings for restaurants dominated by food & drink photos, has a slightly smaller positive effect for environment photos, and shows no significant effect for menu photos. These findings highlight the value of user-generated visual content in online reviews.
Rated-by-the-Lens/
├── src/
│ ├── 1-raw-data/
│ │ ├── download.R
│ │ └── makefile
│ ├── 2-data-preparation/
│ │ ├── clean.R
│ │ └── makefile
│ └── 3-analysis/
│ ├── analysis.R
│ ├── visualize.R
│ └── makefile
├── gen/ # (ignored in Git) generated data, outputs, and temp files
│ ├── data/
│ │ ├── business.csv
│ │ └── photos.csv
│ ├── output/
│ │ ├── main_effect.png
│ │ ├── model_categories.png
│ │ ├── model_central_moderation.png
│ │ ├── photo_category_plot.png
│ │ └── stars_total_photos.png
│ └── temp/
│ └── final_dataset
├── reporting/
│ ├── Final_Paper.RMD
│ └── Data_Exploration_Report.RMD
├── .gitignore
├── README.md
├── makefile
This project relies on several R packages for data manipulation, cleaning, and visualization. Please make sure the following libraries are installed before running the scripts:
-
dplyr – used for data wrangling and manipulation
-
tidyr – used for tidying and reshaping datasets
-
ggplot2 – used for creating visualizations and plots
-
grid – used for saving model summaries as PNGs
To install these packages (if not already installed), run the following in R:
install.packages(c("dplyr", "tidyr", "ggplot2", "grid"))
Once installed, the libraries can be loaded with:
library(dplyr)
library(tidyr)
library(ggplot2)
library(grid)
Before proceeding, please install the following:
- R - Install here
- If you are a Windows user, also install Make - Download here
Instructions to run the code:
- Fork the repository on Github.
- Open your command-line or terminal.
- Clone the forked repository to your computer:
git clone https://github.com/course-dprep/Rated-by-the-Lens.git- Set working directory to 'Rated-by-the-Lens':
cd Rated-by-the-Lens- Run the automated workflow with the following command:
makeAfter executing 'make', it will generate the following:
- gen/data/ - downloaded data ('photos.csv and business.csv')
- gen/temp/ - intermediate file ('final_dataset.csv')
- gen/output/ - analysis outputs ('stars_total_photos.png', 'photo_category_plot.png', 'model _central_moderation.png', 'model_categories.png' and 'main_effect.png')
- Clean the project:
make cleanThis resets the project folder to its original state.
This project is set up as part of the Master's course Data Preparation & Workflow Management at the Department of Marketing, Tilburg University, the Netherlands.
The project is implemented by team 9 members: Amelia Syta, Carla Alboquers Coll, Ezgi Torbala, Shery Khalil, and Sophie Wolf.
Li, C., Kwok, L., Xie, K. L., Liu, J., & Ye, Q. (2021). Let Photos Speak: The Effect of User-Generated Visual Content on Hotel Review Helpfulness. Journal of Hospitality & Tourism Research, 47(4), 109634802110191. https://doi.org/10.1177/10963480211019113
Luca, M. (2016). Reviews, Reputation, and Revenue: The Case of Yelp.com. Harvard Business School NOM Unit Working Paper, 12(016). https://doi.org/10.2139/ssrn.1928601
Parikh, A., Behnke, C., Vorvoreanu, M., Almanza, B., & Nelson, D. (2014). Motives for reading and articulating user-generated restaurant reviews on Yelp.com. Journal of Hospitality and Tourism Technology, 5(2), 160–176. https://doi.org/10.1108/jhtt-04-2013-0011
Wang, Y., Kim, J., & Kim, J. (2021). The financial impact of online customer reviews in the restaurant industry: A moderating effect of brand equity. International Journal of Hospitality Management, 95, 102895. https://doi.org/10.1016/j.ijhm.2021.102895
Weisskopf, D. J.-P. (2018, September 30). Online Customer Reviews: Their Impact on Restaurants. Hospitalityinsights.ehl.edu. https://hospitalityinsights.ehl.edu/online-customer-reviews-restaurants