This project investigates the key factors behind the box-office success of films. Using a dataset of 3,201 films released in 2010 or earlier, the analysis explores how release dates, genres, IMDb and Rotten Tomatoes ratings, and production budgets relate to financial performance. Through data cleaning, statistical analysis, and visualization, the project identifies trends in release timing, ratings, and budgets.
- Identify seasonal trends in movie releases by genre.
- Examine the relationship between movie ratings and box-office success.
- Analyze the impact of production budgets on financial outcomes.
The dataset is sourced from publicly available film industry databases and includes:
- Movie titles
- Genres
- US and worldwide gross revenues
- IMDb and Rotten Tomatoes ratings
- Production budgets
- Release dates
- Additional attributes like MPAA ratings and distributors
The analysis was performed using Python with the following libraries:
- Pandas: For data manipulation and cleaning.
- Matplotlib and Seaborn: For generating visualizations.
The project is divided into three analytical sections:
- Release Date Analysis: Exploring how genres are distributed across release months.
- Ratings vs. Success: Investigating correlations between ratings and box-office revenues.
- Budget vs. Success: Assessing the relationship between production budgets and financial performance.
Data cleaning involved removing columns with excessive missing data and filling the remaining missing values: numeric fields were replaced with the column mean and text fields with "Unknown". Visualizations include count plots, line plots, scatter plots with regression lines, and bar plots.
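The cleaning steps described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names, the `clean` helper, the 50% missing-data threshold, and the placeholder rows are all assumptions.

```python
import pandas as pd

# Placeholder rows standing in for the real dataset.
# Column names and values are assumptions, not taken from the notebook.
df = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Major Genre": ["Action", None, "Drama", "Comedy"],
    "IMDB Rating": [6.5, None, 7.5, 5.5],
    "US DVD Sales": [None, None, None, 1.0e6],
})


def clean(frame, max_missing=0.5):
    """Drop sparse columns, then fill the remaining gaps (hypothetical helper)."""
    # Keep only columns whose share of missing values is at or below the threshold.
    out = frame.loc[:, frame.isna().mean() <= max_missing].copy()
    # Numeric columns: replace missing values with the column mean.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].mean())
    # Non-numeric columns: replace missing values with the label "Unknown".
    other_cols = out.select_dtypes(exclude="number").columns
    out[other_cols] = out[other_cols].fillna("Unknown")
    return out


cleaned = clean(df)
print(cleaned)
```

The threshold is worth tuning against the real data; a stricter value removes more columns but leaves fewer imputed cells.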
- Release Date Insights: Action and Adventure films dominate summer releases, while family-oriented and holiday-themed movies peak in December, aligning with higher box-office revenues during these periods.
- Ratings Impact: Films with moderate IMDb ratings (6–7) and Rotten Tomatoes scores (70–80%) tend to achieve higher revenues, likely due to broader audience appeal.
- Budget Correlation: Higher production budgets (especially over $100M) strongly correlate with higher box-office revenues, though mid-budget films ($10M–$50M) also perform well, suggesting other factors like marketing and quality play a role.
Action and Adventure films dominate summer releases, while family-oriented and holiday-themed movies peak in December. This alignment with seasonal demand contributes to higher box-office revenues during these periods.
Figure 1: Distribution of film releases by month and genre.
Figure 2: Average box office revenue for films released in each month.
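The tables behind Figures 1 and 2 could be built along these lines. This is a sketch under assumptions: the column names (`Release Date`, `Major Genre`, `Worldwide Gross`) and the rows are invented for illustration and are not results from the dataset.

```python
import pandas as pd

# Invented rows for illustration only; column names are assumptions.
films = pd.DataFrame({
    "Release Date": ["2008-07-18", "2009-12-18", "2007-07-03",
                     "2003-12-17", "2006-07-07"],
    "Major Genre": ["Action", "Adventure", "Action", "Adventure", "Adventure"],
    "Worldwide Gross": [1000.0, 2700.0, 700.0, 1100.0, 1060.0],
})

# Extract the calendar month (1-12) from the release date.
films["Release Month"] = pd.to_datetime(films["Release Date"]).dt.month

# Count of releases per month and genre (the data a count plot would show).
genre_by_month = pd.crosstab(films["Release Month"], films["Major Genre"])

# Average gross per release month (the data a line plot would show).
mean_gross_by_month = films.groupby("Release Month")["Worldwide Gross"].mean()

print(genre_by_month)
print(mean_gross_by_month)
```

Either table can then be passed to a plotting library; computing them first makes it easier to check the numbers behind each chart.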
Films with moderate IMDb ratings (6–7) and Rotten Tomatoes scores (70–80%) tend to achieve higher revenues, likely due to broader audience appeal. Very high or very low ratings do not consistently correlate with high earnings.
Figure 3: Relationship between IMDb ratings and box office success.
Figure 4: Correlation between Rotten Tomatoes ratings and financial performance.
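One way to check a non-linear pattern like this is to compare the overall correlation with average revenue per rating band. The sketch below assumes column names `IMDB Rating` and `Worldwide Gross`, uses band edges chosen for illustration, and runs on invented numbers that exist only so the code executes.

```python
import pandas as pd

# Invented values for illustration only; not results from the dataset.
ratings = pd.DataFrame({
    "IMDB Rating": [4.0, 5.5, 6.5, 6.8, 8.5],
    "Worldwide Gross": [10.0, 20.0, 60.0, 80.0, 30.0],
})

# Pearson correlation between rating and gross (a single linear summary).
r = ratings["IMDB Rating"].corr(ratings["Worldwide Gross"])

# Group ratings into bands; intervals are right-inclusive: (0, 6], (6, 7], (7, 10].
bands = pd.cut(
    ratings["IMDB Rating"],
    bins=[0, 6, 7, 10],
    labels=["6 or below", "6 to 7", "above 7"],
)

# Mean gross per band shows patterns a single coefficient can hide.
mean_by_band = ratings.groupby(bands, observed=True)["Worldwide Gross"].mean()

print(f"Pearson r: {r:.2f}")
print(mean_by_band)
```

The same approach applies to Rotten Tomatoes scores by swapping the column and using percentage band edges.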
Higher production budgets, especially over $100M, strongly correlate with higher box-office revenues. However, mid-budget films ($10M–$50M) also perform well, indicating that other factors like marketing and quality play significant roles.
Figure 5: Comparison of average revenue across different production budget categories.
Figure 6: Positive correlation between production budgets and box office revenues.
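The budget categories behind Figure 5 and the correlation behind Figure 6 could be computed roughly like this. The category edges follow the ranges mentioned above, but the exact bins used in the notebook, the column names, and the sample rows are assumptions.

```python
import pandas as pd

# Invented values for illustration only; not results from the dataset.
budgets = pd.DataFrame({
    "Production Budget": [5e6, 30e6, 45e6, 80e6, 150e6, 200e6],
    "Worldwide Gross": [8e6, 90e6, 110e6, 150e6, 500e6, 700e6],
})

# Assumed category edges (right-inclusive); the notebook's bins may differ.
edges = [0, 10e6, 50e6, 100e6, float("inf")]
labels = ["under $10M", "$10M-$50M", "$50M-$100M", "over $100M"]
budgets["Budget Category"] = pd.cut(
    budgets["Production Budget"], bins=edges, labels=labels
)

# Average gross per budget category (the data a bar plot would show).
mean_gross = budgets.groupby("Budget Category", observed=True)["Worldwide Gross"].mean()

# Pearson correlation between budget and gross (the trend in a scatter plot).
r = budgets["Production Budget"].corr(budgets["Worldwide Gross"])

print(mean_gross)
print(f"Pearson r: {r:.2f}")
```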
The analysis is implemented in the IPython notebook MovieBoxOfficeAnalysis.ipynb, available in this repository. The notebook includes:
- Data loading and cleaning.
- Code for all visualizations.
- Analysis of release dates, ratings, and budgets.
- Clone this repository to your local machine.
- Install the required Python libraries:
pip install pandas matplotlib seaborn
- Open Untitled1.ipynb in Jupyter Notebook or Google Colab.
- Run all cells to reproduce the analysis and visualizations.
Rezaul Hoque
This project is licensed under the MIT License - see the LICENSE file for details.