Factors Impacting Global Happiness

MSU AI Bootcamp: Project #2

Team: Happy Tree Pandas

Members:

Description

Our project dives into the multifaceted factors influencing global happiness. Leveraging World Happiness Report data from Kaggle spanning 2005 to 2023, we've crafted code to uncover key drivers of happiness across countries.

Getting Started

To explore our findings and analysis, follow these steps:

Installation: Clone this repository to your local machine.
Setup: Open the cloned repository in Jupyter Notebook or Visual Studio Code to access the project files.

Dependencies / Required Packages

Python 3.10
Jupyter Notebook
Pandas
Pandas Plotting
NumPy
Matplotlib
Seaborn
Prophet
Time-Series Analysis
Scikit-learn
Math
MLxtend

Goals / Questions to be addressed

We used data from the World Happiness Report to predict future global happiness.
Based on our findings, we selected other variables to run additional models to study correlations further.
We used those selected variables to predict how much of an impact those variables have on the outcome.
What would life be like with and without the economic crash of 2008 and 2009 and the COVID-19 pandemic of 2020 and 2021?

Data Overview

The World Happiness Report (WHR) is a partnership of:

Gallup
Oxford Wellbeing Research Centre
United Nations Sustainable Development Solutions Network
WHR’s Editorial Board

To obtain the happiness rankings of each country, the WHR provided survey responses from the life evaluations of the Gallup World Poll (GWP). Happiness rankings are based on the answers to the Cantril Ladder question:

"Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?"

The happiness rankings according to the GWP will be referred to as either "Life Ladder" or "Happiness Score" throughout this project.

Dataset Explanations

Life Ladder (Happiness Score) : Happiness ranking on a scale of 0 (saddest) to 10 (happiest).
Country Name
Regional Indicator : Region in which the country is located.
Log GDP Per Capita : Logarithm of GDP per capita, used as a measure of the average wealth of individuals in a country.
Social Support : A value that indicates how many people have family and friends that they can rely on in times of trouble.
Healthy Life Expectancy At Birth : Average life expectancy of a country. Based on data from the World Health Organization's (WHO) Global Health Observatory data repository.
Freedom To Make Life Choices : National average of the satisfaction with individual freedom to make life choices.
Generosity : Measure of how likely people are to donate to charity.
Perceptions Of Corruption : Measure of national corruption. The GWP asked whether corruption was widespread throughout both government and business, on an individual level.
Positive Affect : Average of measures for laughter, enjoyment and doing interesting things.
Negative Affect : Average of measures for worry, sadness and anger.
Confidence In National Government : Institutional trust.

Data Splitting For Analysis

We split our data in a few different ways to make our comparisons.

We ran our analysis based on the Region each country was assigned to. This is the 'Regional Indicator' column.

Regions, according to the WHR:

Central and Eastern Europe
Commonwealth of Independent States
East Asia
Latin America and Caribbean
Middle East and North Africa
North America and ANZ (USA, Canada, Australia, New Zealand)
Southeast Asia
Sub-Saharan Africa
Western Europe
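As a rough illustration of the regional split, the sketch below averages the happiness score per region with pandas. The column names follow the dataset explanations above; the sample rows are made up for illustration and are not WHR values.

```python
import pandas as pd

# Made-up sample rows; real values would come from the WHR CSV.
df = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D"],
    "Regional Indicator": [
        "Western Europe", "Western Europe", "East Asia", "East Asia",
    ],
    "Life Ladder": [7.0, 6.0, 5.5, 6.5],
})

# Average happiness score per region, highest first.
regional_mean = (
    df.groupby("Regional Indicator")["Life Ladder"]
    .mean()
    .sort_values(ascending=False)
)
print(regional_mean)
```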

We also considered historical points in time (in the range of our data set, 2005-2023) that may have affected global happiness scores. We decided on two major incidents:

The global economic recession of 2008 and 2009.
The global COVID-19 pandemic of 2020 and 2021.
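One way to compare results with and without these incidents is to build a second copy of the data with the event years dropped. The sketch below is only an assumption about how that split could be done: the `Year` column name and the sample values are hypothetical.

```python
import pandas as pd

# Years of the two incidents named above.
EVENT_YEARS = [2008, 2009, 2020, 2021]

# Made-up yearly averages; "Year" is an assumed column name.
df = pd.DataFrame({
    "Year": [2007, 2008, 2009, 2010, 2019, 2020, 2021, 2022],
    "Life Ladder": [5.4, 5.3, 5.2, 5.4, 5.5, 5.6, 5.5, 5.5],
})

# Full series, and the same series with the event years removed.
with_events = df
without_events = df[~df["Year"].isin(EVENT_YEARS)].reset_index(drop=True)
print(without_events)
```

Each of the two frames could then be passed to the same forecasting model so that the projections can be compared.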

Approach taken to achieve our goals

Dataset Selection and Modification
- Obtain a complete set of data (2005-2023) in CSV format.
- Ensure dataset has at least 500 records (1000 if developing a decision tree/random forest model).
- Import selected CSV file into our project repository.
Cleaning and Refining
- Analyze columns to check for null values.
- Identify and clean columns to ensure consistency.
- Drop unnecessary columns.
- Cleaned data is exported as CSV files for the machine learning model.
- Look for correlations in our data.
Data Model Implementation
- Initialize, train, and evaluate a model (or load a pretrained model).
- Test model accuracy and tweak until the model demonstrates meaningful predictive power.
- Achieve at least 75% classification accuracy or 0.80 R^2 score.
  - We were able to achieve a 0.9316 R^2 score.
Data Model Optimization
- The model optimization and evaluation process shows iterative changes made to the model.
- The resulting changes in model performance are documented in either a CSV/Excel table or in the Python script itself.
- Overall model performance is printed or displayed at the end of the script.
Draw Conclusions
- Make observations based on analysis.

Data Clean-Up

Our process:

Generated a list of data types
Checked for null values
Handled missing values by filling with the mean values of the scores
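The three steps above can be sketched with pandas as follows. This is a minimal illustration on made-up rows, not the notebooks' exact code; the column names are taken from the dataset explanations.

```python
import pandas as pd

# Made-up sample with one missing value.
df = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "Life Ladder": [7.0, 5.0, 6.0],
    "Social Support": [0.9, float("nan"), 0.7],
})

# 1. List the data type of each column.
print(df.dtypes)

# 2. Count null values per column.
print(df.isnull().sum())

# 3. Fill missing numeric values with the mean of their column.
numeric_cols = df.select_dtypes(include="number").columns
cleaned = df.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].mean())
print(cleaned)
```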

Analysis

Corruption does not necessarily correlate with happiness. Countries with high levels of corruption can still have high levels of happiness.
The Central and Eastern Europe and Latin America and Caribbean regions had high corruption scores and high happiness.
Life expectancy and GDP per capita are highly correlated with happiness.
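Correlations like these can be checked by ranking each variable by its Pearson correlation with the happiness score. The sketch below uses made-up numbers chosen so that one variable tracks happiness closely and the other does not; it does not reproduce the project's results.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Life Ladder": [4.0, 5.0, 6.0, 7.0],
    "Log GDP Per Capita": [8.0, 9.0, 10.0, 11.0],
    "Perceptions Of Corruption": [0.8, 0.6, 0.9, 0.7],
})

# Pearson correlation of every column with the happiness score,
# excluding the score's correlation with itself.
corr_with_happiness = (
    df.corr()["Life Ladder"]
    .drop("Life Ladder")
    .sort_values(ascending=False)
)
print(corr_with_happiness)
```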

Conclusions

GDP is one of the largest driving factors affecting happiness.
Countries with high levels of corruption can still have high levels of happiness.
- For example, Saudi Arabia has an imputed corruption score of 1 (the highest possible score), yet its happiness scores are not directly affected.
- An additional note on corruption: corruption is highly subjective and varies with different standards throughout the world.
Removing the years of the financial crisis and the COVID-19 pandemic did not have a substantial impact on the projected future of happiness.
Over the timescale of the study, the dataset gained additional fields and became more complete, covering additional countries.
Final accuracy achieved:
- Mean Squared Error: 0.0169
- R^2 Score: 0.9316
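For reference, the two metrics reported above are defined as shown in the sketch below. The helper names `mse` and `r_squared` and the sample values are illustrative assumptions; the notebooks may compute these metrics with a library instead.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up happiness scores and predictions for illustration.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 6.0, 6.5, 8.0]

print("Mean Squared Error:", mse(y_true, y_pred))
print("R^2 Score:", r_squared(y_true, y_pred))
```

An R^2 close to 1 means the predictions explain most of the variation in the actual scores, while a lower Mean Squared Error means the individual predictions sit closer to the actual values.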

Problems encountered

The WHR changed significantly over time
Besides GDP per capita and life expectancy, the variables are based on open-ended survey questions
Although we tried to identify major global incidents within the dataset (COVID-19 and a world financial crisis), the data is likely affected by other events or has causal links beyond those we identified (which could be regional, sociopolitical, environmental, and/or cultural).
Removing years for global and sociopolitical events
Generating a world map (GeoPandas)

Future Considerations

Consider the complexity of other world events
Experiment with changing regional indicators
Look at additional datasets about happiness

Repository Files

File	Description
DATA PREPARATION :
WHR 2005 to 2023.csv	CSV file containing WHR data from 2005 - 2023
WHR_2005to2023_processed.csv	Additional CSV used for Seaborn

MODELING AND PREDICTIONS :
Correlation_to_HappinessLadder.ipynb	Seaborn regplots of all categories related to happiness
High_Correlation_Attributes_Analysis_(Regional).ipynb	Most significant correlations to happiness using Seaborn pairplot
Linear_Regression_Prophet.ipynb	Linear Regression, Prophet, KNN, RandomForest, and Seaborn
Regional_Correlation.ipynb	Regional Analysis
.gitignore	Git ignore

PRESENTATION :
Factors_Impacting_Global_Happiness.pdf	PowerPoint slides used in the presentation
