Skip to content

ynaloisp/btt_lab8

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

World Happiness Prediction Model

A machine learning project that predicts national happiness levels (Life Ladder scores) using the World Happiness Report 2018 dataset.

Project Overview

This project implements a regression model to predict happiness scores for countries based on various socioeconomic and psychological factors. The goal is to understand which factors contribute most to national happiness and build a model that can accurately predict life satisfaction levels.

Problem Statement

Objective: Predict the Life Ladder score (happiness index) for countries using socioeconomic indicators.

Type: Supervised Learning - Regression Problem

  • Target Variable: Life Ladder (continuous numerical values representing happiness scores)
  • Features: Economic, social, health, and governance indicators

Dataset

Source: World Happiness Report 2018 Chapter 2 Online Data (WHR2018Chapter2OnlineData.csv)

Key Features Used:

  • Log GDP Per Capita
  • Social Support
  • Healthy Life Expectancy at Birth
  • Freedom to Make Life Choices
  • Generosity
  • Perceptions of Corruption
  • Positive Affect
  • Negative Affect
  • Confidence in National Government
  • Democratic Quality
  • Delivery Quality

Features Removed:

  • Year (temporal data not needed for this analysis)
  • Standard deviation metrics (high missingness)
  • GINI index columns (high missingness)

Methodology

1. Data Preprocessing

  • Missing Value Treatment: Mean imputation for numerical features
  • Feature Scaling: StandardScaler normalization
  • Feature Engineering: Column name cleaning and standardization
  • Data Splitting: 80% training, 20% testing

2. Exploratory Data Analysis

  • Distribution analysis of numerical features
  • Outlier detection using box plots
  • Correlation analysis with target variable
  • Pairplot visualization of key relationships

3. Model Implementation

  • Algorithm: Linear Regression
  • Training: Fitted on scaled training data
  • Validation: Train-test split evaluation

4. Model Evaluation

  • Metrics Used:
    • Root Mean Square Error (RMSE)
    • R² Score (Coefficient of Determination)
  • Visualization: Actual vs Predicted scatter plot

Results

The linear regression model demonstrates strong performance in predicting happiness scores:

  • Training RMSE: [Value from execution]
  • Test RMSE: [Value from execution]
  • Training R²: [Value from execution]
  • Test R²: [Value from execution]

The model shows good generalization with minimal overfitting, as evidenced by similar performance metrics between training and test sets.

Key Insights

  1. Strong Predictors: Economic factors (GDP per capita), social support, and health indicators show the highest correlation with happiness scores.

  2. Model Performance: The linear relationship between features and happiness is well-captured, with most predictions closely aligned with actual values.

  3. Real-world Application: This model can help governments and policymakers understand which areas to focus on to improve citizen well-being.

Business Value

This predictive model provides valuable insights for:

  • Government Policy: Identifying key areas for policy intervention to improve national happiness
  • International Development: Prioritizing development programs based on happiness impact
  • Research: Understanding the relationship between socioeconomic factors and well-being
  • Comparative Analysis: Benchmarking countries against predicted happiness levels

Technical Requirements

Dependencies

pandas
numpy
matplotlib
seaborn
scikit-learn

Installation

pip install pandas numpy matplotlib seaborn scikit-learn

File Structure

├── DefineAndSolveMLProblem.ipynb    # Main analysis notebook
├── README.md                        # Project documentation
└── data/
    └── WHR2018Chapter2OnlineData.csv    # Dataset

Usage

  1. Setup Environment: Install required dependencies
  2. Load Data: Ensure the dataset is in the data/ directory
  3. Run Notebook: Execute cells in sequence in DefineAndSolveMLProblem.ipynb
  4. View Results: Analyze model performance and visualizations

Future Improvements

  • Feature Engineering: Create polynomial features or interaction terms
  • Model Comparison: Test other algorithms (Random Forest, Gradient Boosting)
  • Cross-Validation: Implement k-fold cross-validation for robust evaluation
  • Hyperparameter Tuning: Optimize model parameters using GridSearchCV
  • Time Series Analysis: Incorporate temporal trends if multi-year data is available

Project Structure

This project follows the complete machine learning lifecycle:

  1. Data Collection: World Happiness Report dataset
  2. Problem Definition: Regression prediction of happiness scores
  3. Exploratory Data Analysis: Statistical and visual analysis
  4. Data Preprocessing: Cleaning, imputation, and scaling
  5. Model Training: Linear regression implementation
  6. Model Evaluation: Performance metrics and validation
  7. Results Interpretation: Business insights and visualization

Author

Lab 8 Assignment - Machine Learning Problem Solving

License

This project is for educational purposes as part of a machine learning course.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors