krishna-moukthika/WorldLifeExpectancy-EDA

Project Overview

This project examines the factors influencing global life expectancy over the past 15 years. Using MySQL for data cleaning and exploration, I looked for patterns and correlations between life expectancy and key variables such as GDP, BMI, and country development status.

Data Cleaning Process

  • Scrutinized the Dataset: Employed SELECT statements to uncover data characteristics, inconsistencies, and potential anomalies.
  • Eliminated Duplicates: Found repeated records with GROUP BY and HAVING, then used the ROW_NUMBER() window function to number the rows within each group and delete every copy after the first.
  • Imputed Missing Values: Filled data gaps leveraging UPDATE statements and JOIN operations based on logical relationships and domain knowledge.
  • Harmonized Data Format: Enforced consistency by standardizing data types, units, and character formats through targeted UPDATE statements.
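The duplicate-removal step above can be sketched as follows. This is a minimal illustration, not the project's actual script: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `life`, the `row_id` key, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE life ("
    "row_id INTEGER PRIMARY KEY, country TEXT, year INTEGER, life_expectancy REAL)"
)
# Hypothetical sample rows; one country-year pair is entered twice.
conn.executemany(
    "INSERT INTO life (country, year, life_expectancy) VALUES (?, ?, ?)",
    [
        ("Country A", 2020, 71.2),
        ("Country A", 2021, 71.5),
        ("Country A", 2021, 71.5),
        ("Country B", 2020, 64.0),
    ],
)

# Step 1: GROUP BY / HAVING lists the country-year pairs that occur more than once.
duplicates = conn.execute(
    "SELECT country, year, COUNT(*) FROM life "
    "GROUP BY country, year HAVING COUNT(*) > 1"
).fetchall()

# Step 2: ROW_NUMBER() numbers the rows inside each country-year pair;
# every row numbered above 1 is a redundant copy and is deleted.
conn.execute(
    """
    DELETE FROM life
    WHERE row_id IN (
        SELECT row_id FROM (
            SELECT row_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY country, year ORDER BY row_id
                   ) AS rn
            FROM life
        ) AS numbered
        WHERE rn > 1
    )
    """
)
remaining = conn.execute("SELECT COUNT(*) FROM life").fetchone()[0]
print(duplicates, remaining)
```

Wrapping the window query in a derived table (`numbered`) keeps the statement in a form that MySQL 8 should also accept, since the table being deleted from is not read directly by the subquery.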

Exploratory Data Analysis

  • Unveiled Life Expectancy Trends: Dissected life expectancy patterns across countries and years, calculating minimum, maximum, and average values with the MIN, MAX, and AVG functions and the GROUP BY and HAVING clauses.
  • Examined GDP Impact: Investigated the correlation between GDP and life expectancy using INNER JOIN and GROUP BY to identify potential relationships.
  • Contrasted Development Status: Compared life expectancy between developed and developing countries using GROUP BY and AVG to uncover disparities.
  • Analyzed BMI Influence: Explored the connection between BMI and life expectancy using AVG, GROUP BY, and ORDER BY to identify potential trends.
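The aggregation queries described above might look roughly like the sketch below. It is an illustration under stated assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the table and column names are hypothetical, and the rows are invented sample values rather than figures from the dataset. Zero is treated as a missing life expectancy and filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE life (country TEXT, year INTEGER, status TEXT, life_expectancy REAL)"
)
# Hypothetical sample rows; Country C has only zero (missing) values.
conn.executemany(
    "INSERT INTO life VALUES (?, ?, ?, ?)",
    [
        ("Country A", 2007, "Developed", 78.0),
        ("Country A", 2022, "Developed", 82.0),
        ("Country B", 2007, "Developing", 55.0),
        ("Country B", 2022, "Developing", 65.0),
        ("Country C", 2007, "Developing", 0.0),
        ("Country C", 2022, "Developing", 0.0),
    ],
)

# Per-country minimum, maximum, and gain; HAVING drops countries whose
# minimum or maximum is zero so they do not distort the ranking.
gains = conn.execute(
    """
    SELECT country,
           MIN(life_expectancy) AS lowest,
           MAX(life_expectancy) AS highest,
           MAX(life_expectancy) - MIN(life_expectancy) AS gain
    FROM life
    GROUP BY country
    HAVING MIN(life_expectancy) <> 0 AND MAX(life_expectancy) <> 0
    ORDER BY gain DESC
    """
).fetchall()

# Average life expectancy by development status, ignoring zero values.
by_status = conn.execute(
    """
    SELECT status,
           COUNT(DISTINCT country) AS countries,
           AVG(life_expectancy) AS avg_life_expectancy
    FROM life
    WHERE life_expectancy > 0
    GROUP BY status
    ORDER BY status
    """
).fetchall()
print(gains)
print(by_status)
```

Counting distinct countries next to each average is a deliberate choice here: it shows how many countries stand behind each group, which matters when one status has far fewer members than the other.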

Challenges

  • Navigating Data Quality Pitfalls: Encountered significant hurdles due to numerous zero values in the life expectancy column, necessitating meticulous data handling to prevent skewed findings.
  • Addressing Data Gaps: Faced challenges accurately imputing missing values, particularly within the critical life expectancy variable.
  • Overcoming Update Constraints: Initial attempts to efficiently modify the 'Status' column were hindered by subquery limitations.
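One plausible reading of the update constraint above is that MySQL rejects an UPDATE whose subquery selects from the table being updated (error 1093). The sketch below fills blank Status values from other rows of the same country. It runs on SQLite through Python's sqlite3, where the correlated subquery is allowed; the MySQL statement is included only as an unexecuted string to show the self-join form that avoids the restriction. Table name, column names, and rows are hypothetical.

```python
import sqlite3

# MySQL form (not executed here): a multi-table UPDATE with a self-join,
# which avoids reading the target table inside a subquery.
MYSQL_SELF_JOIN_UPDATE = """
UPDATE life AS t1
JOIN life AS t2 ON t1.country = t2.country
SET t1.status = t2.status
WHERE t1.status = '' AND t2.status <> ''
"""

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.execute("CREATE TABLE life (country TEXT, year INTEGER, status TEXT)")
# Hypothetical rows; Country C has no known status in any year.
conn.executemany(
    "INSERT INTO life VALUES (?, ?, ?)",
    [
        ("Country A", 2020, "Developed"),
        ("Country A", 2021, ""),
        ("Country B", 2020, "Developing"),
        ("Country B", 2021, ""),
        ("Country C", 2020, ""),
    ],
)

# SQLite equivalent: copy a non-blank status from another row of the same
# country; rows with no such sibling are left blank by the EXISTS guard.
conn.execute(
    """
    UPDATE life
    SET status = (
        SELECT MAX(other.status) FROM life AS other
        WHERE other.country = life.country AND other.status <> ''
    )
    WHERE status = ''
      AND EXISTS (
        SELECT 1 FROM life AS other
        WHERE other.country = life.country AND other.status <> ''
      )
    """
)
statuses = conn.execute(
    "SELECT country, year, status FROM life ORDER BY country, year"
).fetchall()
still_blank = conn.execute("SELECT COUNT(*) FROM life WHERE status = ''").fetchone()[0]
print(statuses, still_blank)
```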

Insights

  • Harnessing Self-Joins: Discovered the power of self-joins for effectively updating and imputing missing data based on related records.
  • Uncovering Economic Impact: Observed a strong correlation between GDP and life expectancy, emphasizing the pivotal role of economic factors in population health.
  • Recognizing the Power of Visualization: Realized the indispensable nature of visual tools like Tableau for uncovering hidden patterns and effectively communicating findings.
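As an illustration of the self-join insight above, the sketch below estimates a missing life expectancy from the same country's previous and following year. Averaging the two neighbors is an assumption made for this example; the source only says values were imputed from related records. SQLite (via Python's sqlite3) stands in for MySQL, and the table, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.execute("CREATE TABLE life (country TEXT, year INTEGER, life_expectancy REAL)")
# Hypothetical rows; the 2020 value for Country A is recorded as zero (missing).
conn.executemany(
    "INSERT INTO life VALUES (?, ?, ?)",
    [
        ("Country A", 2019, 70.0),
        ("Country A", 2020, 0.0),
        ("Country A", 2021, 72.0),
        ("Country B", 2020, 60.0),
    ],
)

# Self-join: pair each zero row with the same country's previous and next
# year, and average the two neighbors when both are known.
gaps = conn.execute(
    """
    SELECT cur.country, cur.year,
           (prev.life_expectancy + nxt.life_expectancy) / 2.0 AS estimate
    FROM life AS cur
    JOIN life AS prev
      ON prev.country = cur.country AND prev.year = cur.year - 1
    JOIN life AS nxt
      ON nxt.country = cur.country AND nxt.year = cur.year + 1
    WHERE cur.life_expectancy = 0
      AND prev.life_expectancy > 0
      AND nxt.life_expectancy > 0
    """
).fetchall()

# Write the estimates back, one UPDATE per gap.
conn.executemany(
    "UPDATE life SET life_expectancy = ? WHERE country = ? AND year = ?",
    [(estimate, country, year) for country, year, estimate in gaps],
)
filled = conn.execute(
    "SELECT life_expectancy FROM life WHERE country = 'Country A' AND year = 2020"
).fetchone()[0]
print(gaps, filled)
```

Because both joins are inner joins, a zero in a country's first or last recorded year has no matching neighbor on one side and is left untouched rather than filled with a one-sided guess.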

Key Findings and Recommendations

  • Economic Prosperity and Longevity: Higher GDP nations generally exhibit greater life expectancy, highlighting the socioeconomic determinants of health.
  • Development Disparities: Developed countries demonstrate significantly higher life expectancies compared to their developing counterparts.
  • Prioritizing Data Quality: Addressing data quality issues, such as zero values and missing data, is paramount for robust analysis and reliable conclusions.

About

This project dissected global life expectancy patterns using MySQL. By scrutinizing and cleaning the dataset, I uncovered strong correlations between life expectancy and both GDP and development status. Key findings highlight the significant impact of economic factors on population health and the need for robust data quality in analysis.
