Motivation: when you wish to save for your retirement provision, insurance companies are interested in predicting how much will you live in order to estimate how much you should pay to them. For example, if you end up living way longer than it was expected by the insurance, the insurer will have a loss.
For this project I will have a set of predictors (e.g. income, spending on healthcare, etc) to estimate life expectancy (the target). As we have a label (life expectancy) and this label is within a range (continuous data), we are dealing with a typical scenario of Supervised Machine Learning (Linear) Regression (for more details see report.pdf). I used a decision tree model from scikit-learn to estimate life expectancy with 92% accuracy
Inside the folder "predict-life-expectancy" you will find:
- code
- data_sets (sources: Kaggle/WHO and gapminder.org)
- report.pdf
- slides.pdf