AryafAlotaibi/Income-Prediction-Using-Machine-Learning-Models

Income Prediction Using Machine Learning Models

Project Overview

This project aims to predict whether an individual’s annual income exceeds $50,000 using demographic and employment-related features from the Adult (Census Income) dataset. The task is formulated as a binary classification problem and serves as a benchmark for evaluating classical machine learning algorithms on structured tabular data.

This project was created as part of a machine learning course and completed collaboratively as a team project.

To address this problem, three supervised learning models were implemented and compared: Support Vector Machine (SVM), Decision Tree, and K-Nearest Neighbors (KNN). The goal is to assess how different modeling approaches perform in terms of accuracy and class-level prediction quality.

Why Income Prediction Matters

  • Supports socioeconomic and workforce analysis
  • Helps identify factors associated with income levels
  • Provides a standard benchmark for evaluating ML classifiers

Dataset (Adult Census Income)

The Adult dataset was introduced by Becker and Kohavi (1996) and is hosted by the UCI Machine Learning Repository. It contains demographic and employment-related attributes such as age, education, occupation, and hours worked per week. The target variable is Income, which takes one of two classes:

  1. <=50K: Individuals earning less than or equal to $50,000 per year.
  2. >50K: Individuals earning more than $50,000 per year.
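
As an illustration, the two classes above can be mapped to a binary target as follows. This is a minimal sketch, not the project's code: the helper name `encode_income` is hypothetical, and tolerating surrounding whitespace and a trailing period is an assumption made to cope with formatting differences between raw files.

```python
def encode_income(label: str) -> int:
    """Map an income label to 0 (<=50K) or 1 (>50K)."""
    # Assumption: raw labels may carry surrounding whitespace or a trailing period.
    cleaned = label.strip().rstrip(".")
    if cleaned == "<=50K":
        return 0
    if cleaned == ">50K":
        return 1
    raise ValueError(f"unexpected income label: {label!r}")


# Hypothetical raw labels, including formatting variants.
labels = ["<=50K", " >50K", "<=50K."]
targets = [encode_income(x) for x in labels]
```

Raising on unknown labels, rather than silently defaulting to one class, makes malformed rows visible early.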

Dataset Source: https://doi.org/10.24432/C5XW20.

Data Preprocessing

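To keep category encodings and feature scaling consistent across both splits, the training and test datasets were temporarily combined during preprocessing and later separated. Combining the splits does not by itself prevent data leakage, because the most frequent category and the minimum and maximum values are then computed with the test rows included; fitting those statistics on the training split alone is the stricter safeguard. The main preprocessing steps included: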

  • Handling missing values using the most frequent category
  • Encoding categorical variables into numeric form
  • Applying Min–Max normalization to scale features

These steps were particularly important for models that depend on distances between feature vectors, such as KNN and the RBF-kernel SVM.
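
The three preprocessing steps can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions rather than the project's implementation: the function names are hypothetical, the `"?"` missing-value marker and the integer-code encoding are assumptions (the project does not say which encoding it used), and the sample values are made up.

```python
from collections import Counter

MISSING = "?"  # assumption: missing categorical values are marked with "?"


def impute_most_frequent(column):
    """Replace missing markers with the most frequent observed category."""
    observed = [v for v in column if v != MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == MISSING else v for v in column]


def encode_categories(column):
    """Map each distinct category to an integer code (sorted for stability)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column]


def min_max_scale(column):
    """Rescale numeric values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Made-up sample columns for illustration only.
workclass = ["Private", "?", "State-gov", "Private"]
age = [39, 50, 38, 53]

workclass_filled = impute_most_frequent(workclass)
workclass_codes = encode_categories(workclass_filled)
age_scaled = min_max_scale(age)
```

Min–Max scaling maps each value x to (x - min) / (max - min), so every feature ends up on the same [0, 1] range and no single feature dominates a distance computation because of its units.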

Model Architecture and Methods

Three machine learning models were trained and evaluated:

  • Support Vector Machine (RBF kernel) for margin-based classification
  • Decision Tree Classifier for interpretable rule-based learning
  • K-Nearest Neighbors (KNN) for distance-based classification
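
To make the idea of distance-based classification concrete, here is a minimal KNN sketch in plain Python. It is not the project's implementation: the function name `knn_predict`, the choice of `k=3`, the Euclidean metric, and the toy points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [y for _, y in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, already-scaled feature vectors (made up for illustration).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.7)]
train_y = [0, 0, 1, 1, 1]

prediction_low = knn_predict(train_X, train_y, (0.15, 0.15))
prediction_high = knn_predict(train_X, train_y, (0.8, 0.8))
```

Because the prediction depends only on which points are closest, features on larger numeric ranges would dominate the distance if left unscaled.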

Model performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrices.
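
The listed metrics all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic in plain Python; the function names and the toy labels are hypothetical, and treating >50K (encoded as 1) as the positive class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (made up): 1 stands for >50K, 0 for <=50K.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

On an imbalanced target, precision and recall for the minority class can differ noticeably from overall accuracy, which is why class-level metrics are reported alongside it.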

Results

The performance of the three machine learning models was evaluated using accuracy on the test dataset.

| Model | Accuracy |
|---|---|
| Support Vector Machine (SVM) | 0.8483 |
| Decision Tree | 0.8448 |
| K-Nearest Neighbors (KNN) | 0.8284 |

The SVM model achieved the highest accuracy (0.8483), followed closely by the Decision Tree (0.8448, a difference of 0.0035), while KNN trailed the SVM by about two percentage points (0.8284).

For a more detailed implementation, please refer to the PDF file in the repository.