Income Prediction Using Machine Learning Models

Project Overview

This project aims to predict whether an individual’s annual income exceeds $50,000 using demographic and employment-related features from the Adult (Census Income) dataset. The task is formulated as a binary classification problem and serves as a benchmark for evaluating classical machine learning algorithms on structured tabular data.

It was completed collaboratively as a team project for a machine learning course.

To address this problem, three supervised learning models were implemented and compared: Support Vector Machine (SVM), Decision Tree, and K-Nearest Neighbors (KNN). The goal is to assess how different modeling approaches perform in terms of accuracy and class-level prediction quality.

Why Income Prediction Matters

Supports socioeconomic and workforce analysis
Helps identify factors associated with income levels
Provides a standard benchmark for evaluating ML classifiers

Dataset (Adult Census Income)

The Adult dataset was introduced by Becker and Kohavi (1996) and is hosted by the UCI Machine Learning Repository. It contains demographic and employment-related attributes such as age, education, occupation, and hours worked per week. The target variable, Income, takes one of two classes:

1- <=50K: Individuals earning less than or equal to $50,000 per year.
2- >50K: Individuals earning more than $50,000 per year.

Dataset Source: https://doi.org/10.24432/C5XW20.
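
As a minimal sketch of how the two income classes can be turned into a binary target, the snippet below parses rows laid out like the raw UCI files. It assumes comma-separated fields, "?" as the missing-value marker, and an optional trailing period on the label (as found in the UCI test file). The two sample rows are illustrative values, not records from the dataset, and `parse_rows` is a hypothetical helper rather than code from this project.

```python
import csv
import io

# Attribute names as documented for the Adult dataset, plus the income label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Illustrative rows only (not taken from the dataset).
SAMPLE = """\
37, Private, 120000, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
24, ?, 98000, HS-grad, 9, Never-married, ?, Own-child, Black, Female, 0, 0, 30, United-States, <=50K.
"""


def parse_rows(text):
    """Parse Adult-style CSV text into dicts with a 0/1 `target` field."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        # Strip the optional trailing period before comparing the label.
        label = fields[-1].rstrip(".")
        row = dict(zip(COLUMNS[:-1], fields[:-1]))
        # Represent the "?" marker as None so it can be imputed later.
        row = {key: (None if value == "?" else value) for key, value in row.items()}
        row["target"] = 1 if label == ">50K" else 0
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row["age"], row["workclass"], row["target"])
```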

Data Preprocessing

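To keep category encodings and feature scaling consistent across both splits, the training and test datasets were temporarily combined during preprocessing and later separated. Because statistics computed on the combined data also reflect the test rows, this approach favors consistency over strict train/test isolation. The main preprocessing steps included: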

Handling missing values using the most frequent category
Encoding categorical variables into numeric form
Applying Min–Max normalization to scale features

These steps were particularly important for distance-based models such as SVM and KNN.
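
The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, missing values are assumed to be represented as `None`, categories are mapped to integer codes in sorted order, and Min–Max scaling uses x' = (x - min) / (max - min).

```python
from collections import Counter


def impute_most_frequent(values):
    """Replace None entries with the most common non-missing value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def encode_categories(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {category: index for index, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Toy columns for illustration only.
workclass = ["Private", None, "Private", "State-gov"]
hours = [40, 20, 60, 40]

filled = impute_most_frequent(workclass)
encoded, codes = encode_categories(filled)
scaled_hours = min_max_scale(hours)

print(filled)
print(encoded, codes)
print(scaled_hours)
```

A common alternative to combining the splits is to compute the mode, the category codes, and the minimum and maximum on the training rows only and then reuse them on the test rows.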

Model Architecture and Methods

Three machine learning models were trained and evaluated:

Support Vector Machine (RBF kernel) for margin-based classification
Decision Tree Classifier for interpretable rule-based learning
K-Nearest Neighbors (KNN) for distance-based classification

Model performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrices.
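
To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. Treating >50K (encoded as 1) as the positive class is an assumption, the function names are hypothetical, and the labels are toy values rather than project results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 stands for >50K, 0 for <=50K.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

Because the two classes are not equally frequent in this kind of data, precision and recall for the >50K class can differ noticeably from overall accuracy, which is why class-level metrics are reported alongside it.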

Results

The performance of the three machine learning models was evaluated using accuracy on the test dataset.

Model	Accuracy
Support Vector Machine (SVM)	0.8483
Decision Tree	0.8448
K-Nearest Neighbors (KNN)	0.8284

The SVM model achieved the highest accuracy (0.8483), narrowly ahead of the Decision Tree (0.8448, a gap of 0.0035), while KNN trailed the SVM by about two percentage points (0.8284).

References

Becker, B., & Kohavi, R. (1996). Adult Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5XW20
