Comparing Classifiers

Comparing the performance of k-nearest neighbors, logistic regression, decision trees, and support vector machines This is a homework submission for a machine learning course

The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required. The classification goal is to predict if the client will subscribe (yes/no) a term deposit (variable y).

There are two datasets:

bank-full.csv with all marketing data and 17 inputs, ordered by date
bank.csv with a 10%, randomly selected dataset from the bank-full.csv

I used the sample of this data for the computational demanding vector machine classification.

Variable details

Below is a detail of each variable in the dataset. Each row represents a unique customer and serves as the unique identifier for each individual.

Variable Name	Data type	Description
Age	Integer
Job	Categorical	Occupation, ex: admin, blue-collar, retired, unemployed
Marital	Categorical	Married, single, divorced
Education	Categorical	Tertiary, secondary, primary, unknown
Default	Binary	Credit in default? yes or no
Balance	Integer	Average yearly balance
Housing	Binary	Has housing loan? yes or no
Loan	Binary	Has personal loan? yes or no
Contact	Categorical	Contact communication type (cellular, telephone)
Day	Integer	Day of the week of last contact
pdays	Integer	Number of days since last contact (-1 means no contact)
Previous	Integer	Number of contacts performed before this campaign
poutcome	Categorical	Outcome of previous campaign: unknown, failure, success

Preparing the data

All of the categorical data was converted into numeric values using the Label encoder. Upon analyzing the "poutcome" data values, I noticed that there was repetition within the outcome values. I converted this column to boolean, making success = 1, and non-success (other, unknown, failure) = 0. Below you can see the comparison of the conversion of these values as they relate to the loan success.

Selecting the best model

THe KNN classifier (13 neighbors and Euclidian metric) was 89.2% accurate. The logistic regression classifier was 88% accurate. The decision tree classifier was 88% accurate. The support vector classifier (using the linear kernel) was 89.1% accurate.

The KNN classifier performed the best, followed closely by the support vector classifier. The logistic regression and decision tree classifiers performed the worst, although by a small margin.

Going forward, I recommend using the KNN classifier to determine which customers will accept the terms of the deposit.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Data		Data
Images		Images
.DS_Store		.DS_Store
.gitattributes		.gitattributes
Comparing_Classifiers.ipynb		Comparing_Classifiers.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Comparing Classifiers

Variable details

Preparing the data

Selecting the best model

Visualizing the Classifiers

KNN Classifier

Logistic Regression Classifier

Decision Tree Classifier

Support Vector Classifier

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

BotanicalAmy/Comparing-Classifiers

Folders and files

Latest commit

History

Repository files navigation

Comparing Classifiers

Variable details

Preparing the data

Selecting the best model

Visualizing the Classifiers

KNN Classifier

Logistic Regression Classifier

Decision Tree Classifier

Support Vector Classifier

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages