individual_project

My first project using a dataset and model of my own choosing.

Description

As a data scientist, I undertook a project focused on predicting loan defaults using a dataset obtained from Coursera. The primary objective was to build a machine learning model that accurately predicts whether a loan applicant will default. Such a model can help financial institutions make more informed lending decisions and mitigate potential losses. The project involved extensive data preparation, preprocessing, feature engineering, and model evaluation to produce meaningful insights. Based on the results, I made recommendations to minimize loan defaults.

Goal

The purpose of this model is to predict which borrowers will default.
My goal is to identify the specific features that drive defaults.

Initial hypotheses

Null Hypothesis: The features are not related to whether a borrower defaults.
Alternative Hypothesis: The features are related to whether a borrower defaults.
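These hypotheses can be tested one feature at a time. A minimal sketch for a categorical feature uses a chi-square test of independence between the feature and `default`. It assumes pandas and SciPy are installed, that column names have already been lowercased, and a significance level of 0.05 (`ALPHA` is an assumed value, not one stated in this README):

```python
import pandas as pd
from scipy.stats import chi2_contingency

ALPHA = 0.05  # assumed significance level, not stated in the project


def chi_square_test(df: pd.DataFrame, feature: str, target: str = "default") -> dict:
    """Chi-square test of H0: `feature` and `target` are independent."""
    # Contingency table: one row per feature value, one column per target class
    observed = pd.crosstab(df[feature], df[target])
    chi2, p_value, dof, _expected = chi2_contingency(observed)
    return {
        "feature": feature,
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        # Reject H0 when the p-value falls below the chosen significance level
        "reject_null": bool(p_value < ALPHA),
    }
```

For example, running `chi_square_test(train, "hascosigner")` on the training split would report whether the null hypothesis can be rejected for the co-signer flag. Continuous features such as `age` or `interestrate` would need a different test (for example, a two-sample t-test) or binning first.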

Data dictionary

Column	Column_type	Data_type	Description
LoanID	Identifier	string	A unique identifier for each loan.
Age	Feature	integer	Age of the borrower.
Income	Feature	integer	Annual income of the borrower.
LoanAmount	Feature	integer	Amount of money being borrowed.
CreditScore	Feature	integer	Credit score of the borrower, indicating their creditworthiness.
MonthsEmployed	Feature	integer	Number of months the borrower has been employed.
NumCreditLines	Feature	integer	Number of credit lines the borrower has open.
InterestRate	Feature	float	Interest rate for the loan.
LoanTerm	Feature	integer	Term length of the loan in months.
DTIRatio	Feature	float	Debt-to-Income ratio, borrower's debt compared to their income.
Education	Feature	string	Highest level of education attained by the borrower.
EmploymentType	Feature	string	Employment type of the borrower.
MaritalStatus	Feature	string	Marital status of the borrower (Single, Married, Divorced).
HasMortgage	Feature	string	Whether the borrower has a mortgage (Yes or No).
HasDependents	Feature	string	Whether the borrower has dependents (Yes or No).
LoanPurpose	Feature	string	Purpose of the loan (Home, Auto, Education, Business, Other).
HasCoSigner	Feature	string	Whether the loan has a co-signer (Yes or No).
Default	Target	integer	Binary target variable indicating whether the loan defaulted (1) or not (0).

Planning:

Generate questions to ask about the data set based on what I want my model to predict: Do any features have an impact on defaults? Which features significantly drive defaults?
Determine the format: the final report should be in .ipynb, modules in .py, and predictions in .csv.
Determine the audience and develop the speech and presentation accordingly. The audience will be the lead data scientist.
Determine the significance of the relationship between each feature and defaults.
Develop my null hypothesis and alternative hypothesis.
Determine which model to create.

Acquisition:

Data acquired from Coursera as a CSV file
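A minimal acquisition sketch, assuming the Coursera download is read with pandas. The filename `loan_default.csv` is hypothetical (this README does not name the file), and the column check simply compares the file against the data dictionary above:

```python
import pandas as pd

# Hypothetical filename: the README does not name the downloaded CSV
CSV_PATH = "loan_default.csv"

# Column names as listed in the data dictionary
EXPECTED_COLUMNS = [
    "LoanID", "Age", "Income", "LoanAmount", "CreditScore", "MonthsEmployed",
    "NumCreditLines", "InterestRate", "LoanTerm", "DTIRatio", "Education",
    "EmploymentType", "MaritalStatus", "HasMortgage", "HasDependents",
    "LoanPurpose", "HasCoSigner", "Default",
]


def acquire(path=CSV_PATH) -> pd.DataFrame:
    """Read the loan-default CSV and confirm every dictionary column is present."""
    df = pd.read_csv(path)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    return df
```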

Preparation

Renamed columns and lowercased column names
Confirmed there were no missing values
Dropped the LoanID column
Split the data into train, validation, and test sets (70%, 15%, 15%)
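The preparation steps can be sketched as follows, assuming pandas and scikit-learn. This README does not say which columns were renamed, so the sketch only lowercases names. Stratifying on `default` and the seed `123` are my assumptions, chosen because the target is heavily concentrated on one class:

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase column names, confirm there are no nulls, and drop the ID column."""
    df = df.rename(columns=str.lower)
    if df.isnull().any().any():
        raise ValueError("Unexpected missing values")
    return df.drop(columns=["loanid"])


def split_data(df: pd.DataFrame, target: str = "default", seed: int = 123):
    """Split into 70% train, 15% validate, 15% test, stratified on the target."""
    # First carve off 30% of the rows, then split that 30% in half
    train, rest = train_test_split(
        df, test_size=0.30, random_state=seed, stratify=df[target]
    )
    validate, test = train_test_split(
        rest, test_size=0.50, random_state=seed, stratify=rest[target]
    )
    return train, validate, test
```

Stratifying keeps the share of defaults roughly equal across the three splits, which matters when one class is rare.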

Exploration & pre-processing:

Created visualizations and ran statistical tests to identify which features have a significant relationship with defaults
Binned continuous features to make the visuals easier to read
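Binning a continuous feature and plotting the default rate per bin is one way to produce these visuals. In this sketch, which assumes pandas, the age bin edges and labels are hypothetical, since the README does not list the bins that were used:

```python
import pandas as pd

# Hypothetical bin edges and labels: the README does not list the bins used
AGE_BINS = [18, 30, 40, 50, 60, 70]
AGE_LABELS = ["18-29", "30-39", "40-49", "50-59", "60-69"]


def default_rate_by_bin(df, column="age", bins=AGE_BINS, labels=AGE_LABELS,
                        target="default"):
    """Bin a continuous column and return the mean default rate in each bin."""
    # right=False makes each bin include its left edge: [18, 30), [30, 40), ...
    binned = pd.cut(df[column], bins=bins, labels=labels, right=False)
    # observed=False keeps empty bins in the result (their rate is NaN)
    return df.groupby(binned, observed=False)[target].mean()
```

The returned Series can be plotted directly, for example with `default_rate_by_bin(train).plot.bar()`.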

Modeling:

Decision tree and random forest models with balanced class weights performed worse than the baseline
The binary target (default) is heavily concentrated on one value
The k-nearest neighbors model weights one outcome significantly more than the other
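A sketch of the comparison against the baseline, assuming scikit-learn and features that are already numerically encoded. The baseline always predicts the majority class, which is why it scores well on accuracy when the target is this imbalanced. The hyperparameters (`max_depth=5`, `n_estimators=100`, `n_neighbors=5`) and the seed are illustrative, not the project's settings. Recall on the default class is reported alongside accuracy because a model can match the baseline's accuracy while never flagging a default:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_to_baseline(X_train, y_train, X_val, y_val, seed=123):
    """Score a majority-class baseline and three classifiers on validation data."""
    # Baseline: always predict the most common class in the training target
    values, counts = np.unique(y_train, return_counts=True)
    majority = values[np.argmax(counts)]
    baseline_pred = np.full(len(y_val), majority)

    models = {
        "decision_tree": DecisionTreeClassifier(
            max_depth=5, class_weight="balanced", random_state=seed),
        "random_forest": RandomForestClassifier(
            n_estimators=100, max_depth=5, class_weight="balanced",
            random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),  # no class_weight option
    }

    def score(pred):
        # Recall is for the positive class (1 = default)
        return {
            "accuracy": accuracy_score(y_val, pred),
            "recall": recall_score(y_val, pred, zero_division=0),
        }

    results = {"baseline": score(baseline_pred)}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = score(model.predict(X_val))
    return results
```

`KNeighborsClassifier` has no `class_weight` parameter, so with an imbalanced target its neighbor votes tend to favor the majority class. It is also sensitive to feature scale, so scaling the numeric features first (for example with `StandardScaler`) is usually advisable.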

Delivery:

Deployed my model and a reproducible report
Made recommendations
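The planning step calls for predictions in .csv format. A minimal sketch of writing them, assuming pandas, a fitted scikit-learn classifier with target classes 0 and 1, and that the `loanid` values were set aside before that column was dropped. The output column names and the `predictions.csv` path are hypothetical:

```python
import pandas as pd


def save_predictions(model, X, loan_ids, path="predictions.csv"):
    """Write each loan's default probability and predicted class to a CSV."""
    # predict_proba columns follow model.classes_; with classes [0, 1],
    # column 1 is the probability of default
    proba = model.predict_proba(X)[:, 1]
    out = pd.DataFrame({
        "loanid": list(loan_ids),
        "default_probability": proba,
        "default_prediction": model.predict(X),
    })
    out.to_csv(path, index=False)
    return out
```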

Key findings, recommendations, and takeaways

The distribution of the target is heavily concentrated on non-defaults (0)
Interest rate, loan amount, and age appear to drive borrowers to default on loans
Target loan amounts lower than 150k
Require higher qualifications for younger borrowers
Target borrowers with low interest rates

Instructions for reproducing the project and findings

Environment setup:

Install Conda, Python, MySQL, and VS Code or Jupyter Notebook
Clone this repo
