This project uses the Zillow dataset hosted in MySQL on Codeup's servers and produces a regression model to predict tax value.
- The purpose of this model is to predict tax values of single family properties using features of the Zillow dataset from 2017.
- My goal is to find specific features that are correlated with the tax value and build the model with the lowest RMSE to predict property tax values on new data.
- Null Hypothesis:
- Alternative Hypothesis:
| Feature | Description |
|---|---|
| bedrooms | total bedrooms |
| bathrooms | total bathrooms |
| square_ft | square footage |
| tax_value | tax value |
| year | year built |
| county | location |
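
As a rough illustration of how the features in the table above could be screened against `tax_value`, here is a minimal sketch using pandas. The rows are made up for demonstration and are not taken from the Zillow data, and the helper name `correlations_with_target` is hypothetical rather than something defined in this repo.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT values from the Zillow data.
df = pd.DataFrame({
    "bedrooms": [2, 3, 3, 4, 5],
    "bathrooms": [1.0, 2.0, 2.5, 3.0, 4.0],
    "square_ft": [900, 1400, 1600, 2200, 3100],
    "year": [1950, 1972, 1985, 1999, 2010],
    "county": ["county_a", "county_b", "county_a", "county_c", "county_b"],
    "tax_value": [150000, 240000, 310000, 420000, 650000],
})


def correlations_with_target(data, target="tax_value"):
    """Pearson correlation of each numeric feature with the target column."""
    # Keep numeric columns only (this drops the text column `county`),
    # then remove the target itself from the feature set.
    features = data.select_dtypes("number").drop(columns=[target])
    return features.corrwith(data[target]).sort_values(ascending=False)


corr = correlations_with_target(df)
print(corr)
```

A categorical column such as `county` is skipped here; comparing tax value across counties would need a different test or an encoding step.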
Questions to ask about the dataset, based on what I want my model to predict:
- Do any features have a correlation with tax value?
- What features significantly affect tax value?
- Final report should be in .ipynb, modules should be in .py, predictions should be in .csv.
- Audience will be the lead data scientist.
- Determine correlation between features and tax value.
- Develop my null hypothesis and alternative hypothesis.
- Explore data using visuals and statistical tests.
- Determine what model to create.
- Using a function with my credentials stored in an env.py file, I connected to Codeup's MySQL server.
- I created a query pertinent to the project.
- Using a function I built containing the query and pandas.read_sql, I acquired and read the Zillow data from MySQL.
- Performed tasks such as handling null values, renaming columns, normalizing text, binning data, changing data types, and mapping column values.
- See the clean_zillow() function in wrangle.py for more details.
- Made visuals and used statistical tests to understand which features had a significant correlation or relationship with tax value.
- I dropped features that had no significance.
- See explore.py and preprocess.py for more details.
- I chose a random forest regression model
- I used my train data set to train my model
- I made predictions with my model
- I made multiple models and chose the best one
- Deployed my model and a reproducible report
- Made recommendations
- Monthly contracts and charges are key features driving customer churn
- I recommend incentivizing yearly contracts by offering promotion deals and a loyalty program
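
Since the stated goal is the model with the lowest RMSE, the step of building several models and choosing the best one could be sketched as below. This is only an illustration under assumptions: the function names, the model labels, and the numbers are hypothetical and do not come from this project's results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pick_best_model(actual, predictions_by_model):
    """Score every model's predictions and return the name with the lowest RMSE."""
    scores = {name: rmse(actual, preds) for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy numbers for illustration only (hypothetical, not project results).
actual = [100, 200, 300]
predictions = {
    # Baseline: always predict the mean of the actual values.
    "baseline_mean": [200, 200, 200],
    # Hypothetical candidate model.
    "model_a": [110, 190, 310],
}

best_name, all_scores = pick_best_model(actual, predictions)
print(best_name, all_scores)
```

Comparing every candidate against a mean baseline is a common sanity check: a model that cannot beat the baseline RMSE adds no predictive value.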
Environment setup:
- Install Conda, Python, MySQL, VS Code or Jupyter Notebook
- Clone this repo to your local machine
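
Reproducing the acquire step also requires an env.py file with credentials, as noted earlier. A minimal sketch of how such a file might be used with `pandas.read_sql` follows. The variable names `user`, `password`, and `host`, the helper names, the `pymysql` driver, and the database name are all assumptions for illustration; the actual functions in this repo may differ. Passing a URL string to `pandas.read_sql` also requires SQLAlchemy to be installed.

```python
from urllib.parse import quote_plus

import pandas as pd


def get_db_url(db_name, user, password, host):
    """Build a SQLAlchemy-style MySQL URL (the pymysql driver is an assumption)."""
    # quote_plus guards against special characters in the password.
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}/{db_name}"


def acquire(query, db_url):
    """Run a SQL query and return the result as a DataFrame."""
    return pd.read_sql(query, db_url)


# Hypothetical usage, assuming env.py defines user, password, and host:
# from env import user, password, host
# url = get_db_url("zillow", user, password, host)
# df = acquire(my_query, url)  # my_query is whatever SQL the project uses
```

Keeping env.py out of version control (for example via .gitignore) avoids publishing credentials along with the repo.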