
JoeyGideon/regression_project


Final Project for Regression

Project Goals

Construct an ML regression model that predicts the property tax assessed value ('taxvaluedollarcnt') of single family properties using attributes of the properties.

Find the key drivers of property value for single family properties. Some questions that come to mind are:

Why do some properties have a much higher value than others when they are located so close to each other?

Why are some properties valued so differently from others when they have nearly the same physical attributes and differ only in location?

Is having 1 bathroom worse for property value than having 2 bedrooms?

Deliver a report that the data science team can read through and replicate, showing what steps were taken, why, and what the outcome was.

Make recommendations on what works or doesn't work in predicting these homes' values.

Project Description

Find what drives property value and how best to predict property values using Zillow data.

Project planning (lay out your process through the data science pipeline)

1. Acquire the Zillow data and save it as a local CSV file.
2. View the data to see what I am working with.
3. Prep the data: remove nulls and handle other data quality problems.
4. Split the data into train, validate, and test sets.
5. Scale the data if needed.
6. Explore the train dataset using visuals and stats tests to see whether my hypotheses hold.
7. Evaluate the data using regression models and feature engineering.
8. Model the data and run the chosen model on the test dataset.
9. Create the presentation I am going to give.
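The split step in the plan can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from this project. The function name `split_rows`, the 60/20/20 proportions, and the seed are assumptions chosen for the example.

```python
import random


def split_rows(rows, validate_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validate, and test lists.

    The fractions and seed are illustrative assumptions, not project settings.
    """
    shuffled = list(rows)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_validate = int(len(shuffled) * validate_frac)

    test = shuffled[:n_test]
    validate = shuffled[n_test:n_test + n_validate]
    train = shuffled[n_test + n_validate:]
    return train, validate, test


# Stand-in for the rows of the local CSV file.
rows = list(range(100))
train, validate, test = split_rows(rows)
print(len(train), len(validate), len(test))
```

Fixing the seed matters for replication: anyone rerunning the project gets the same three subsets, so exploration and model comparisons stay comparable.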

Initial hypotheses and/or questions you have of the data, ideas

fips has a direct correlation with property value.

calculatedfinishedsquarefeet has a direct correlation with property value.
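One way to check these hypotheses is sketched below, using only the standard library. The numbers are made up for illustration and are not taken from the Zillow data. Because fips is a county code rather than a quantity, the sketch compares mean values per county instead of computing a correlation coefficient on the code itself; that choice is an assumption of this example, not necessarily what the project did.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_by_group(groups, values):
    """Average of values for each distinct group label."""
    totals = {}
    for group, value in zip(groups, values):
        running_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (running_sum + value, count + 1)
    return {group: s / c for group, (s, c) in totals.items()}


# Made-up example rows (hypothetical, NOT real Zillow records).
sqft = [1000, 1500, 2000, 2500, 3000]
value = [200000, 310000, 390000, 520000, 600000]
fips = [6037, 6037, 6059, 6059, 6111]

r = pearson_r(sqft, value)
county_means = mean_by_group(fips, value)
print(round(r, 3), county_means)
```

On real data, a formal test (for example a correlation test for square footage and a comparison of group means for fips) would also report a p-value, which this sketch leaves out.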

Data dictionary

| Feature | Definition | Data Type |
| --- | --- | --- |
| id | Row index number, range: 0 - 2985216 | int64 |
| parcelid | Unique numeric id assigned to each property: 10711725 - 169601949 | int64 |
| bathroomcnt | Number of bathrooms a property has: 0 - 32 | float64 |
| bedroomcnt | Number of bedrooms a property has: 0 - 25 | float64 |
| calculatedfinishedsquarefeet | Number of square feet of the property: 1 - 952576 | float64 |
| fips | (FIPS) Five digit number of which the first two are the FIPS code of the state to which the county belongs. Leading 0 is removed from the data: 6037=Los Angeles County, 6059=Orange County, 6111=Ventura County | float64 |
| lotsizesquarefeet | The land the property occupies in square feet: 100 - 371000512 | float64 |
| propertylandusetypeid | Unique numeric id that identifies what the land is used for: 261=Single Family Residential, 262=Rural Residence, 273=Bungalow | float64 |
| roomcnt | Total number of rooms in the principal residence | float64 |
| yearbuilt | Year the property was built | float64 |
| transactiondate | The most recent date the property was sold: yyyy-mm-dd | object |

| Target | Definition | Data Type |
| --- | --- | --- |
| taxamount | The total property tax assessed for that assessment year | float64 |
| taxvaluedollarcnt | The total tax assessed value of the parcel | float64 |
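Since fips is stored as a float with its leading zero removed, it can help to restore the five-digit code before treating it as a county label. The sketch below shows one way to do that. The helper names `fips_to_code` and `county_name` are hypothetical, and the lookup table covers only the three counties listed in the data dictionary.

```python
# County names as listed in the data dictionary, keyed by five-digit FIPS code.
COUNTY_NAMES = {
    "06037": "Los Angeles County",
    "06059": "Orange County",
    "06111": "Ventura County",
}


def fips_to_code(fips_value):
    """Turn a float such as 6037.0 back into the zero-padded code '06037'."""
    return f"{int(fips_value):05d}"


def county_name(fips_value):
    """Look up the county name, or 'unknown' for codes outside the table."""
    return COUNTY_NAMES.get(fips_to_code(fips_value), "unknown")


print(fips_to_code(6037.0), county_name(6059.0))
```

The first two digits of the padded code ("06") are the state part and the last three are the county part, which matches the definition in the table.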

Instructions or an explanation of how someone else can reproduce your project and findings (What would someone need to be able to recreate your project on their own?)

1. Use the correct imports and have basic knowledge of coding.
2. Acquire the data.
3. Look the data over.
4. Explore the data: make visuals and run stats tests as needed.
5. Decide which features you want to use in modeling.
6. Get a baseline.
7. Create models and run them on the train and validate datasets.
8. Choose the best model and run it on the test dataset.
9. Come up with a summary and recommendations.
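The baseline and model comparison steps can be illustrated with a small standard-library sketch. It assumes a mean-value baseline, RMSE as the metric, and a one-feature least-squares line as the model; these are assumptions for the example, and the numbers are made up rather than taken from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up train and validate rows (hypothetical, NOT real Zillow records).
train_sqft = [1000, 1500, 2000, 2500, 3000]
train_value = [200000, 310000, 390000, 520000, 600000]
validate_sqft = [1200, 2200]
validate_value = [250000, 450000]

# Baseline: always predict the mean value seen in train.
baseline = sum(train_value) / len(train_value)
baseline_rmse = rmse(validate_value, [baseline] * len(validate_value))

# Model: fit on train only, then score on validate.
slope, intercept = fit_line(train_sqft, train_value)
model_rmse = rmse(validate_value, [slope * x + intercept for x in validate_sqft])

print(round(baseline_rmse), round(model_rmse))
```

A model is only worth carrying to the test dataset if its validate error beats the baseline error. The test dataset is scored once, at the end, with the single model chosen.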

Key findings, recommendations, and takeaways from your project.

Summary and Recommendations

Using fips (county), yearbuilt, and calculatedfinishedsquarefeet produced a model that is slightly better than baseline, but not by much.

We may be able to use this model to predict property value roughly, but I wouldn't use it if its purpose is to generate revenue in any way.

I would recommend locating each house more precisely than fips allows, because even within a county there are very high cost areas and very low cost areas from block to block.

There are also many other attributes that contribute to property value, like flood/fire areas, or even what view the property may or may not have.

In the future, I think this could be tailored with an MLM from block to block to give a better prediction of property value.
