Skip to content

AvonleaFisher/Predicting-King-County-House-Prices-with-Multiple-Regression-Analysis

 
 

Repository files navigation

Predicting King County House Prices with Multiple Regression Analysis

Author: Avonlea Fisher

Blog Post: https://medium.com/analytics-vidhya/predicting-king-county-house-prices-with-multiple-linear-regression-84de5feeafb2

Problem:

The aim of this analysis is to build a multiple regression model that can predict house prices with the greatest accuracy possible. The results can inform home owners interested in selling their homes about the most important factors to consider for improving sale prices.

Data

The King County Housing Data Set contains information about the size, location, condition, and other features of houses in Washington's King County. The dataset and variable descriptions can be found on Kaggle.

Methods

After exploring and preprocessing the data, simple and multiple linear regression models were built in OLS statsmodels, with price as the dependent variable.

Results

Together, square footage, grade and bathrooms are the best predictors of a house's price in King County. These features were included in the final multiple regression model. The model satisfied all multiple regression assumptions, and p-values for each predictor variable were below .05. The r-squared value of the model was .529.

Features were selected based on the strength of their linear relationship with price, and multicollinear features were excluded.

Recommendations

Homeowners interested in selling their homes should focus on improving the design and quality of construction of their homes, which may in turn improve their home grade. If possible, they should also expand the square footage of living space on the lot, perhaps by building additional bathrooms. The square footage of neighbors' living space is also a strong positive predictor of price, but homeowners likely have less control over this factor. Homeowners may further increase the sale price of their homes by encouraging neighbors to also expand the square footage of their living space.

Limitations and Next Steps

The model does have some limitations: given that some of the variables needed to be log-transformed to satisfy regression assumptions, any new data used with the model would have to undergo similar preprocessing. Additionally, given regional differences in housing prices, the model's applicability to data from other counties may be limited. Given that outliers were removed, the model may also not accurately predict extreme values. Future analysis should explore the best predictors of the prices of homes outside of King County, as well as homes with extreme price values.

For further information

Please consult the King County Residential Glossary of Terms for further information about the data. For inquiries about this project, please contact me at fisheravonlea@gmail.com or via my LinkedIn profile.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%