Predicting House prices in King County, WA
The King County housing challenges data scientists to create a model for predicting home prices. We used the available features in addition to some engineered features in this pursuit, ultimately identifying location as the primary feature determing the price of a home.
Code : Modeling Notebook Presentation : Slide Deck
Data - Model - Results - Recommendations - Future - Project Info -
The dataset is obtained from Kaggle Housing Prediction. It includes just over 21,000 observations, each representing a home sold in King County, Washington, between May 2014 and 2015. The median price of the homes are $450,000, and 99% of the homes sold for less than $2 million.
Continuous
- Price (target)
- square feet of living space
- square feet of lot
- floors
- effective build: the number of years since the house was last renovated or built. Also, see heat map of house ages across King County; it is interesting to see the newer renovations/constructions closer to city center correlating with higher prices; gentrification?
Categorical
- Zipcode (one hot encoded; total of 70)
- Condition: 1 to 5 rating; an objective assessment of the cphysical condition of the home
- View: 0 to 4 rating; a subjective assessment of the view from the property
- Waterfront: boolean
- has basement: boolean
These features were fitted using statmodels OLS.
Explained variance: 80%
We recognized early on that location has a prominent role in the selling price of a home. Our model clarified this hypothesis through zipcode feature weights:
Zipcode | Median Price | Premium |
---|---|---|
Base | $193,094.98 | 1.00x |
98039 | $668,108.63 | 3.46x |
98004 | $587,005.76 | 3.04x |
98040 | $463,427.95 | 2.40x |
98033 | $409,361.36 | 2.12x |
This visual shows the price (size of point) by zipcode (color) for the dataset. One can visually confirm that the selling price of many houses is higher in certain zipcodes
A major feature of location is the view from the property. We discovered homes with a higher view rating are typically located on a waterfront or proximal to Seattle's central business district; these houses may provide visibility to downtown Seattle or another scene with high appeal. Additionally, the homes sell for more.
This scatter/heatmap displays the view-price relationship. The relative geography of King County can be inferred; large blue regions are bodies of water (Pugot sound), larger points are higher prices. Notice the high view ratings and prices along water and near city center.
To derive even more accurate results, we'd like to expand the project with additional data, specifically housing prices over time, quality of schools, crime metrics. Additionally, we know the kitchen is often the selling point in a house, so kitchen features would provide even further insight on a home's sellability.
Contributors : Alphonso Woodbury Joseph McHugh
Languages : Python Tools/IDE : Anaconda, Colab Libraries : pandas, matplotlib, statsmodels, sklearn
Duration : March 2020 Last Update : 06.08.2020