A house price prediction study on the Ames Housing data set, using exploratory data analysis, missing-value handling, and linear regression with LASSO and Ridge regularization.

ZJW-92/Ames_House_Price_Prediction

Case study: Feature Engineering - Ames house price prediction

1 Problem statement

In this case study, you will prepare the Ames Housing Dataset, provided as a CSV file, so that it is suitable for an ML algorithm. You will do this by first exploring the data and then applying feature transformations to it. You will then train ML models using linear regression, ridge regression and lasso regression to predict house prices.

2 Steps

  • 2.1 Load data set
  • 2.2 Exploratory Data Analysis (EDA)
    1. Histograms
    2. Heatmap
    3. Scatterplots (figure: scatter-view)
    4. Scatter matrix (figure: scatter_matrix-view)
    5. Correlation between other features and 'SalePrice'

The target 'SalePrice' variable is highly correlated with features such as OverallQual, GrLivArea, GarageCars, GarageArea and TotalBsmtSF among others.
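The ranking described above could be produced roughly as follows. This is a minimal sketch, not the study's own code: the helper name `top_correlations` and the tiny `toy` frame are made up for illustration, and only the column names come from the data set.

```python
import pandas as pd

def top_correlations(df: pd.DataFrame, target: str = "SalePrice", n: int = 5) -> pd.Series:
    """Return the n numeric features with the largest absolute Pearson correlation to target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(n)

# Hypothetical toy data, only to show the call; the real frame is loaded from the CSV.
toy = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9],
    "GrLivArea": [1000, 1200, 1500, 1800, 2400],
    "MoSold": [3, 11, 6, 2, 9],
    "SalePrice": [120000, 150000, 190000, 240000, 330000],
})
print(top_correlations(toy, n=2))
```

Sorting by absolute value keeps strongly negative correlations in view as well, which a plain descending sort would push to the bottom.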

  • 2.3 Process dataset for ML

Steps:

    1. Handle missing values
       • Fill nulls for 'LotFrontage' with the median value calculated after grouping by 'Neighborhood'
       • Fill nulls for 'GarageYrBlt', 'MasVnrArea' with 0
    2. Apply log-transform on target feature 'SalePrice'
    3. One-hot encoding
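The processing steps above can be sketched in pandas as shown below. The function name `preprocess` and the toy frame are hypothetical, and the use of `np.log` (rather than, say, `np.log1p`) and of `pd.get_dummies` for the encoding are assumptions, since the README does not say which calls were used.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # LotFrontage: fill nulls with the median of the house's own neighborhood.
    neighborhood_median = out.groupby("Neighborhood")["LotFrontage"].transform("median")
    out["LotFrontage"] = out["LotFrontage"].fillna(neighborhood_median)
    # GarageYrBlt, MasVnrArea: fill nulls with 0.
    cols = ["GarageYrBlt", "MasVnrArea"]
    out[cols] = out[cols].fillna(0)
    # Log-transform the target (assumption: natural log).
    out["SalePrice"] = np.log(out["SalePrice"])
    # One-hot encode the remaining categorical columns.
    return pd.get_dummies(out)

# Hypothetical toy data with the same column names as the data set.
toy = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "NAmes", "OldTown", "OldTown"],
    "LotFrontage": [60.0, np.nan, 80.0, 50.0, np.nan],
    "GarageYrBlt": [1990.0, np.nan, 2001.0, 1950.0, 1960.0],
    "MasVnrArea": [0.0, 120.0, np.nan, 0.0, 30.0],
    "SalePrice": [150000, 160000, 210000, 110000, 115000],
})
clean = preprocess(toy)
print(clean.head())
```

Using `transform("median")` returns a series aligned with the original rows, so each missing value is filled from its own group without a merge.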

3 Train Linear Regression

Split the dataset into a training set (X_train, y_train) and a test set (X_test, y_test).
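A minimal sketch of the split and fit with scikit-learn follows. The synthetic `X` and `y` stand in for the processed feature matrix and log-transformed target, and `test_size=0.2` and `random_state=42` are assumptions, since the README does not state the split that was used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed features and log(SalePrice).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.05, size=200)

# Hold out part of the data for testing (split ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print(model.coef_)
```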

4 Evaluate Linear Regression model

R^2 score on training set: 0.94609, MSE on training set: 0.00808

R^2 score on test set: 0.89136, MSE on test set: 0.01472
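Scores of this kind can be computed with `r2_score` and `mean_squared_error` from scikit-learn. The sketch below uses synthetic data and a hypothetical helper `evaluate`, so its numbers will not match the ones reported above. Because the target is log-transformed, the MSE is measured on the log scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def evaluate(model, X, y):
    """Return (R^2, MSE) of a fitted model on the given data."""
    pred = model.predict(X)
    return r2_score(y, pred), mean_squared_error(y, pred)

# Synthetic stand-in for the processed features and log(SalePrice).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.05, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
r2_train, mse_train = evaluate(model, X_train, y_train)
r2_test, mse_test = evaluate(model, X_test, y_test)
print(f"train: R^2={r2_train:.5f}, MSE={mse_train:.5f}")
print(f"test:  R^2={r2_test:.5f}, MSE={mse_test:.5f}")
```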

(figure: linear_regression-view)

5 Model refinement with Ridge regression and Lasso regression

Ridge regression (alpha=0.05): R^2 score on training set: 0.94598, R^2 score on test set: 0.89410

Lasso regression (alpha=0.0001): R^2 score on training set: 0.94169, R^2 score on test set: 0.90843
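The two regularized models can be fitted with the same interface as plain linear regression. The sketch below uses the alpha values quoted above; everything else (the synthetic data, the split, and `max_iter=10000` for Lasso) is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed features and log(SalePrice).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.05, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "ridge": Ridge(alpha=0.05),
    "lasso": Lasso(alpha=0.0001, max_iter=10000),  # max_iter is an assumption
}
scores = {}
for name, m in models.items():
    m.fit(X_train, y_train)
    # .score() returns R^2 for scikit-learn regressors.
    scores[name] = (m.score(X_train, y_train), m.score(X_test, y_test))
    print(f"{name}: train R^2={scores[name][0]:.5f}, test R^2={scores[name][1]:.5f}")
```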

6 Conclusion

6.1 In practice, ridge regression is usually the first choice between the two models.

6.2 However, if you have a large number of features and expect only a few of them to be important, Lasso might be a better choice.

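| R^2 score    | Linear Regression | Ridge Regression | Lasso Regression |
|--------------|-------------------|------------------|------------------|
| training set | 0.94609           | 0.94598          | 0.94169          |
| test set     | 0.89136           | 0.89410          | 0.90843          |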
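The point in 6.2 can be illustrated on synthetic data: with many features and only a few informative ones, Lasso tends to set the irrelevant coefficients exactly to zero, while Ridge only shrinks them. The data, the feature counts and `alpha=0.1` below are all made up for this sketch and are not taken from the study.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# 50 features, of which only the first 3 actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_coef = np.zeros(50)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=0.1).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

n_ridge = int(np.sum(ridge.coef_ != 0))
n_lasso = int(np.sum(lasso.coef_ != 0))
print(f"non-zero coefficients: ridge={n_ridge}, lasso={n_lasso}")
```

A model with fewer non-zero coefficients is also easier to interpret, which is part of why Lasso is attractive in that setting.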
