
Boston AMES Housing Price Prediction

The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It's an incredible alternative for data scientists looking for a modernized and expanded version of the often cited Boston Housing dataset.

This project was started to learn machine learning algorithms and the different data preprocessing techniques, such as Exploratory Data Analysis, Feature Engineering, Feature Selection, and Feature Scaling, and finally to build a machine learning model that predicts house prices in the Boston area.

DATA DESCRIPTION

The data was originally published by Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. The dataset is collected from Kaggle. Let's get into the data and know more about it (a short loading sketch follows the variable list below).

  • Origin
    • The origin of the Boston housing data is natural.
  • Usage
    • This dataset may be used for Assessment.
  • Number of Cases
    • The dataset contains a total of 506 cases.
  • Order
    • The order of the cases is mysterious.
  • Variables
    • There are 14 attributes in each case of the dataset. They are:
    1. CRIM - per capita crime rate by town
    2. ZN - proportion of residential land zoned for lots over 25,000 sq. ft.
    3. INDUS - proportion of non-retail business acres per town
    4. CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
    5. NOX - nitric oxides concentration (parts per 10 million)
    6. RM - average number of rooms per dwelling
    7. AGE - proportion of owner-occupied units built prior to 1940
    8. DIS - weighted distances to five Boston employment centres
    9. RAD - index of accessibility to radial highways
    10. TAX - full-value property-tax rate per $10,000
    11. PTRATIO - pupil-teacher ratio by town
    12. B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
    13. LSTAT - % lower status of the population
    14. MEDV - median value of owner-occupied homes in $1000s
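
Before the preprocessing steps below, here is a minimal sketch for loading and inspecting the data with pandas. The filename `housing.csv` is an assumption; use the name of the CSV downloaded from Kaggle.

```python
import pandas as pd

# Load the dataset; the filename is an assumption -- replace it with the
# CSV downloaded from Kaggle.
df = pd.read_csv("housing.csv")

# Quick inspection of shape, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe().T)
```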

DATA PREPROCESSING

Before modeling, we pre-process the dataset with the following steps:

  1. Finding the correlation between the predictor variables (see the sketch below) -

A. Correlation matrix between SalePrice and the other variables
[Figure: Heatmap]

B. SalePrice correlation matrix
[Figure: correlation_with_price]
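
A minimal sketch of this step with pandas and seaborn, continuing from the DataFrame `df` loaded above and assuming the target column is named `SalePrice` as in the headings above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# A. Correlation matrix between all numeric variables, drawn as a heatmap.
corr = df.corr(numeric_only=True)
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", square=True)
plt.title("Correlation heatmap")
plt.show()

# B. Correlation of every predictor with the target, sorted.
print(corr["SalePrice"].sort_values(ascending=False))
```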

  2. Finding missing values and imputing them using K-Means if necessary (see the sketch below) -

A. Computing the percentage of missing values
[Figure: Missing_values]

B. Plotting the proportion of missing values
[Figure: Missing_values_Prop]
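
A sketch of this step; scikit-learn does not ship a K-Means imputer, so the KNN-based `KNNImputer` is shown here instead as a closely related, readily available substitute (an assumption about the intended approach):

```python
import matplotlib.pyplot as plt
from sklearn.impute import KNNImputer

# A. Percentage of missing values per column.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct[missing_pct > 0])

# B. Plot the proportion of missing values.
missing_pct[missing_pct > 0].plot(kind="bar", figsize=(10, 4),
                                  title="Proportion of missing values (%)")
plt.show()

# Impute numeric columns: each missing entry is filled from the 5 nearest
# complete rows (KNN-based imputation, substituting for K-Means here).
num_cols = df.select_dtypes(include="number").columns
imputer = KNNImputer(n_neighbors=5)
df[num_cols] = imputer.fit_transform(df[num_cols])
```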

  3. Performing outlier detection to remove values that can decrease model accuracy and lead to inappropriate predictions (see the sketch below) -

A. Univariate analysis

B. Bivariate analysis
[Figure: Bivariate_analysis]
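
A sketch of both analyses; the 3-standard-deviation cutoff at the end is an illustrative assumption, not necessarily the exact rule behind the figures above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# A. Univariate analysis: box plot of the target to spot extreme values.
sns.boxplot(x=df["SalePrice"])
plt.title("SalePrice distribution")
plt.show()

# B. Bivariate analysis: GrLivArea vs. SalePrice; points far from the main
# cloud (e.g. very large area with a low price) are outlier candidates.
sns.scatterplot(x=df["GrLivArea"], y=df["SalePrice"])
plt.title("GrLivArea vs. SalePrice")
plt.show()

# Drop rows whose target lies more than 3 standard deviations from the mean.
z = (df["SalePrice"] - df["SalePrice"].mean()) / df["SalePrice"].std()
df = df[z.abs() <= 3]
```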

  4. Analysing the target variable SalePrice - we check the correlation of the target variable with the predictor variables to handle multicollinearity, and also check the skewness of 'GrLivArea' and 'TotalBsmtSF' (see the sketch below).
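
A sketch of the skewness and multicollinearity checks; the `log1p` transform is a common remedy for right-skewed variables and is an assumption about the transform applied here:

```python
import numpy as np

# Skewness of the target and of the two features called out above.
print(df[["SalePrice", "GrLivArea", "TotalBsmtSF"]].skew())

# Reduce right-skew with a log1p transform (assumed choice of transform).
for col in ["SalePrice", "GrLivArea", "TotalBsmtSF"]:
    df[col] = np.log1p(df[col])

# Multicollinearity check: predictor pairs with |correlation| above 0.8.
corr = df.drop(columns="SalePrice").corr(numeric_only=True).abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
pairs = upper.stack()
print(pairs[pairs > 0.8])
```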

DATA MODELING

After preprocessing the data, we implemented the following regression models (a fitting sketch follows the list) -

  1. Linear Regression
[Figure: Linear_regression]

  2. Lasso Regression
[Figure: Lasso_regression]

  3. Ridge Regression
[Figure: Ridge_regression]

  4. Random Forest Regressor
[Figure: Random_Forest_Regressor]

  Random Forest Regressor with different max-depth levels
[Figure: random_forest_diff_max_depth]

  5. Decision Tree Regressor
[Figure: random_decision_tree_regressor]
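
A minimal fitting sketch for these models with scikit-learn, continuing from the preprocessed `df` above and assuming all predictors are numeric at this point. The train/test split and the hyperparameters are illustrative assumptions, not the tuned values behind the figures.

```python
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Split into predictors and target (column name assumed from the sections above).
X = df.drop(columns="SalePrice")
y = df["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Illustrative hyperparameters (assumptions, not the tuned values).
models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=0.001),
    "Ridge Regression": Ridge(alpha=1.0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=200,
                                                     random_state=42),
    "Decision Tree Regressor": DecisionTreeRegressor(max_depth=5,
                                                     random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.4f}")
```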

MODEL EVALUATION

  1. R-squared (R2)
  2. Root Mean Square Error (RMSE)
  3. Best Score
  4. Cross-Validation Score
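
Continuing from the modeling sketch above, a minimal example of computing these metrics for one model (Ridge); "Best Score" presumably refers to the `best_score_` of a hyperparameter search such as `GridSearchCV`, which is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score

ridge = models["Ridge Regression"]
y_pred = ridge.predict(X_test)

r2 = r2_score(y_test, y_pred)                       # 1. R-squared
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # 2. RMSE

# 3. Best score from a small (assumed) hyperparameter grid search.
grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5, scoring="r2")
grid.fit(X_train, y_train)

# 4. Mean cross-validation R^2 across 5 folds of the training data.
cv_mean = cross_val_score(ridge, X_train, y_train, cv=5, scoring="r2").mean()

print(f"R2: {r2:.4f}  RMSE: {rmse:.4f}  "
      f"best score: {grid.best_score_:.4f}  CV mean: {cv_mean:.4f}")
```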

CONCLUSION

Based on the evaluation metrics calculated for each model, the observations are:

  1. R-squared is a statistical measure of how close the data are to the fitted regression line. The higher the R-squared, the better the model fits the data --> Ridge Regression (0.9285)
  2. Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). The lower the RMSE, the better the model fits the data --> Ridge Regression (0.1021)
  3. The higher the best score, the better the model fits the data --> Ridge Regression (0.8857)
  4. A higher cross-validation score means the model performs well on the validation folds, indicating that it may also perform well on unseen data (the test set) --> Ridge Regression (0.8927)
