This project follows the Business Analysis (BA) workflow to address house price prediction with linear regression. The business problem is to build a regression model that accurately predicts house prices from the provided features, so that real estate agents can use it to evaluate properties.

├── Image
│
├── Code_USA_House_Price_Prediction.ipynb <- code
├── LICENSE <- MIT license
├── README.md <- read me
├── housing_price_dataset.csv <- dataset
├── USA_House_Price_Prediction_Report <- presentation
The goal of this project is to develop a model that predicts housing prices from property attributes such as square footage, the number of bedrooms and bathrooms, the neighborhood, and the year built.
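For reference, the textbook ordinary least squares (OLS) formulation of linear regression is shown below. Here $x_1, \dots, x_p$ stand for the numerically encoded input features and $y$ for the target (Price). This is the standard form of the technique, not necessarily the exact configuration used in the notebook. The two metrics reported later in this README, MSE and $R^2$, are defined on the test set of size $n$ with mean target $\bar{y}$:

$$
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j, \qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
$$

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
$$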
The housing dataset was loaded in Google Colab. It comes from Kaggle: https://www.kaggle.com/datasets/muhammadbinimran/housing-price-prediction-data (a copy is included in this repository as housing_price_dataset.csv). Basic data analysis was performed to determine the shape of the data, list the column names, check for missing values, and generate descriptive statistics. A pair plot was used to show the pairwise relationships between variables, and the distribution of the target variable (Price) was plotted.
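The exploratory checks described above could look roughly like the sketch below. It assumes pandas for the tabular checks and seaborn/matplotlib for the plots; the function names `basic_eda` and `plot_overview` are hypothetical and not taken from the project notebook.

```python
import pandas as pd


def basic_eda(df: pd.DataFrame) -> dict:
    """Hypothetical helper: shape, column names, missing values per column,
    and descriptive statistics for the loaded housing data."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "missing": df.isnull().sum().to_dict(),  # count of NaN/None per column
        "describe": df.describe(),  # numeric columns only by default
    }


def plot_overview(df: pd.DataFrame, target: str = "Price") -> None:
    """Hypothetical helper: pair plot of the numeric columns plus a histogram
    of the target. seaborn and matplotlib are imported only when called."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.pairplot(df)
    plt.figure()
    sns.histplot(df[target], kde=True)
    plt.title(f"Distribution of {target}")
    plt.show()
```

Usage, assuming the CSV sits in the working directory: `df = pd.read_csv("housing_price_dataset.csv")`, then `basic_eda(df)["missing"]` and `plot_overview(df)`.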
- Data Dictionary
Name | Modeling Role | Measurement Level | Description |
---|---|---|---|
SquareFeet | input | int | Floor area of the house, in square feet |
Bedrooms | input | int | Number of bedrooms |
Bathrooms | input | int | Number of bathrooms |
Neighborhood | input | object (categorical) | Neighborhood in which the house is located |
YearBuilt | input | int | Year the house was built |
Price | target | float | Price of the house |
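Neighborhood is the only non-numeric input in the data dictionary, and a linear regression model needs numeric inputs. The README does not say how this column was handled, so the sketch below assumes one-hot encoding with `pandas.get_dummies`; `encode_neighborhood` is a hypothetical name, and the category labels in the test data are illustrative only.

```python
import pandas as pd


def encode_neighborhood(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode the categorical Neighborhood column.

    drop_first=True drops one category (the first in sorted order) so the
    dummy columns are not perfectly collinear with the regression intercept.
    """
    return pd.get_dummies(df, columns=["Neighborhood"], drop_first=True, dtype=int)
```

Dropping one category is a common design choice for OLS with an intercept; the dropped category becomes the baseline that the other coefficients are compared against.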
- Define the feature matrix (X) and the target variable (y, which is Price)
- Split the data into training and test sets, holding out 40% of the rows for testing:
  - train: 30,000 rows
  - test: 20,000 rows
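The two steps above can be sketched with scikit-learn's `train_test_split`. The helper name `split_features_target` and the `random_state=42` seed are assumptions for illustration; the README does not state which seed, if any, was used. With 50,000 rows and `test_size=0.4`, this yields the 30,000/20,000 split listed above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df: pd.DataFrame, target: str = "Price",
                          test_size: float = 0.4, random_state: int = 42):
    """Hypothetical helper: separate X (all columns except the target) from y,
    then hold out `test_size` of the rows as a test set.

    Returns X_train, X_test, y_train, y_test in that order.
    """
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```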
Several machine learning models were included:
- Linear Regression
  - Test MSE: 2,193,848,415.96; R² score: 0.57
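A minimal sketch of fitting and scoring a linear regression model on the held-out test set, assuming scikit-learn's `LinearRegression`, `mean_squared_error`, and `r2_score`. The helper `fit_and_evaluate` is hypothetical. It also reports RMSE, the square root of MSE, which is in the same units as Price and is easier to interpret than the MSE figure above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def fit_and_evaluate(X_train, X_test, y_train, y_test) -> dict:
    """Hypothetical helper: fit ordinary least squares on the training set,
    then report test-set MSE, RMSE and R^2."""
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "model": model,
        "mse": mse,
        "rmse": float(np.sqrt(mse)),  # same units as the target (Price)
        "r2": r2_score(y_test, y_pred),
    }
```

Applied to the reported numbers, an MSE of 2,193,848,415.96 corresponds to an RMSE of about 46,839 in Price units, and an R² of 0.57 means the model explains roughly 57% of the variance in test-set prices.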