Built a regression model to predict bike demand using data from Seoul, South Korea. Employed one-hot encoding to create dummy variables, performed feature engineering, benchmarked CatBoost against linear regression, Lasso and Ridge regression, and Gradient Boosting, and tuned the hyperparameters for optimum performance.

raunaksatone5/Bike-Sharing-Demand-Prediction

Project Title: Seoul Bike Sharing Demand Prediction

Problem Description

Rental bikes have been introduced in many cities to improve mobility and convenience. Making rental bikes available and accessible to the public at the right time matters because it reduces waiting time, so providing the city with a stable supply of rental bikes becomes a major concern. The crucial part is predicting the bike count required at each hour to keep that supply stable.

Data Description

The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.

Attribute Information:

  • Date - year-month-day
  • Rented Bike count - Count of bikes rented at each hour
  • Hour - Hour of the day
  • Temperature - Temperature in Celsius
  • Humidity - %
  • Windspeed - m/s
  • Visibility - 10m
  • Dew point temperature - Celsius
  • Solar radiation - MJ/m2
  • Rainfall - mm
  • Snowfall - cm
  • Seasons - Winter, Spring, Summer, Autumn
  • Holiday - Holiday/No holiday
  • Functional Day - NoFunc (non-functional hours), Fun (functional hours)

Business Use Case:

  • A bike rental company can use a machine learning regression model to predict the demand for bikes in certain areas at specific times. This information can be used to optimize the distribution of bikes, ensuring that there are enough bikes in high-demand areas and reducing the number of bikes in low-demand areas. This can help the company to save on transportation costs and increase efficiency in bike distribution.
  • This model can also be applied to similar business problems that require predicting the demand for a given item.

Data Pre-Processing

  • Obtained the descriptive information of the data.
  • Checked for null values.
  • Checked for outliers.
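
The pre-processing checks above can be sketched as follows. This is a minimal illustration, not the repository's code: the toy DataFrame, its column names, and the `iqr_outliers` helper are assumptions, and the 1.5 × IQR rule is only one common way to flag outliers (the source does not say which method was used).

```python
import pandas as pd

# Toy stand-in for the Seoul bike data; values and column names are assumed.
df = pd.DataFrame({
    "Rented Bike Count": [254, 204, 173, 107, 78, 100, 181, 3556],
    "Temperature": [-5.2, -5.5, -6.0, -6.2, -6.0, -6.4, -6.6, None],
})

# Descriptive information: count, mean, std, min, quartiles, max.
print(df.describe())

# Number of null values in each column.
null_counts = df.isnull().sum()
print(null_counts)


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Flag suspicious values in the target column.
outlier_mask = iqr_outliers(df["Rented Bike Count"])
print(df.loc[outlier_mask, "Rented Bike Count"])
```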

Exploratory Data Analysis and Feature Engineering

  • Used libraries such as matplotlib.pyplot and seaborn to visualize the data.
  • Found the distribution of the target variable to be positively skewed, hence applied a square root transformation to it.
  • Separated numerical and categorical features and checked their distributions using histplot.
  • Checked correlation with a heatmap and found high correlation between Temperature and Dew point temperature.
  • Applied one-hot encoding to the categorical features.
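
A rough sketch of the transformation, correlation check, and encoding steps listed above, run on synthetic data. The generated values, the column names, and the choice of `drop_first=True` (which drops one dummy per feature to avoid redundant columns in linear models) are assumptions for illustration, not details confirmed by the repository.

```python
import numpy as np
import pandas as pd

# Synthetic data with a right-skewed target and two correlated temperatures.
rng = np.random.default_rng(0)
n = 500
temp = rng.normal(12, 10, n)
df = pd.DataFrame({
    "Rented Bike Count": rng.gamma(shape=1.5, scale=400, size=n).round(),
    "Temperature": temp,
    "Dew point temperature": temp - rng.uniform(2, 8, n),
    "Seasons": rng.choice(["Winter", "Spring", "Summer", "Autumn"], n),
    "Holiday": rng.choice(["Holiday", "No Holiday"], n),
})

# Square root transformation reduces the positive skew of the target.
skew_before = df["Rented Bike Count"].skew()
df["Rented Bike Count"] = np.sqrt(df["Rented Bike Count"])
skew_after = df["Rented Bike Count"].skew()
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")

# Pearson correlation between the two temperature features.
corr = df[["Temperature", "Dew point temperature"]].corr().iloc[0, 1]
print(f"correlation: {corr:.3f}")

# One-hot encode the categorical features into dummy variables.
encoded = pd.get_dummies(df, columns=["Seasons", "Holiday"], drop_first=True)
print(encoded.columns.tolist())
```

Predictions made on the square-root scale need to be squared to get back to bike counts.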

Model training

  • Defined functions to train linear and non-linear models and get their performance metrics.
  • Performance metrics used were MSE, RMSE, R2 score, and adjusted R2 score.
  • Defined a function for feature importance.
  • Split the data into training and testing sets.
  • Scaled the training and testing sets of independent variables using StandardScaler.
  • Trained models such as linear regression, L1 (Lasso) and L2 (Ridge) regression, KNN, Decision Trees, Random Forest, Gradient Boost, and CatBoost.
  • Found CatBoost to be the best model, with an adjusted R2 score of 90%.
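
The training and evaluation loop described above might look roughly like the sketch below. It uses synthetic data and only three of the listed models; the `regression_metrics` helper, the hyperparameter values, and fitting the scaler on the training set only are assumptions rather than the repository's actual code. Adjusted R2 is computed as 1 - (1 - R2)(n - 1)/(n - p - 1) for n samples and p features.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def regression_metrics(y_true, y_pred, n_features):
    """Hypothetical helper returning MSE, RMSE, R2 and adjusted R2."""
    mse = mean_squared_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "R2": r2, "Adjusted R2": adj_r2}


# Synthetic regression problem standing in for the prepared bike data.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=400)

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training set, then apply it to both sets.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Illustrative subset of models; alpha values are arbitrary.
models = {
    "Linear": LinearRegression(),
    "Lasso (L1)": Lasso(alpha=0.01),
    "Ridge (L2)": Ridge(alpha=1.0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    y_pred = model.predict(X_test_s)
    results[name] = regression_metrics(y_test, y_pred, X_test_s.shape[1])
    print(name, results[name])
```

Other regressors with the same `fit`/`predict` interface, such as `CatBoostRegressor` from the `catboost` package (assuming it is installed), could be added to the `models` dictionary and compared the same way.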
