manoharpavuluri/salary-prediction--LNR-GBR

Salary Prediction using Linear Regression and Gradient Boosting Regressor

Problem

Predict salary based on multiple features.

Data

What we have

  • We have 2 files: a train file and a test file.
  • The train file has 100k observations with 7 features.
  • Of these, 4 are categorical and 2 are numerical.

A sample of the data:

[Image: sample of the data]

Data Preparation

We checked the data for the following:

  • Nulls
  • Data types, to see whether any numerical columns are marked as object
  • How many categorical and numerical columns are in the dataframe
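The checks listed above could be run roughly as in the following sketch. The toy dataframe and its column names (`jobType`, `degree`, `milesFromMetropolis`, `yearsExperience`, `salary`) are assumptions made for illustration and are not taken from the actual train file.

```python
import pandas as pd

# Hypothetical stand-in for the train file; column names are assumed.
df = pd.DataFrame({
    "jobType": ["CEO", "CFO", "JANITOR", "CEO"],
    "degree": ["MASTERS", "NONE", "NONE", "DOCTORAL"],
    "milesFromMetropolis": ["10", "25", "3", "40"],  # numbers stored as text
    "yearsExperience": [10, 3, 1, 20],
    "salary": [130, 101, 45, 180],
})

# 1. Nulls per column.
null_counts = df.isnull().sum()

# 2. Split columns by whether pandas treats them as numeric.
numerical_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
non_numeric_cols = [c for c in df.columns if c not in numerical_cols]

# 3. Non-numeric columns whose values all parse as numbers are
#    probably numerical columns that were read in as object/text.
def looks_numeric(series):
    return pd.to_numeric(series, errors="coerce").notna().all()

mislabelled = [c for c in non_numeric_cols if looks_numeric(df[c])]
categorical_cols = [c for c in non_numeric_cols if c not in mislabelled]

print(null_counts)
print("numerical:", numerical_cols)
print("categorical:", categorical_cols)
print("numerical but stored as text:", mislabelled)
```

Columns reported as stored as text can then be converted with `pd.to_numeric` before modelling.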

Feature Engineering

One-Hot Encoding the categorical values

Used one-hot encoding to convert the categorical values to numerical values, as the models only work on numerical columns.

[Image: one-hot encoded data]
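As a minimal sketch of one-hot encoding, `pandas.get_dummies` turns each category into its own 0/1 column. Whether the repository uses `get_dummies` or another encoder is not stated in this README; the column names below are assumed for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions.
df = pd.DataFrame({
    "jobType": ["CEO", "JANITOR", "CEO"],
    "yearsExperience": [10, 1, 20],
})

# Each distinct jobType value becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["jobType"], dtype=int)
print(encoded)
```

The numerical column is left unchanged, and `jobType` is replaced by `jobType_CEO` and `jobType_JANITOR`.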

Correlation of features to Salary

Evaluated the correlation to see which features need to be considered.

[Images: correlation of each feature with salary]

From the correlation, Company ID doesn't have an impact on Salary, so it will be ignored.
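The correlation check could look like the sketch below, which ranks features by their Pearson correlation with salary and drops those that are close to zero. The toy values, the column names and the 0.1 cut-off are all assumptions for illustration; the README does not say what threshold, if any, was used.

```python
import pandas as pd

# Hypothetical toy data; values and column names are assumptions.
df = pd.DataFrame({
    "yearsExperience": [1, 5, 10, 15, 20, 25],
    "companyId": [1, 2, 3, 3, 2, 1],
    "salary": [50, 70, 95, 120, 150, 170],
})

# Pearson correlation of every feature with the target, strongest first.
corr_to_salary = (
    df.corr()["salary"]
    .drop("salary")
    .sort_values(key=abs, ascending=False)
)
print(corr_to_salary)

# Assumed cut-off: features with |correlation| below 0.1 are ignored.
threshold = 0.1
weak_features = corr_to_salary[corr_to_salary.abs() < threshold].index.tolist()
df_model = df.drop(columns=weak_features)
print("dropped:", weak_features)
```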

Model

Evaluated 2 models: Linear Regression and Gradient Boosting Regressor.
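A comparison of the two models might be set up as in this sketch, which uses scikit-learn's `LinearRegression` and `GradientBoostingRegressor` on synthetic data. The data, the 80/20 split and the default hyperparameters are assumptions; the settings actually used in the repository are not given in this README.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded training data (assumed, not the real data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 25, size=(500, 2))
y = 50 + 3 * X[:, 0] + 0.1 * X[:, 1] ** 2 + rng.normal(0, 2, size=500)

# Hold out part of the data so both models are scored on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

# Fit each model and record its test-set MSE (lower is better).
mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(X_test))

for name, value in mse.items():
    print(f"{name}: MSE = {value:.2f}")
```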

Linear Regression

Predicted VS Real Plot

[Image: predicted vs. real salaries, Linear Regression]

MSE Evaluation

[Image: MSE evaluation, Linear Regression]
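For reference, the mean squared error is the average squared difference between real and predicted salaries, so a lower value means a better fit. The symbols below are introduced here for illustration: $y_i$ is the real salary, $\hat{y}_i$ the predicted salary, and $n$ the number of observations.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$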

Gradient Boosting Regressor

Predicted VS Real Plot

[Image: predicted vs. real salaries, Gradient Boosting Regressor]

MSE Evaluation

[Image: MSE evaluation, Gradient Boosting Regressor]

Conclusion

Although the Predicted VS Real plots look similar, the MSE numbers indicate that GBR is the better model.

Using GBR, evaluated the features to see which have more impact.

[Image: feature importance from the GBR model]
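One way to rank features with a fitted GBR model is its `feature_importances_` attribute, sketched below on synthetic data. The data and column names are assumptions, and this README does not confirm that the repository computes importance this way.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumed): salary depends on experience, not on companyId.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "yearsExperience": rng.uniform(0, 25, size=300),
    "companyId": rng.integers(0, 60, size=300),
})
y = 50 + 4 * X["yearsExperience"] + rng.normal(0, 2, size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances, normalised by scikit-learn to sum to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```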

Finally, the salaries predicted using the GBR model:

[Image: predicted salaries from the GBR model]
