varshini24/Allstate-Claims-Severity

Allstate-Claims-Severity

Allstate, a personal insurer in the United States, has a large number of insurance claims registered each day. The purpose of this paper is to analyze the registered claims currently available in Allstate’s database and build a statistical model that predicts the Loss incurred for a new accident that is not yet registered. This way, Allstate can improve its claims service without wasting the time or mental energy of customers who are already devastated.

Since the response variable “Loss” is continuous, this is a Regression problem, and the categorical predictors must be converted to numerical values before the data can be analyzed. I started by converting the categorical variables to numerical variables with the help of Label Encoder. Next, I tried to remove unwanted predictors with the help of Recursive Feature Elimination. Then, I applied different statistical methods to fit the data properly while maintaining the bias-variance tradeoff. Among all the methods I used (Ridge Regression, Lasso, Decision Trees, Bagging, Boosting, Stochastic Gradient Boosting and XGBoost), XGBoost gave the highest performance in predicting the Loss incurred by an insurance claim.
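As a rough illustration of the label-encoding step, the sketch below maps each distinct category to an integer code in sorted order, which is the same idea as scikit-learn's `LabelEncoder`. It is written in plain Python so it runs without dependencies. The helper `fit_label_encoder`, the column names and the toy rows are assumptions made up for the example, not taken from the project code.

```python
def fit_label_encoder(values):
    """Return a dict mapping each distinct category to an integer code.

    Codes are assigned in sorted order of the categories, starting at 0.
    """
    return {category: code for code, category in enumerate(sorted(set(values)))}


# Hypothetical toy rows; the real data has far more columns and rows.
rows = [
    {"cat1": "A", "cat2": "B", "cont1": 0.72},
    {"cat1": "B", "cat2": "A", "cont1": 0.33},
    {"cat1": "A", "cat2": "C", "cont1": 0.51},
]
categorical_columns = ["cat1", "cat2"]  # assumed column names

# Fit one encoder per categorical column.
encoders = {
    col: fit_label_encoder([row[col] for row in rows])
    for col in categorical_columns
}

# Replace categorical values with their codes; leave other columns untouched.
encoded_rows = [
    {col: (encoders[col][val] if col in encoders else val) for col, val in row.items()}
    for row in rows
]
```

Note that integer codes impose an arbitrary ordering on the categories. Tree-based methods such as XGBoost are generally tolerant of this, whereas linear methods such as Ridge or Lasso may treat the codes as magnitudes.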

The data provided is taken from Kaggle’s ongoing competition (Allstate Claims Severity). It is divided into two sets: train and test. The train data has 188318 observations, whereas the test data has 125546 observations. In both sets, the predictors are further divided into categorical and continuous variables. For security reasons, the variable names were custom encoded before the data was given to us, so each variable is identified only as categorical or continuous. In total, there are 116 categorical and 14 continuous variables in both the training and test sets. The train set has an additional “Loss” variable, the response, which records the amount of loss incurred for an accident after claiming insurance with Allstate and is to be predicted with the help of all the predictors given.
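One way to separate the two kinds of predictors is by column-name prefix. The sketch below assumes the columns are named `cat…` and `cont…`, with an `id` column and a `loss` response; the tiny in-memory CSV and its values are invented for illustration and stand in for the real training file.

```python
import csv
import io

# Hypothetical miniature of the training file; values are made up.
sample_csv = io.StringIO(
    "id,cat1,cat2,cont1,cont2,loss\n"
    "1,A,B,0.5,0.25,100.0\n"
    "2,B,A,0.75,0.5,250.0\n"
)

reader = csv.DictReader(sample_csv)
columns = reader.fieldnames  # header row, read on first access

# Split the predictors by their (assumed) name prefixes.
categorical = [c for c in columns if c.startswith("cat")]
continuous = [c for c in columns if c.startswith("cont")]
target = "loss"  # assumed name of the response column

data_rows = list(reader)
losses = [float(row[target]) for row in data_rows]
```

Applied to the full training file, lists built this way should contain 116 categorical and 14 continuous names if the naming assumption holds.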
