Karol-Gawlowski/PolicyRenewals

PolicyRenewals (work in progress)

Data preparation, EDA and modeling on an imbalanced dataset from: https://www.kaggle.com/arashnic/imbalanced-data-practice?select=aug_train.csv

In this exercise:

- Explore Deep Learning and Random Forests with the h2o ML package
- Apply explainable AI (XAI) methods with the DALEX package
- Work with the SMOTE and ROSE methods for upsampling, and with PCA
- Further improve my data wrangling skills with dplyr
- Perform automatic feature engineering
- Fit, evaluate and compare caret models
- Create more advanced ggplot2 graphics for EDA
- Play around with tidyquant (Excel-like functions, e.g. pivot tables)
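To make the upsampling item concrete, here is a minimal sketch of the core idea behind SMOTE: new minority-class rows are created by interpolating between an existing minority row and one of its nearest minority neighbours. It is not the code used in this repository (which works in R); the function name `smote_sketch`, its parameters, and the toy rows are hypothetical, and the sketch handles numeric features only.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; assumes at least two numeric rows.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (the row itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: four minority-class rows with two numeric features.
minority_rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_rows = smote_sketch(minority_rows, n_new=6)
```

Because every synthetic row lies on a segment between two real minority rows, the new rows stay inside the region the minority class already occupies. Upsampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class balance.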

Despite the repository name, we build a model to predict whether last year's policyholders (Health Insurance) will also be interested in Vehicle Insurance provided by the company. [Kaggle]
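Since the target is a binary label on an imbalanced dataset, plain accuracy can be misleading. The sketch below illustrates this with invented numbers (100 customers, 10 of them interested); the class shares are an assumption for illustration and are not taken from the Kaggle data, and the function name `imbalance_metrics` is hypothetical.

```python
def imbalance_metrics(actual, predicted):
    """Return accuracy, precision, recall and F1 for 0/1 labels (1 = interested)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example: 10 interested customers out of 100.
actual = [1] * 10 + [0] * 90

# A "model" that always predicts "not interested".
always_no = [0] * 100
baseline = imbalance_metrics(actual, always_no)

# A model that finds 6 of the 10 interested customers, with 9 false alarms.
some_model = [1] * 6 + [0] * 4 + [1] * 9 + [0] * 81
candidate = imbalance_metrics(actual, some_model)
```

In this toy case the always-no baseline has the higher accuracy but finds none of the interested customers, while the second model has lower accuracy and non-zero recall. This is why precision, recall, or F1 are usually the more informative measures when comparing models on data like this.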

Comments: Overall, this exercise would be more interesting if there were more information on the variables, e.g. the regions or the sales channels. In that case, some well-informed and justified feature engineering could be performed. A similar point could be made about the visual presentation.

About

Building binary predictors on a heavily imbalanced dataset: an exercise on policy cross-selling [Kaggle]
