agno-nymous/Santander-Customer-Satisfaction

Santander Customer Satisfaction

This is a self-guided case study based on a Kaggle competition.

Customer satisfaction is one of the most important key performance indicators for any company and is seen as a key element of its success. Unhappy customers don't stick around. What's more, they rarely voice their dissatisfaction before leaving.

Santander is a Spanish multinational bank and financial services company that operates in Europe, North and South America, and Asia. In this Kaggle competition, hosted by Santander, the task is to predict early on whether a customer is dissatisfied with the bank's services, based on the features the company provides. This would let the bank take proactive steps to improve customer satisfaction before the customer leaves.

Here I removed sparse features, features that were highly correlated with each other, and features that had low correlation with the dependent variable ("TARGET"). I then created 6 datasets, with more than 100 features created for them, and applied logistic regression, decision trees, random forest, XGBoost and LightGBM.
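The three filtering steps described above can be sketched roughly as follows. This is only an illustration, not the code from the notebooks: the function names `pearson` and `select_features`, the use of "fraction of zeros" as the sparsity measure, and all threshold values are assumptions. It uses only the standard library so it runs as-is; with pandas the same idea is usually written with `DataFrame.corr`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(columns, target, max_zero_frac=0.99,
                    max_pair_corr=0.95, min_target_corr=0.01):
    """Return the names of the columns that survive the three filters.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of 0/1 labels (the "TARGET" column).
    All thresholds are hypothetical defaults, not the author's values.
    """
    # 1. Drop sparse features (here: columns that are almost entirely zero).
    kept = [name for name, vals in columns.items()
            if sum(v == 0 for v in vals) / len(vals) <= max_zero_frac]

    # 2. Drop features with low absolute correlation with the target.
    kept = [name for name in kept
            if abs(pearson(columns[name], target)) >= min_target_corr]

    # 3. Among highly correlated features, keep only the first one seen.
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[other])) <= max_pair_corr
               for other in selected):
            selected.append(name)
    return selected
```

The order of the steps matters for step 3: whichever feature of a correlated pair appears first in `columns` is the one that is kept.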

My detailed approach can be viewed in this Medium article.

Results

| Sl No. | Model | Kaggle Public score (AUC) |
|---|---|---|
| 1. | Ensembling (average of two best models) | 0.82746 |
| 2. | log re (top 250) xgb | 0.82734 |
| 3. | normal re (top 250) xgb | 0.82713 |
| 4. | Stacking (logistic regression) | 0.82310 |
| 5. | normal (top 250) xgb | 0.81952 |
| 6. | log ohe xgb | 0.81851 |
| 7. | normal ohe (top 250) xgb | 0.81560 |
| 8. | log xgb | 0.81131 |
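The best entry in the table averages the predictions of two models and is scored by AUC. A minimal sketch of both operations is given below, assuming the ensemble is a plain unweighted mean of predicted probabilities (the notebooks may weight or rank the predictions differently). The names `roc_auc` and `average_predictions` are hypothetical. The AUC here is computed by comparing every positive with every negative, which is fine for a small example but slow on the full dataset; `sklearn.metrics.roc_auc_score` is the usual choice in practice.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_predictions(*model_scores):
    """Unweighted mean of the per-row predictions of several models."""
    return [sum(vals) / len(vals) for vals in zip(*model_scores)]
```

Usage would look like `roc_auc(y_valid, average_predictions(preds_a, preds_b))`, where `y_valid`, `preds_a` and `preds_b` are placeholder names for validation labels and the two models' predicted probabilities.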

The contents of the code files are described below:

| Code File | Description |
|---|---|
| EDA.ipynb | Exploratory Data Analysis |
| Feature_Engineering.ipynb | Feature Engineering |
| final.ipynb | Function 1: takes input X and returns prediction Y. Function 2: takes input (X, Y) and returns the evaluation metric (AUC) |
| Final_ensembling.ipynb | Final ensembling, stacking models and data interpretation |
| Modelling_LogOHE.ipynb | Modelling experiments on the Log Transformed, One Hot Encoded Dataset |
| Modelling_Log_RE.ipynb | Modelling experiments on the Log Transformed, Response Encoded Dataset |
| Modelling_Normal.ipynb | Modelling experiments on the Normal Dataset |
| Modelling_Normal_OHE.ipynb | Modelling experiments on the Normal One Hot Encoded Dataset |
| Modelling_Normal_RE.ipynb | Modelling experiments on the Normal Response Encoded Dataset |
| modelling_logTransformed.ipynb | Modelling experiments on the Log Transformed Dataset |
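Several of the notebooks above work on "Response Encoded" and "Log Transformed" datasets. As a rough illustration of what those terms usually mean, the sketch below replaces each category with the (optionally smoothed) mean of the target for that category, and applies `log(1 + x)` to numeric values. The function names, the smoothing parameter `alpha`, and the choice of `log1p` are assumptions; the exact transformations used in the notebooks may differ.

```python
import math
from collections import defaultdict


def fit_response_encoding(categories, targets, alpha=1.0):
    """Learn category -> smoothed P(target = 1 | category) from training data.

    alpha pulls rare categories toward the overall positive rate (the prior);
    alpha=0 gives the plain per-category mean of the target.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [rows, positives]
    for cat, t in zip(categories, targets):
        counts[cat][0] += 1
        counts[cat][1] += t
    prior = sum(targets) / len(targets)
    mapping = {cat: (pos + alpha * prior) / (n + alpha)
               for cat, (n, pos) in counts.items()}
    return mapping, prior


def apply_response_encoding(categories, mapping, prior):
    """Encode categories; values unseen during fitting fall back to the prior."""
    return [mapping.get(cat, prior) for cat in categories]


def log_transform(values):
    """log(1 + x) for each value; assumes every value is greater than -1."""
    return [math.log1p(v) for v in values]
```

The mapping should be fitted on the training split only and then applied to validation and test rows, otherwise the encoded feature leaks the target.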