Skip to content

Why do employees leave? This project first compares the predictive performance of three different models, then uses the best model to help reveal the top contributing factors.

jsgersing/IBM-Attrition

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 

Repository files navigation

IBM Attrition

For this notebook, I use the IMB Attrition dataset (available on Kaggle) to do the following:

  • Compare the predictive performance of Logistic Regression, Random Forest, and XGBoost models using Confusions Matrices and ROC Curves
  • Explore relative feature importance with respect to employee attrition for the XGBoost model using Gini Importances and Permutation_Importance
  • Further explore feature importance through visualization tools PDP Plots, PDP Interaction Plots, and Shapley Waterfall Plots
  • Use insights from the feature importances to perform dimensionality reduction
  • Rerun models on smaller feature matrix

Insights

  • The three models performed similarly on the original feature matrix, with the XGBoost slightly edging out the others
  • Some important features included 'OverTime', 'NumberofCompaniesWorked', and 'BusinessTravel'
  • After simplifying the model by reducing dimensionality, the simple Logistic Regression outperformed the others, which suffered from overfitting to the training data

About

Why do employees leave? This project first compares the predictive performance of three different models, then uses the best model to help reveal the top contributing factors.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published