RickFSA/Lending_Club_Default_Prediction

This repository contains a machine learning model (predictive analysis) that predicts borrower default on Lending Club, an American peer-to-peer lending platform that connects investors with borrowers. The dataset has about 396,000 observations from 2007 to 2016 and is imbalanced roughly 1 to 5 in favor of Fully Paid loans. In 2019, defaulting borrowers wiped out roughly $811 million USD of Lending Club investors' money.
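To make the effect of the imbalance concrete, here is a minimal sketch of what a trivial classifier would score on data with a 1-to-5 class ratio. The counts are hypothetical round numbers chosen only to mirror that ratio; they are not the actual dataset figures.

```python
# Hypothetical counts that mirror a 1-to-5 class ratio.
# These are illustrative only, not the real Lending Club numbers.
n_default = 1_000
n_fully_paid = 5_000

# A trivial "model" that labels every loan as Fully Paid.
true_positives = 0               # no default is ever flagged
false_negatives = n_default      # every default is missed
true_negatives = n_fully_paid    # every fully paid loan is labelled correctly

accuracy = (true_positives + true_negatives) / (n_default + n_fully_paid)
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.3f}")
print(f"recall   = {recall:.3f}")
```

Under these assumed counts the trivial model is right about five times out of six while catching no defaults at all, which is why accuracy alone is a poor yardstick for this kind of data.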
Because this is a classification problem on imbalanced data, the preprocessing included robust scaling, standardization, quantile transformation, SMOTENC, ADASYN and under-sampling. The processed data was fed to the following predictive models: Logistic Regression, Adaptive Boosting, Random Forest, Neural Network and Extreme Gradient Boosting. Overall, Adaptive Boosting seems to perform better than the other models on recall (that is, correctly classifying default borrowers and minimising false negatives). However, hyperparameter tuning and probability calibration gave better results in terms of F1 score and ROC curve. End notes: the million-dollar question is which side the company should favor in the trade-off; for this model, it is the trade-off between investors' returns and company profitability. Key finding: revolving line utilization rate, DTI, interest rate, grade, employment length and total number of credit lines are key indicators of default.
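The recall versus F1 trade-off described above can be illustrated with a small, dependency-free sketch. It is not the repository's code: the function names, the toy labels and scores, and the two thresholds are all assumptions made for illustration. It shows how moving the decision threshold on predicted default probabilities shifts precision, recall and F1.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when a score >= threshold is predicted as default (1)."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, scores, threshold):
    """Return (precision, recall, F1) for the default class at a given threshold."""
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): 1 = default, 0 = fully paid.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
scores = [0.90, 0.60, 0.70, 0.20, 0.10, 0.40,
          0.33, 0.55, 0.35, 0.05, 0.15, 0.25]

for threshold in (0.5, 0.3):
    p, r, f1 = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches every default (higher recall) but flags more fully paid loans as risky (lower precision). That is the same tension as the investor-return versus company-profitability question raised in the end notes.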

About

Classify default borrowers from initial loan application for Lending Club
