Credit Risk Analysis

Overview

Credit card data from Lending Club was used to compare six different machine learning models and determine which one best predicts credit risk. Because the dataset contains many more low-risk loans than high-risk loans, the models relied on a variety of techniques (oversampling, undersampling, combination sampling, and ensemble learning) to handle the imbalanced classes. A sketch of the shared data-preparation and evaluation workflow appears below, and sketches of the individual resampling and ensemble approaches appear at the end of the Results section.
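
The snippet below is a minimal sketch of how this kind of comparison can be set up with scikit-learn and imbalanced-learn. The file name, target column, and the evaluate helper are illustrative assumptions, not the exact code in this repository.

```python
# Minimal sketch of the shared data-preparation and evaluation workflow.
# Assumes scikit-learn and imbalanced-learn; the CSV name and "loan_status"
# target column are placeholders, not necessarily the repository's exact names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import classification_report_imbalanced

df = pd.read_csv("LoanStats_2019Q1.csv")             # hypothetical LendingClub export
X = pd.get_dummies(df.drop(columns="loan_status"))   # one-hot encode categorical features
y = df["loan_status"]                                # "low_risk" vs. "high_risk"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

def evaluate(model, name):
    """Fit a model and print the metrics quoted in the Results section."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name} balanced accuracy: {balanced_accuracy_score(y_test, y_pred):.2f}")
    print(classification_report_imbalanced(y_test, y_pred))
```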

Results

  • Naive Random Oversampling
    • Balanced Accuracy Score: 0.63

    • Precision: This model is highly precise when predicting low-risk users, but very imprecise when predicting high-risk users. This means that when it predicts a user is low-risk it will be correct 100% of the time, but when it predicts a user is high-risk it will only be accurate 1% of the time.

    • Recall: This model will correctly identify 69% of low-risk users and 57% of high-risk users.

      naive_random_oversampling

  • SMOTE Oversampling
    • Balanced Accuracy Score: 0.65

    • Precision: This model is also highly precise when predicting low-risk users but very imprecise when predicting high-risk users. When it predicts a user is low-risk it will be correct 100% of the time, but when it predicts a user is high-risk it will only be correct 1% of the time.

    • Recall: This model will correctly identify 65% of low-risk users and 66% of high-risk users.

      SMOTE_oversampling

  • Cluster Centroid Undersampling
    • Balanced Accuracy Score: 0.52

    • Precision: This is another model that is highly precise when predicting low-risk users but very imprecise when predicting high-risk users. When it predicts a user is low-risk it will be correct 100% of the time, but when it predicts a user is high-risk it will only be correct 1% of the time.

    • Recall: This model will correctly identify 43% of low-risk users and 60% of high-risk users.

      cluster_centroid_undersampling

  • Combination Sampling With SMOTEENN
    • Balanced Accuracy Score: 0.64

    • Precision: When this model predicts that a user is low-risk, it will be correct 100% of the time. When it predicts a user is high-risk, though, it will only be correct 1% of the time.

    • Recall: This model will correctly identify 58% of low-risk users and 70% of high-risk users.

      SMOTEENN_combination_sampling

  • Balanced Random Forest Classifier
    • Balanced Accuracy Score: 0.78

    • Precision: This model is also highly precise for low-risk users, and while it's still not very precise for high-risk users, it's better than its predecessors. When it predicts a user is low-risk it will be correct 100% of the time, and when it predicts a user is high-risk it will be correct 3% of the time.

    • Recall: This model has a greater average recall than any of the undersampling/oversampling methods, and will correctly identify 89% of low-risk users and 68% of high-risk users.

      balanced_random_forest

  • Easy Ensemble AdaBoost Classifier
    • Balanced Accuracy Score: 0.93

    • Precision: This model has the highest precision for high-risk users of all the models tested, though it is still low in absolute terms. When it predicts a user is low-risk it will be correct 100% of the time, and when it predicts a user is high-risk it will be correct 7% of the time.

    • Recall: This model has the highest recall for both classes of all the models tested. It will correctly identify 94% of low-risk users and 91% of high-risk users.

      easy_ensemble
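
The first four results above pair a resampler with a plain logistic regression classifier. The sketch below shows one way to wire those approaches together using imbalanced-learn pipelines and the hypothetical evaluate helper from the Overview; the hyperparameters are illustrative rather than the repository's exact settings.

```python
# Hedged sketch of the four resampling approaches. An imblearn pipeline applies
# the sampler only during fit, so the test set is never resampled.
from sklearn.linear_model import LogisticRegression
from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import ClusterCentroids
from imblearn.combine import SMOTEENN

samplers = {
    "Naive Random Oversampling": RandomOverSampler(random_state=1),
    "SMOTE Oversampling": SMOTE(random_state=1),
    "Cluster Centroid Undersampling": ClusterCentroids(random_state=1),
    "Combination Sampling (SMOTEENN)": SMOTEENN(random_state=1),
}

for name, sampler in samplers.items():
    model = make_pipeline(sampler, LogisticRegression(solver="lbfgs", max_iter=200, random_state=1))
    evaluate(model, name)   # helper sketched in the Overview
```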
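
The last two results use ensemble learners from imbalanced-learn that handle the class imbalance internally, so no separate resampling step is required. Again, this is a sketch with assumed hyperparameters rather than the repository's exact code.

```python
# Hedged sketch of the two ensemble approaches.
from imblearn.ensemble import BalancedRandomForestClassifier, EasyEnsembleClassifier

evaluate(BalancedRandomForestClassifier(n_estimators=100, random_state=1),
         "Balanced Random Forest Classifier")
evaluate(EasyEnsembleClassifier(n_estimators=100, random_state=1),
         "Easy Ensemble AdaBoost Classifier")
```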

Summary

The oversampling, undersampling, and combination sampling methods all had identical precision scores: 100% for low-risk users and only 1% for high-risk users. The ensemble models were slightly more precise for high-risk users, with the Easy Ensemble method producing the highest high-risk precision of all.
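
For intuition, consider a hypothetical test set in which low-risk loans outnumber high-risk loans by more than 100 to 1. Even a small false-positive rate on the large low-risk class swamps the true positives, which keeps high-risk precision near 1% regardless of how good the recall is. The numbers below are illustrative only, not taken from the actual models.

```python
# Illustrative arithmetic only: hypothetical confusion-matrix counts.
high_risk_total, low_risk_total = 100, 17_000   # assumed class sizes in the test set
true_pos = 70                                   # high-risk loans correctly flagged (70% recall)
false_pos = int(low_risk_total * 0.30)          # 30% of low-risk loans wrongly flagged

precision_high_risk = true_pos / (true_pos + false_pos)
print(f"high-risk precision ≈ {precision_high_risk:.2%}")   # ≈ 1.35%
```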

The average recall scores of the ensemble models were also higher than those of the oversampling, undersampling, and combination sampling models. The Easy Ensemble model outperformed all other models across both classes.

As the Easy Ensemble model had the highest recall for both classes and the highest precision for high-risk users, it is recommended for use moving forward.