Credit Risk Analysis

Purpose

The purpose of this analysis was to develop several supervised machine learning models to predict credit risk and determine which performed best. Because low-risk loans greatly outnumbered high-risk loans in the dataset, the models were built using different techniques to address this class imbalance.

Results

Random Oversampling

Balanced accuracy score: 0.657
Precision: 0.01
Recall: 0.71
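
As an illustration of the idea behind random oversampling, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks; the function name `random_oversample` and the toy data are hypothetical, and a library such as imbalanced-learn would normally be used instead.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap (empty for the majority class).
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

resampled_rows, resampled_labels = random_oversample(rows, labels)
print(Counter(resampled_labels))
```

Resampling of this kind would normally be applied to the training split only, so that the test set keeps the original class proportions.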

SMOTE Oversampling

Balanced accuracy score: 0.662
Precision: 0.01
Recall: 0.63

Cluster Centroids Undersampling

Balanced accuracy score: 0.544
Precision: 0.01
Recall: 0.69

SMOTEENN Combination Sampling

Balanced accuracy score: 0.645
Precision: 0.01
Recall: 0.72

Balanced Random Forest Classifier

Balanced accuracy score: 0.789
Precision: 0.03
Recall: 0.70

Easy Ensemble AdaBoost Classifier

Balanced accuracy score: 0.932
Precision: 0.09
Recall: 0.92

Based on the balanced accuracy scores, the Easy Ensemble AdaBoost classifier performed best at predicting both classes (0.932), and the Cluster Centroids undersampling technique performed worst (0.544).
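
To show why balanced accuracy is the more informative score on imbalanced data, the sketch below compares it with plain accuracy. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
def plain_accuracy(tp, fn, tn, fp):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the recall on the positive class and the recall on the
    negative class, so each class counts equally regardless of its size."""
    recall_positive = tp / (tp + fn)
    recall_negative = tn / (tn + fp)
    return (recall_positive + recall_negative) / 2


# Hypothetical model that labels every loan as low risk:
# 10 high-risk loans are all missed, 990 low-risk loans are all correct.
counts = dict(tp=0, fn=10, tn=990, fp=0)

print(plain_accuracy(**counts))     # looks excellent despite catching nothing
print(balanced_accuracy(**counts))  # no better than guessing
```

Plain accuracy rewards the model for the majority class alone, while balanced accuracy exposes that it never identifies a high-risk loan.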

All the models have low precision scores for high-risk applications (0.01 to 0.09), which indicates that most of the loans flagged as high risk were false positives.

The model that used the AdaBoost algorithm has the best recall score (0.92) for high-risk applications, while the model produced using the SMOTE oversampling technique has the worst (0.63). The high recall of the AdaBoost model indicates that there were few false negatives.
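
The relationship between these two scores and the error counts can be sketched as follows. The counts are hypothetical, not the dataset's; they are chosen to show how a model can have high recall and still have low precision when low-risk loans greatly outnumber high-risk ones.

```python
def precision_recall(tp, fp, fn):
    """Precision: share of loans flagged high risk that really are high risk.
    Recall: share of truly high-risk loans that were flagged."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 100 high-risk loans, of which 90 are caught (10 missed),
# plus 900 low-risk loans wrongly flagged as high risk.
precision, recall = precision_recall(tp=90, fp=900, fn=10)

print(f"precision={precision:.3f} recall={recall:.2f}")
```

Because false positives are drawn from the much larger low-risk class, even a modest false-positive rate can swamp the true positives and pull precision down.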

Summary

Of the six machine learning models, the model that used the AdaBoost algorithm has the best balanced accuracy (0.932), precision (0.09), and recall (0.92).

Although its precision score is low, its recall score is high. In this situation, recall is more important than precision because the lending company does not want to lose money by lending to high-risk applicants who are more likely to default. The trade-off is that, with low precision, the company may miss out on business by rejecting good loans.

Therefore, I recommend further revisions to the model to see whether its precision score can be increased without significantly decreasing accuracy.
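
One possible avenue for such a revision, which this analysis did not try, is to adjust the probability threshold at which a loan is flagged as high risk. The sketch below is a minimal illustration under that assumption; the scores, labels, and the helper `precision_recall_at` are all hypothetical.

```python
def precision_recall_at(scores, labels, threshold):
    """Flag a loan as high risk when its score is at or above the threshold,
    then compute precision and recall for the high-risk class."""
    predicted = [score >= threshold for score in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted probabilities of "high risk" and the true labels
# (True means the loan really is high risk).
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [True, True, False, True, False, False, False, False, False, False]

for threshold in (0.5, 0.65, 0.75):
    precision, recall = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold in this toy example increases precision at the cost of recall, so any chosen threshold would need to be checked against how costly a missed high-risk loan is to the lender.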
