Use given data to create machine learning models to classify the risk level of given loans.

jonkwiatkowski/Supervised-Machine-Learning

Supervised Machine Learning

Background

Lending services companies allow individual investors to partially fund personal loans, as well as buy and sell the notes backing those loans on a secondary market. The lending data provided here will be used to determine whether a borrower is creditworthy and should be issued a loan.

You will be using this data to create machine learning models to classify the risk level of given loans. Specifically, you will be comparing the Logistic Regression model and Random Forest Classifier.

Procedure

Retrieve the data

The data is located in the Resources folder.

  • lending_data.csv

Import the data using Pandas.
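A minimal sketch of the import step follows. The path comes from the Resources folder named above; the helper names and the label column `loan_status` are assumptions made for illustration, so adjust them to match the actual CSV header.

```python
from pathlib import Path

import pandas as pd


def load_lending_data(csv_path="Resources/lending_data.csv"):
    """Read the lending CSV into a DataFrame (path is relative to the repo root)."""
    return pd.read_csv(Path(csv_path))


def split_features_target(df, target="loan_status"):
    """Separate the feature columns from the label column.

    "loan_status" is an assumed column name, not confirmed by this README.
    """
    X = df.drop(columns=[target])
    y = df[target]
    return X, y
```

Typical use would be `df = load_lending_data()` followed by `X, y = split_features_target(df)`.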

Consider the models

The following prediction was made as to whether a Logistic Regression model or a Random Forest model would perform better when fit to the given data.

Prediction

This dataset is already preprocessed. It contains duplicate rows, which is plausible in the scope of this problem. The number of duplicate rows is a little suspicious, but since they could be legitimate, it is best not to drop any data. Since all of the data is numeric, Logistic Regression should perform well. I expect Random Forest to perform slightly better, since many features are involved and the Random Forest method generally has the edge when comparing more variables.
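The observations in the prediction (duplicate rows, all-numeric columns) can be checked with a few pandas calls. This is only a sketch; `summarize_dataset` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def summarize_dataset(df):
    """Return simple counts describing duplicates, missing values, and dtypes."""
    numeric_cols = df.select_dtypes(include="number").columns
    return {
        "rows": len(df),
        # duplicated() flags every repeat of a row after its first occurrence
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
        "all_numeric": len(numeric_cols) == df.shape[1],
    }
```

A large `duplicate_rows` count is not necessarily an error here, since two loans can share identical feature values.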

Fit a LogisticRegression model and RandomForestClassifier model

A LogisticRegression model was created and fit to the data, and the model's score was printed. The same was done for a RandomForestClassifier. The following questions were considered.

  • Which model performed better?
  • How does that compare to your prediction?
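The fit-and-score step could look roughly like the sketch below. It assumes a stratified train/test split, a fixed `random_state`, and `max_iter=1000` for the logistic regression; none of these settings are stated in this README, and `compare_models` is a hypothetical helper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=1):
    """Fit both classifiers on the same split and return their accuracy scores."""
    # Assumed split settings; the original notebook may have split differently.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state, stratify=y
    )
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForestClassifier": RandomForestClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy for scikit-learn classifiers
        scores[name] = {
            "train": model.score(X_train, y_train),
            "test": model.score(X_test, y_test),
        }
    return scores
```

Comparing the train and test scores side by side also shows whether either model is overfitting, which a single printed score would hide.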

Conclusion

Contrary to my prediction, Logistic Regression performed slightly better, but only by 0.02%. Random Forest would likely perform better after tuning some of its parameters, but since both models score about 99% and Logistic Regression is much faster, the extra tuning is unlikely to be worth it in this case.
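For reference, tuning the Random Forest parameters could be sketched with a small grid search. This was not run as part of the project; the grid values, `cv=3`, and the helper name `tune_random_forest` are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X_train, y_train, param_grid=None, cv=3, random_state=1):
    """Cross-validated grid search over a few Random Forest hyperparameters."""
    if param_grid is None:
        # Illustrative grid only; sensible ranges depend on the data.
        param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    return search.best_params_, search.best_score_
```

Each grid point is fit `cv` times, so the default grid above trains 18 forests, which illustrates the extra cost weighed against a gain of a fraction of a percent.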

References

  • Loan Approval Dataset (2022). Data generated by Trilogy Education Services, a 2U, Inc. brand, and intended for educational purposes only.
  • Assignment 19 Instructions
