Supervised Machine Learning

Background

Lending services companies allow individual investors to partially fund personal loans and to buy and sell the notes backing those loans on a secondary market. The lending data in this project is used to determine whether a borrower is creditworthy and should be issued a loan.

You will use this data to build machine learning models that classify the risk level of given loans. Specifically, you will compare a Logistic Regression model with a Random Forest Classifier.

Procedure

Retrieve the data

The data is located in the Resources folder.

lending_data.csv

Import the data using Pandas.
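
A minimal sketch of the import step, assuming the code runs from the repository root so that the relative path `Resources/lending_data.csv` resolves. The helper name `load_lending_data` is hypothetical and is not taken from the notebook.

```python
from pathlib import Path

import pandas as pd


def load_lending_data(path="Resources/lending_data.csv"):
    """Read the lending CSV into a DataFrame.

    The default path is an assumption: it expects the working directory
    to be the repository root.
    """
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(f"Could not find {csv_path}")
    return pd.read_csv(csv_path)
```

Calling `load_lending_data()` with no arguments reads the file named above; pass a different path if the notebook lives elsewhere.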

Consider the models

Before fitting, the following prediction was made about whether a Logistic Regression model or a Random Forest model would perform better on the given data.

Prediction

This dataset is already preprocessed. There are duplicate rows, but that makes sense in the scope of this problem. The number of duplicates is a little suspicious, but since they are theoretically possible, it is best not to drop any data. Since all of the data is numeric, Logistic Regression should perform well. I expect Random Forest to perform slightly better, since there are many features involved and the Random Forest method generally has the edge when comparing more variables.
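
The two observations in the prediction (duplicate rows and all-numeric columns) can be checked with a couple of pandas calls. The sketch below is illustrative only: `summarize_frame` is a hypothetical helper, and the small frame is made-up data, not the lending dataset.

```python
import pandas as pd


def summarize_frame(df):
    """Return the duplicate-row count and the names of non-numeric columns."""
    # duplicated() flags every repeat of an earlier row, so the sum is the
    # number of rows that drop_duplicates() would remove.
    n_duplicates = int(df.duplicated().sum())
    non_numeric = df.select_dtypes(exclude="number").columns.tolist()
    return n_duplicates, non_numeric


# Tiny made-up frame, only to show the checks.
example = pd.DataFrame({"x": [1, 1, 2], "y": [0.5, 0.5, 0.7]})
n_duplicates, non_numeric = summarize_frame(example)
print(n_duplicates, non_numeric)
```

An empty list of non-numeric columns means no encoding step is needed before fitting either model.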

Fit a LogisticRegression model and RandomForestClassifier model

A LogisticRegression model was created and fit to the data, and the model's score was printed. The same was done for a RandomForestClassifier. The following questions were considered.

Which model performed better?
How does that compare to your prediction?
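
The fit-and-score step might look like the sketch below. Several details are assumptions rather than the notebook's actual code: the train/test split, the `random_state` values, `max_iter=1000`, and the helper name `compare_models`. Synthetic data from `make_classification` stands in for the lending data so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=1):
    """Fit both classifiers on one split and return their test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state, stratify=y
    )
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForestClassifier": RandomForestClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() reports mean accuracy on the held-out rows.
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in data so the sketch runs without the CSV.
X_demo, y_demo = make_classification(n_samples=500, n_features=7, random_state=1)
demo_scores = compare_models(X_demo, y_demo)
for name, score in demo_scores.items():
    print(f"{name}: {score:.4f}")
```

Scoring both models on the same held-out split keeps the comparison fair; the numbers printed here come from synthetic data and say nothing about the lending results.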

Conclusion

Contrary to my prediction, logistic regression performed slightly better, but only by 0.02%. The random forest would likely perform better after some parameter tuning, but since both models score about 99% and logistic regression is much faster, tuning is unlikely to be worth it in this case.
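
For reference, the parameter tuning mentioned above could be sketched with scikit-learn's `GridSearchCV`. This was not part of the original analysis: the grid values are hypothetical, and synthetic data stands in for the lending data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grid; these values are illustrative, not tuned for the lending data.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

# Synthetic stand-in data so the sketch runs without the CSV.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 4))
```

Each added grid value multiplies the number of forests that must be trained, which is the cost being weighed against a gain of a fraction of a percent.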

References

Loan Approval Dataset (2022). Data generated by Trilogy Education Services, a 2U, Inc. brand, and is intended for educational purposes only.
Assignment 19 Instructions
