Analyzing borrowers’ risk of defaulting

Chukwuemeka Okoli
Practicum by Yandex Project 1
April 2, 2021

Project description
Your project is to prepare a report for a bank’s loan division. You’ll need to find out if a customer’s marital status and number of children have an impact on whether they will default on a loan. The bank already has some data on customers’ creditworthiness.

Your report will be considered when building a credit score for a potential customer. A credit score is used to evaluate the ability of a potential borrower to repay their loan.

Guiding Question
Why do borrowers default instead of repaying their loans on time?

Objectives

The objective of this project is to:

Prepare a report for a bank's loan division by analyzing a borrower's risk of defaulting.
Apply Data Preprocessing to a real-life analytical case study.

Data Source

The customers' creditworthiness data comes from a real-life analytical case study provided by Practicum by Yandex. As data scientists, we are to prepare a report for a client by analyzing its customers and their risk of defaulting on a loan. Various data preprocessing steps were applied before analyzing the borrowers' risk of defaulting on a loan. The insight generated from this report is to be used when building a credit score for a potential customer.

Description of the data

children: the number of children in the family
days_employed: how long the customer has been working
dob_years: the customer’s age
education: the customer’s education level
education_id: identifier for the customer’s education
family_status: the customer’s marital status
family_status_id: identifier for the customer’s marital status
gender: the customer’s gender
income_type: the customer’s income type
debt: whether the customer has ever defaulted on a loan
total_income: monthly income
purpose: reason for taking out a loan
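
A first look at a table with these columns might go as follows. This is only a minimal sketch: the rows are invented values for illustration, not the bank's data, and in the notebook the frame would be read from the project's data file instead of being typed in.

```python
import pandas as pd

# Hypothetical rows that use the documented column names; all values are invented.
sample = pd.DataFrame({
    "children": [1, 0, 3, 0],
    "days_employed": [-8437.7, -4024.8, None, 340266.1],
    "dob_years": [42, 36, 33, 53],
    "education": ["bachelor's degree", "Secondary Education",
                  "secondary education", "secondary education"],
    "family_status": ["married", "married", "civil partnership", "widow / widower"],
    "gender": ["F", "F", "M", "F"],
    "income_type": ["employee", "employee", "employee", "retiree"],
    "debt": [0, 0, 1, 0],
    "total_income": [40620.1, 17932.8, None, 25378.6],
    "purpose": ["purchase of the house", "car purchase",
                "to have a wedding", "education"],
})

# Column types and non-null counts.
sample.info()

# Number of missing values per column, showing only columns that have any.
missing = sample.isna().sum()
print(missing[missing > 0])

# debt is a 0/1 flag, so its mean is the share of customers who have defaulted.
default_rate = sample["debt"].mean()
print(f"default rate: {default_rate:.2%}")
```

Inspecting missing values and column types first tells you which of the preprocessing steps listed in the notebook structure a column actually needs.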

Technology Used

Python
Jupyter Notebook
pandas
NumPy
Matplotlib
Seaborn
NLTK
WordNetLemmatizer
SnowballStemmer

Structure of Notebook

Open the data file and have a look at the general information
Data preprocessing

Processing missing values
Data type replacement
Processing duplicates
Categorizing Data

Answer the business question

Is there a connection between having kids and repaying a loan on time?
Is there a connection between marital status and repaying a loan on time?
Is there a connection between income level and repaying a loan on time?
How do different loan purposes affect on-time loan repayment?

Conclusion

Executive Summary

Introduction

In every business, having an idea of your customers' creditworthiness is important in assessing their value to the business. It can later form a basis for measuring essential business metrics such as sales revenue, customer acquisition costs, estimated customer lifetime value, and customer churn. In this project, the bank’s loan division is trying to find out if a customer’s marital status and number of children have an impact on whether they will default on a loan. The goal is to apply data preprocessing and analytics in order to determine customers’ creditworthiness. The insight obtained from this project will enable the bank to determine the estimated customer lifetime value, and will be useful when building a credit score for a potential customer.

Methods

To accomplish this, I first inspected the data using the pandas library to obtain general information about it. I processed the missing values, changed data types, and processed duplicates. Next, I categorized the data and prepared it for further analysis. To lemmatize the purpose column, I used the WordNetLemmatizer and SnowballStemmer to extract the frequency of words in that column. I then encoded the categorical variables. In analyzing the data, I prepared several pivot tables and plotted visualizations using the Matplotlib and Seaborn libraries. The pivot table analysis was important in answering the business questions.
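
The preprocessing steps described above can be sketched roughly as follows. Several things here are assumptions and not taken from the notebook: the rows are invented, filling missing total_income with the median of the same income_type is just one reasonable strategy, and the hypothetical categorize_purpose helper uses plain prefix matching in place of the WordNetLemmatizer and SnowballStemmer so that the example runs without NLTK or its data downloads.

```python
import pandas as pd

# Hypothetical rows; all values are invented for illustration.
df = pd.DataFrame({
    "education": ["Secondary Education", "secondary education", "BACHELOR'S DEGREE",
                  "secondary education", "secondary education"],
    "income_type": ["employee", "employee", "business", "employee", "employee"],
    "total_income": [20000.5, None, 50000.0, 30000.0, None],
    "purpose": ["buying my own car", "to get an education", "purchase of the house",
                "wedding ceremony", "to get an education"],
})

# 1. Missing values: fill total_income with the median of the same income_type
#    (an assumed strategy, not necessarily the one used in the notebook).
group_median = df.groupby("income_type")["total_income"].transform("median")
df["total_income"] = df["total_income"].fillna(group_median)

# 2. Data type replacement: float income to integer (truncates the fraction).
df["total_income"] = df["total_income"].astype(int)

# 3. Duplicates: normalize letter case first so that rows differing only
#    in capitalization are recognized as duplicates, then drop them.
df["education"] = df["education"].str.lower()
df = df.drop_duplicates().reset_index(drop=True)

# 4. Categorizing: map free-text purposes to a few categories.
#    Crude prefix matching stands in for lemmatization/stemming here.
PURPOSE_PREFIXES = {
    "car": "car",
    "educat": "education",
    "universit": "education",
    "hous": "real estate",
    "propert": "real estate",
    "wedding": "wedding",
}

def categorize_purpose(text: str) -> str:
    """Return the category of the first word that starts with a known prefix."""
    for word in text.lower().split():
        for prefix, category in PURPOSE_PREFIXES.items():
            if word.startswith(prefix):
                return category
    return "other"

df["purpose_category"] = df["purpose"].apply(categorize_purpose)
print(df)
```

Normalizing case before dropping duplicates matters: without it, "Secondary Education" and "secondary education" would count as different values and duplicate rows could survive.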

Key Findings

I created a visualization of my findings using the Seaborn library.

The following are the key findings from this analysis:

People with more than 5 and up to 20 kids are ~37% more likely to be in debt than people with no kids, so there is a relationship between having kids and repaying a loan on time: people with kids are more likely to default on loan repayment.
Unmarried people with up to 4 kids and divorced people with up to 20 kids are ~75% more likely to be in debt than people of any other family status, and about 80% more likely to be in debt than people with 3 or fewer children.
Unmarried people are more than 2% more likely to be in debt than married people. Widows and widowers are less likely to be in debt than any other group. This means unmarried people are more likely to default on loan repayment.
There is no correlation between income level and defaulting on loan payment.
People requesting a loan for a car purchase or for education are the most likely to default on loan repayment. People requesting a loan for a house purchase are more likely to pay on time than those in any other category.
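
Comparisons like the ones in these findings can be read off a pivot table of default rates. The sketch below uses invented rows, and it assumes that "more likely" means a relative difference between group default rates; the notebook may compute its figures differently.

```python
import pandas as pd

# Hypothetical rows; all values are invented for illustration.
df = pd.DataFrame({
    "children": [0, 0, 0, 0, 1, 1, 1, 1],
    "family_status": ["married", "married", "unmarried", "unmarried",
                      "married", "married", "unmarried", "unmarried"],
    "debt": [0, 0, 0, 1, 0, 1, 1, 1],
})

# debt is a 0/1 flag, so the mean per group is that group's default rate.
rate_by_children = df.pivot_table(index="children", values="debt", aggfunc="mean")["debt"]
print(rate_by_children)

# Relative difference: how much more likely customers with one child are
# to be in debt than customers with no children (assumed interpretation).
relative = rate_by_children.loc[1] / rate_by_children.loc[0] - 1
print(f"relative difference: {relative:.0%}")

# Two-way pivot table: default rate by family status and number of children.
two_way = df.pivot_table(index="family_status", columns="children",
                         values="debt", aggfunc="mean")
print(two_way)
```

With small groups these rates are noisy, so it helps to check the group sizes (for example with aggfunc="count") before reading much into a difference.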

Deployment and Application

I plan on future deployment using Amazon Web Services. The goal is to extend the application of the project to multiple customers via web services.

Future Development

For future development, I will work on better visualizations and statistical analysis to improve how the client determines customer acquisition costs. I would also like to predict customer churn for the client using machine learning. Any future machine learning model would be put into production and deployed via a web app.

Accomplishments

Applied strategies for dealing with missing values.
Converted data from one type to another.
Identified duplicate data and processed it in several different ways.
Categorized data.
Exported final data into pivot tables.
Queried and used pivot table for data manipulation and interpretation.
Created visualizations using insights from pivot table.

This project was able to answer the business questions. Using insights from this project, the client can classify customers according to their creditworthiness.

chuksoo/credit_rating_analytics
