Practicum by Yandex Project 1: This project analyzes a borrower's risk of defaulting on on-time loan repayment by applying various data preprocessing and analytics steps in Python.

chuksoo/credit_rating_analytics

Analyzing borrowers’ risk of defaulting

Chukwuemeka Okoli
Practicum by Yandex Project 1
April 2, 2021

Project description
Your project is to prepare a report for a bank’s loan division. You’ll need to find out if a customer’s marital status and number of children have an impact on whether they will default on a loan. The bank already has some data on customers’ credit worthiness.

Your report will be considered when building a credit score for a potential customer. A credit score is used to evaluate the ability of a potential borrower to repay their loan.

Guiding Question
Why do borrowers default on making on-time loan repayments?

Objectives

The objective of this project is to:
  • Prepare a report for a bank's loan division by analyzing a borrower's risk of defaulting.
  • Apply Data Preprocessing to a real-life analytical case study.

Data Source

The customers' creditworthiness data comes from a real-life analytical case study provided by Practicum by Yandex. As data scientists, we are to prepare a report for a client by analyzing the client's customers and their risk of defaulting on a loan. Various data preprocessing steps were applied before analyzing the borrowers' risk of defaulting. The insight generated from this report is to be used when building a credit score for a potential customer.

Description of the data

  • children: the number of children in the family
  • days_employed: how long the customer has been working
  • dob_years: the customer’s age
  • education: the customer’s education level
  • education_id: identifier for the customer’s education
  • family_status: the customer’s marital status
  • family_status_id: identifier for the customer’s marital status
  • gender: the customer’s gender
  • income_type: the customer’s income type
  • debt: whether the customer has ever defaulted on a loan
  • total_income: monthly income
  • purpose: reason for taking out a loan
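A first look at data with this schema could be sketched as follows. The rows below are hypothetical values made up for illustration (the real dataset is not reproduced here); only the column names come from the description above.

```python
import pandas as pd

# Hypothetical sample rows that follow the column list above.
# The values are invented for illustration, not taken from the real data.
sample = pd.DataFrame(
    {
        "children": [1, 0, 3],
        "days_employed": [-8437.67, None, 340266.07],
        "dob_years": [42, 36, 53],
        "education": ["bachelor's degree", "Secondary Education", "secondary education"],
        "education_id": [0, 1, 1],
        "family_status": ["married", "civil partnership", "widow / widower"],
        "family_status_id": [0, 1, 2],
        "gender": ["F", "M", "F"],
        "income_type": ["employee", "business", "retiree"],
        "debt": [0, 0, 1],
        "total_income": [40620.1, None, 25378.6],
        "purpose": ["purchase of the house", "car purchase", "to have a wedding"],
    }
)

# General information: column dtypes and non-null counts.
sample.info()

# Count missing values per column and show only the affected columns.
missing_per_column = sample.isna().sum()
print(missing_per_column[missing_per_column > 0])
```

On the real file, the same two calls would show which columns need missing-value handling and which dtypes need to be replaced.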

Technology Used

  • Python
  • Jupyter Notebook
  • Pandas
  • Numpy
  • Matplotlib
  • Seaborn
  • NLTK
  • WordNetLemmatizer
  • SnowballStemmer

Structure of Notebook

  1. Open the data file and have a look at the general information
  2. Data preprocessing
    • Processing missing values
    • Data type replacement
    • Processing duplicates
    • Categorizing Data
  3. Answer the business question
    • Is there a connection between having kids and repaying a loan on time?
    • Is there a connection between marital status and repaying a loan on time?
    • Is there a connection between income level and repaying a loan on time?
    • How do different loan purposes affect on-time loan repayment?
  4. Conclusion
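The preprocessing steps in section 2 could look roughly like the sketch below. The specific choices here (filling missing income with the median of the same income_type, rounding income to integers, lower-casing education before removing duplicates) are assumptions for illustration and are not confirmed to be the notebook's exact strategy. The helper name `preprocess` and the sample rows are hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing: missing values, data types, duplicates."""
    out = df.copy()

    # Normalize text case so "Secondary Education" and "secondary education"
    # are treated as the same category.
    out["education"] = out["education"].str.lower()

    # Processing missing values (assumed strategy): fill missing income
    # with the median income of customers with the same income_type.
    group_median = out.groupby("income_type")["total_income"].transform("median")
    out["total_income"] = out["total_income"].fillna(group_median)

    # Data type replacement: store income as whole numbers.
    # This fails if a whole income_type group has no known income.
    out["total_income"] = out["total_income"].round().astype("int64")

    # Processing duplicates: drop rows that are identical after normalization.
    return out.drop_duplicates().reset_index(drop=True)


# Hypothetical rows for demonstration only.
raw = pd.DataFrame(
    {
        "education": [
            "Secondary Education",
            "secondary education",
            "bachelor's degree",
            "secondary education",
        ],
        "income_type": ["employee", "employee", "business", "employee"],
        "total_income": [20000.4, None, 35000.0, 20000.4],
        "debt": [0, 0, 1, 0],
    }
)

clean = preprocess(raw)
print(clean)
```

Normalizing case before `drop_duplicates` matters: rows that differ only in capitalization would otherwise survive as separate records.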

Executive Summary

Introduction

In every business, knowing your customers' creditworthiness is important for assessing customers' value to the business. It later forms a basis for measuring essential business metrics such as sales revenue, customer acquisition costs, estimated customer lifetime value, and customer churn. In this project, the bank’s loan division is trying to find out if a customer’s marital status and number of children have an impact on whether they will default on a loan. The goal is to apply data preprocessing and analytics in order to determine customers’ creditworthiness. The insight obtained from this project will enable the bank to determine the estimated customer lifetime value, and will be useful when building a credit score for a potential customer.

Methods

To accomplish this, I first inspected the data using the pandas library to obtain general information about it. I processed the missing values, changed data types, and processed duplicates. Next, I categorized the data and prepared it for further analysis. To normalize the words in the purpose column, I used the WordNetLemmatizer and SnowballStemmer to extract word frequencies. I then encoded the categorical variables. In analyzing the data, I prepared various pivot tables and plotted various visualizations using the Matplotlib and Seaborn libraries. The pivot table analysis was important in answering some of the business questions.
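As a rough illustration of the purpose-column step, the sketch below stems each word with NLTK's SnowballStemmer and maps it to a loan category. The keyword map, the category names, the helper `categorize_purpose`, and the sample purposes are all hypothetical; the notebook's actual categories may differ. Only the stemmer is used here because it needs no corpus download.

```python
from collections import Counter

from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

# Hypothetical keyword-to-category map (an assumption for illustration).
KEYWORDS = {
    "car": "car",
    "education": "education",
    "university": "education",
    "wedding": "wedding",
    "house": "real estate",
    "housing": "real estate",
    "property": "real estate",
    "estate": "real estate",
}

# Stem the keywords with the same stemmer used on the text, so matching
# does not depend on knowing the exact stem of each word in advance.
STEM_TO_CATEGORY = {stemmer.stem(word): cat for word, cat in KEYWORDS.items()}


def categorize_purpose(purpose: str) -> str:
    """Return the category of the first recognized word, else 'other'."""
    for token in purpose.lower().split():
        category = STEM_TO_CATEGORY.get(stemmer.stem(token))
        if category is not None:
            return category
    return "other"


# Hypothetical free-text purposes.
purposes = [
    "buying a second-hand car",
    "cars",
    "to get a supplementary education",
    "housing transactions",
    "having a wedding",
    "purchase of commercial real estate",
    "holiday trip",
]

categories = [categorize_purpose(p) for p in purposes]
counts = Counter(categories)
print(counts)
```

Collapsing free-text purposes into a handful of categories is what makes a pivot table over loan purpose readable.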

Key Findings

I created a visualization of my findings using the Seaborn library.

The following are the key findings from this analysis:

  • People with more than 5 and up to 20 kids are ~37% more likely to be in debt than people with no kids, so there is a relationship between having kids and repaying a loan on time. This suggests that people with kids are more likely to default on loan repayment.
  • Unmarried people with up to 4 kids and divorced people with up to 20 kids are ~75% more likely to be in debt than people of any other family status, and about 80% more likely to be in debt than people with 3 or fewer children.
  • Unmarried people are more than 2% more likely to be in debt than married people. Widows and widowers are the least likely to be in debt of all the groups. This suggests that unmarried people are more likely to default on loan repayment.
  • There is no correlation between income level and defaulting on loan payment.
  • People requesting a loan for a car purchase or for education are the most likely to default on loan repayment. People requesting a loan for a house purchase are more likely to pay on time than those in any other category.
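Comparisons such as "X% more likely to be in debt" can be derived from a pivot table of default rates. The sketch below uses made-up counts, so its numbers do not reproduce the findings above; the helper `relative_increase` is a hypothetical name. It computes the relative difference (rate − baseline) / baseline, which is an assumption about how the percentages were calculated and should not be confused with a difference in percentage points.

```python
import pandas as pd

# Made-up data: 20 customers with no children (2 defaults)
# and 20 customers with one child (3 defaults).
loans = pd.DataFrame(
    {
        "children": [0] * 20 + [1] * 20,
        "debt": [1] * 2 + [0] * 18 + [1] * 3 + [0] * 17,
    }
)

# Because debt is coded 0/1, its mean per group is the default rate.
rates = pd.pivot_table(loans, index="children", values="debt", aggfunc="mean")["debt"]


def relative_increase(rate: float, baseline: float) -> float:
    """Relative difference of a rate against a baseline rate."""
    return (rate - baseline) / baseline


uplift = relative_increase(rates.loc[1], rates.loc[0])
point_gap = rates.loc[1] - rates.loc[0]

print(rates)
print(f"relative increase: {uplift:.0%}")
print(f"percentage-point gap: {point_gap * 100:.1f}")
```

Stating which of the two measures a figure refers to avoids ambiguity when small default rates are compared across groups.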

Deployment and Application

I plan on future deployment using Amazon Web Services. The goal is to extend the application of the project to multiple customers via web services.

Future Development

For future development, I will work on better visualization and statistical analysis to help the client estimate customer acquisition costs. I will also work on predicting customer churn for the client using machine learning. A future machine learning model will be put into production and deployed via a web app.

Accomplishments

  • Applied strategies for dealing with missing values.
  • Converted data from one type to another.
  • Identified duplicate data and processed it in several different ways.
  • Categorized data.
  • Exported final data into pivot tables.
  • Queried and used pivot table for data manipulation and interpretation.
  • Created visualizations using insights from pivot table.
This project was able to answer the business questions. Using insights from this project, the client can classify customers according to their creditworthiness.
