Skip to content

trying to deploy a Machine Learning model and the API code can be accessed across from any machine.

Notifications You must be signed in to change notification settings

brdx88/trial_deploy_model

Repository files navigation

CREDIT RISK MODEL

Live Credit Risk Predictor

1

Business Problems

As a Financing Company, the user wants to build a credit scoring model to predict whether the client will default or not after their loan application.

Business Goals

Research and develop the model to predict applicants whether the applicant will default or not, and also find the best metrics since this is an imbalance class dataset.

Data Dictionary

Feature Name Description
person_age Age
person_income Annual income
person_home_ownership Home ownership
person_emp_length Employment length (in years)
loan_intent Loan intent
loan_amnt Loan amount
loan_int_rate Interest rate
loan_percent_income Percent income by loan
cb_person_default_on_file Historical default
cb_person_cred_hist_length Credit history length
loan_status Loan status

Results

    +----------------------+--------------------------------+--------------------------------+--------------------------------+
    |        		   | Train	    	   	    | Test	    	   	     | Holdout Sample	   	      | 
    | Model                +----------+----------+----------+----------+----------+----------+----------+----------+----------+
    |   		   | Recall   | F1-Score | AUC	    | Recall   | F1-Score | AUC	     | Recall   | F1-Score | AUC      |
    +----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+	
    | Logistic Regression  | 0.525296 | 0.618740 | 0.737900 | 0.524548 | 0.621112 | 0.738705 | 0.470407 | 0.584248 | 0.717754 |
    | RandomForest	   | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.500000 |
    | XGBoost  		   | 0.695587 | 0.813239 | 0.845633 | 0.689061 | 0.798403 | 0.839225 | 0.696387 | 0.802480 | 0.843304 |
    +----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+

Conclusions

Since this case is an imbalanced dataset (non-default:77.7% ; default:22.3%), it's worth looking at the AUC and Recall metrics instead. Why? Especially for Recall metrics. For business purposes, we assume to minimize Type 2 (minimize False Negative -- predict non-default (0), actual default (1)). Hence, we use Recall metrics for optimum result.

It can be seen in the table above, the model which has the highest and the most stable AUC and Recall is XGBoost AUC: 0.839225 and XGBoost Recall: 0.689061.

In addition, the Recall and AUC scores on the train and test are not much different. It means that we can conclude that this model is 'just right' to classify target 1 and target 0, neither overfitting nor underfitting.

If we look back at the features importance by Logistic Regression with Lasso regularization, the selected features seem make sense. Features which affect loan_status are:

  1. Percentage of Income ('loan_percent_income'),
  2. Loan Amount ('loan_amnt_WOE'),
  3. Employement Length ('person_emp_length_WOE'),
  4. Owning Home ('person_home_ownership_OWN'),
  5. Loan Grade ('loan_grade'),
  6. Intention for Venture ('loan_intent_VENTURE'),
  7. Intention for Education ('loan_intent_EDUCATION'),
  8. Renting home ('person_home_ownership_RENT'),
  9. Age ('person_age_WOE'),
  10. Credit History Length ('cb_person_cred_hist_length_WOE'),
  11. Intention for personal purposes ('loan_intent_PERSONAL'),
  12. Intention for home improvement ('loan_intent_HOMEIMPROVEMENT').

After tuning the models and get each metrics, we could predict the holdout sample using our previous models. We see that the XGBoost algorithm shows its best performance among the others. In the holdout sample, XGBoost can reach the AUC: 0.843304 and the Recall: 0.696387. It tells us that XGBoost could be our model for production, because it's not overfitted and it can predicts the holdout sample very well.

GUIDANCE FOR INPUT AND OUTPUT FORMAT

Guidance for input and output format when access it on web.

1. Input Format

Endpoint: https://credit-risk-brianic.herokuapp.com/predict-api.

Using 'POST' method, input variables like the data dictionary above except loan status. It must be JSON like this below for example:

{
	"person_age":24,
	"person_income":168000,
	"person_home_ownership":"MORTGAGE",
	"person_emp_length":0.0,
	"loan_intent":"PERSONAL",
	"loan_grade":"E",
	"loan_amnt":25000,
	"loan_int_rate":16.45,
	"loan_percent_income":0.15,
	"cb_person_default_on_file":"N",
	"cb_person_cred_hist_length":3
	}

2. Output Format

The expected output should be like this below:

{
    "model": "XGB-Credit-Risk",
    "prediction": "87.76% Non-default",
    "version": "1.0.0"
}

About

trying to deploy a Machine Learning model and the API code can be accessed across from any machine.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages