
Research on how macroeconomic factors, microeconomic factors and personal data could affect mortgage risk, using machine learning techniques.



HerambVD/mortgage-risk-analysis


Mortgage Risk Analysis

Technologies used: Python, pandas, scikit-learn (sklearn), Jupyter.

This project, ‘Mortgage Risk Analysis’, consisted of a thorough analysis of the dataset and the prediction of mortgage defaults. One-dimensional analysis revealed missing values in the dataset, which were replaced by the mean of the respective feature; the main finding of the one-dimensional analysis is that the dataset is skewed (biased), with one label far more common than the other. Two-dimensional analysis showed the correlations between the features in the dataset. The interest rate is related to the default label: the higher the interest rate, the greater the chance that the borrower defaults. Macroeconomic factors such as the GDP rate and HPI (house price index) do not significantly affect mortgage risk, whereas smaller, borrower-level factors such as FICO score, LTV (loan-to-value) ratio and maturity time have a considerable effect. This verifies that the dataset consists of records of common residential borrowers, since commercial mortgages are not significantly affected by macroeconomic factors.
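The one- and two-dimensional checks described above can be sketched in pandas as follows. This is a minimal illustration, not the project's notebook: the column names (`interest_rate`, `fico_score`, `default`), the helper functions and the toy data are hypothetical, and Pearson correlation is assumed as the correlation measure.

```python
import numpy as np
import pandas as pd


def impute_with_mean(df, label):
    """Return a copy where missing numeric feature values are replaced by that feature's mean."""
    out = df.copy()
    features = out.drop(columns=[label]).select_dtypes(include="number").columns
    out[features] = out[features].fillna(out[features].mean())
    return out


def label_distribution(df, label):
    """Share of each class; a large gap between the shares means the label is skewed."""
    return df[label].value_counts(normalize=True)


def correlation_with_label(df, label):
    """Pearson correlation of each numeric feature with the label, strongest first."""
    corr = df.select_dtypes(include="number").corr()[label].drop(label)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy data (hypothetical): defaults become more likely as the interest rate rises.
rng = np.random.default_rng(0)
n = 1000
rate = rng.normal(6.0, 1.0, n)
df = pd.DataFrame({
    "interest_rate": rate,
    "fico_score": rng.normal(700, 50, n),
    "default": (rng.random(n) < 1 / (1 + np.exp(-2 * (rate - 8.0)))).astype(int),
})
df.loc[::50, "fico_score"] = np.nan  # inject some missing values

clean = impute_with_mean(df, "default")
print(clean.isna().sum().sum())  # 0 missing values remain
print(label_distribution(clean, "default"))
print(correlation_with_label(clean, "default"))
```

In this toy setup the `default` label is rare and `interest_rate` comes out as the feature most correlated with it; on real data, mean imputation should be computed on the training data only if the result feeds a model.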

The results of this analysis are used for pre-processing. Insignificant features identified by the analysis are removed using techniques such as backward elimination and forward selection, and highly correlated features are merged using principal component analysis (PCA). The skewness of the dataset is its most crucial property; it is removed by upsampling, which creates additional records for the less dominant label.
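The three pre-processing steps can be sketched with scikit-learn, under stated assumptions: `SequentialFeatureSelector` (scored by cross-validated accuracy) stands in for backward elimination and forward selection, which the project may instead have done with p-values; "merging" correlated features is taken to mean replacing them with their first principal component; and upsampling is plain random oversampling with `sklearn.utils.resample`. The helper names, columns (`ltv`, `cltv`, `noise`, ...) and toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def select_features(X, y, n_features, direction):
    """Greedy forward selection ("forward") or backward elimination ("backward")."""
    estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    selector = SequentialFeatureSelector(
        estimator, n_features_to_select=n_features, direction=direction, cv=3
    )
    selector.fit(X, y)
    return list(X.columns[selector.get_support()])


def merge_correlated(X, cols, name):
    """Replace a group of highly correlated columns with their first principal component."""
    scaled = StandardScaler().fit_transform(X[cols])  # PCA is sensitive to feature scale
    out = X.drop(columns=cols)
    out[name] = PCA(n_components=1).fit_transform(scaled)[:, 0]
    return out


def upsample_minority(df, label, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    counts = df[label].value_counts()
    majority = df[df[label] == counts.idxmax()]
    minority = df[df[label] == counts.idxmin()]
    minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=seed)
    balanced = pd.concat([majority, minority_up])
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)  # shuffle rows


# Toy data (hypothetical): `cltv` nearly duplicates `ltv`, `noise` is irrelevant.
rng = np.random.default_rng(1)
n = 600
X = pd.DataFrame({
    "interest_rate": rng.normal(6, 1, n),
    "fico_score": rng.normal(700, 50, n),
    "noise": rng.normal(0, 1, n),
    "ltv": rng.normal(80, 10, n),
})
X["cltv"] = X["ltv"] + rng.normal(0, 1, n)
y = pd.Series((X["interest_rate"] + rng.normal(0, 0.5, n) > 7.2).astype(int), name="default")

merged = merge_correlated(X, ["ltv", "cltv"], "ltv_pc1")
chosen = select_features(merged, y, n_features=2, direction="forward")
balanced = upsample_minority(pd.concat([merged[chosen], y], axis=1), "default")
print(chosen)
print(balanced["default"].value_counts())
```

Passing `direction="backward"` runs backward elimination instead. Random oversampling only repeats existing minority rows; techniques such as SMOTE synthesize new ones, and which variant the project used is not stated.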

The final part is training the model that best predicts mortgage defaults. Several classifiers were trained on the pre-processed data, and LGBMClassifier (from the LightGBM library) was found to perform best of all of them, with an accuracy of 81.77%.

Note: To obtain the dataset, visit the site.
