
LendingClubML

Input data: https://bit.ly/2YfxTAi

Claat: https://codelabs-preview.appspot.com/?file_id=17iO3pKXC5vyRMQFh5PLCRtJOvvmEDHGLeW02zzXlzBw#0

Task 1:

-The professor is going to need the whole data set for the study. The professor would like to know the trends and the factors affecting the interest rate of a loan, since having more data (features) gives the students more to explore. This reinforces the point that in industry most of the effort goes into understanding and improving the data: having more data is often more important than using complex algorithms.

-The data had many columns that were empty or contained values that could not be used.

-Analysis of the graphs generated from the Lending Club data. The graphs were studied keeping the professor's requirements in mind.

https://bit.ly/2YhEYQF
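A minimal sketch of the kind of exploratory graph analysed above, assuming the standard Lending Club column names `grade` and `int_rate` (the exact columns plotted in the notebook may differ):

```python
import pandas as pd
import matplotlib.pyplot as plt

loans = pd.read_csv("lending_club.csv", low_memory=False)

# The raw file often stores the rate as a percent string ("13.56%"); convert to float.
loans["int_rate"] = loans["int_rate"].astype(str).str.rstrip("%").astype(float)

# Average interest rate per loan grade, one of the trends of interest to the professor.
avg_rate = loans.groupby("grade")["int_rate"].mean().sort_index()

avg_rate.plot(kind="bar", title="Average interest rate by loan grade")
plt.xlabel("Loan grade")
plt.ylabel("Interest rate (%)")
plt.tight_layout()
plt.show()
```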

Task 2:

The data was cleaned using the pandas library. Columns with more than 80% empty values were dropped, as were columns not relevant to the prediction task. The loan grade was converted from a string to a numeric value, column types were corrected, and empty cells were filled with the median of their columns.
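A minimal sketch of that cleaning pipeline, assuming a `loans` DataFrame and the standard Lending Club column names; the 80% threshold matches the description above, while the specific dropped columns are illustrative:

```python
import pandas as pd

loans = pd.read_csv("lending_club.csv", low_memory=False)

# Keep only columns with at least 20% non-missing values,
# i.e. drop columns that are more than 80% empty.
loans = loans.dropna(axis=1, thresh=int(0.2 * len(loans)))

# Drop identifier / free-text columns with no predictive value
# (these names are illustrative, not the exact list used in the notebook).
loans = loans.drop(columns=["id", "url", "desc"], errors="ignore")

# Map the letter grade to a numeric value (A=1 ... G=7).
grade_map = {g: i + 1 for i, g in enumerate("ABCDEFG")}
loans["grade"] = loans["grade"].map(grade_map)

# Fill remaining gaps in numeric columns with the column median.
numeric_cols = loans.select_dtypes(include="number").columns
loans[numeric_cols] = loans[numeric_cols].fillna(loans[numeric_cols].median())
```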

Used both manual feature engineering and automated feature engineering with Featuretools (https://github.com/featuretools/featuretools/).

With manual feature engineering we can tailor the features to our needs, while the automated approach is more restricted.
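A hedged sketch of the automated route using Featuretools' Deep Feature Synthesis, assuming the cleaned `loans` DataFrame has an `id` column; the primitive names are illustrative and the `entity_from_dataframe`/`target_entity` interface shown is from the older Featuretools API in use around the time of this assignment:

```python
import featuretools as ft

# Build an EntitySet around the single loans table.
es = ft.EntitySet(id="lending_club")
es = es.entity_from_dataframe(entity_id="loans", dataframe=loans, index="id")

# Deep Feature Synthesis: automatically derive transformation features.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="loans",
    trans_primitives=["multiply_numeric", "divide_numeric"],  # illustrative choice
    max_depth=1,
)
print(feature_matrix.shape)
```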

Task 3:

Building 3 manual ML models:

-Linear regression

-Random Forest

-Neural Network

We used the mean absolute percentage error (MAPE) of each model's predictions to choose the most reliable model; Random Forest had the lowest MAPE. A sketch of this comparison follows.
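A hedged sketch of fitting the three models and comparing them by MAPE, assuming a cleaned feature matrix `X` and interest-rate target `y` produced by the steps above; the layer sizes and other hyperparameters are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features; the linear and neural models are sensitive to feature magnitude.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Linear regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Neural Network": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42),
}

for name, model in models.items():
    model.fit(X_train_s, y_train)
    print(f"{name}: MAPE = {mape(y_test, model.predict(X_test_s)):.2f}%")
```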

Task 4: Hyperparameter tuning

Hyperparameter tuning was used to improve each model's performance (a sketch using scikit-learn's search utilities follows the list below):

a. Regression: try L1, L2, and Elastic Net regularization

b. Neural networks: vary the number of epochs, the optimizer, and the learning rate

c. Random forest: vary the number of trees and the tree depth

https://scikit-learn.org/stable/modules/grid_search.html
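A hedged sketch of the grid-search approach from the scikit-learn page above, applied to the Random Forest and reusing the split, scaled data from the earlier sketch; the grid values are illustrative, not the ones used in the notebook:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Illustrative grid over the two hyperparameters mentioned above:
# number of trees and tree depth.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [5, 10, 20, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="neg_mean_absolute_error",
    n_jobs=-1,
)
search.fit(X_train_s, y_train)
print(search.best_params_, search.best_score_)
```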

Libraries Used:

  1. MLPRegressor

  2. numpy

  3. pandas

  4. tensorflow

  5. StandardScaler

  6. LinearRegression

  7. RidgeCV

  8. LassoCV

  9. ElasticNet

  10. RandomizedSearchCV

  11. KFold

  12. learning_curve, GridSearchCV

  13. train_test_split

Building 3 AutoML models:

  1. TPOT

  2. AutoSklearn

  3. H2O.ai

Using AutoML makes the workflow more interactive and interpretable, and makes it easy to swap in different input files with few changes. The same pipeline can be run on any comparable input data to produce the desired output.
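A minimal sketch of one of the AutoML routes (TPOT), reusing the split, scaled data from the earlier sketches; the generation and population settings are illustrative assumptions, not the values used in the assignment:

```python
from tpot import TPOTRegressor

# TPOT searches over scikit-learn pipelines with a genetic algorithm.
tpot = TPOTRegressor(
    generations=5,          # illustrative; more generations search longer
    population_size=20,
    scoring="neg_mean_absolute_error",
    random_state=42,
    verbosity=2,
)
tpot.fit(X_train_s, y_train)
print(tpot.score(X_test_s, y_test))

# Export the best pipeline found as a standalone Python script.
tpot.export("best_lending_club_pipeline.py")
```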

Task 5:

Building test cases to check our Random Forest model and analysing the output.
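A hedged sketch of the kind of test cases this could involve, written to be collected by pytest; the fitted `model`, the `X_test_s`/`y_test` split, and the 15% MAPE threshold are assumptions for illustration:

```python
import numpy as np

def test_predictions_have_expected_shape():
    preds = model.predict(X_test_s)
    assert preds.shape[0] == len(y_test)

def test_predicted_rates_are_plausible():
    # Interest rates should be positive and below a loose sanity bound.
    preds = model.predict(X_test_s)
    assert np.all(preds > 0)
    assert np.all(preds < 50)

def test_mape_is_below_threshold():
    preds = model.predict(X_test_s)
    error = np.mean(np.abs((y_test - preds) / y_test)) * 100
    assert error < 15  # illustrative acceptance threshold
```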

Citations:

  1. James Max Kanter, Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. IEEE DSAA 2015. [https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf]

  2. https://cmdlinetips.com/2018/01/7-tips-to-read-a-csv-file-as-pandas-data-frame/

  3. https://github.com/ryanschaub/Predicting-Loan-Interest-Rates/blob/master/WF%20Interest%20Rate%20Predictions.ipynb

  4. Team 4 (Presentation given in the class)

  5. https://www.liebertpub.com/doi/full/10.1089/big.2018.0092