
Credit Card Fraud Detection using Logistic Regression

Requirements:

  • pandas
  • numpy
  • scikit-learn
  • imbalanced-learn
  • matplotlib
  • seaborn

In this project, we detect credit card fraud using logistic regression, preprocessing the data first.

The dataset used is the Credit Card Fraud Detection dataset from Kaggle.

Data Visualization

We start by loading the data into a Jupyter notebook. After loading, we convert the data into a pandas DataFrame to make it easier to handle. We then visualize the data: first we call dataframe.head() to inspect the first five rows, and then we plot the data to see how it is distributed.
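A minimal sketch of these steps, assuming the Kaggle CSV is named creditcard.csv (its Time and Amount columns are used for the plot):

import pandas as pd
import matplotlib.pyplot as plt

# Load the Kaggle CSV into a pandas DataFrame (file name assumed)
df = pd.read_csv('creditcard.csv')

# Inspect the first 5 rows
print(df.head())

# Plot transaction amounts over time to see how the data is distributed
plt.scatter(df['Time'], df['Amount'], s=2)
plt.xlabel('Time')
plt.ylabel('Amount')
plt.show()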

Fig 1: Frauds that occurred with respect to the time frame and their respective amounts.

Correlation of features

Using dataframe.corr(), we compute the Pearson (standard) correlation coefficient matrix.
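One way to compute and plot it, reusing the df DataFrame from the loading step above:

import seaborn as sns
import matplotlib.pyplot as plt

# Pearson correlation matrix of all features
corr = df.corr()

# Visualize the matrix as a heatmap
plt.figure(figsize=(12, 9))
sns.heatmap(corr, cmap='coolwarm')
plt.show()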

Fig 2: Correlation of the features

Data Selection

Since the data is highly unbalanced, we need to undersample it.

Why are we undersampling instead of oversampling?

We are undersampling because our data is highly unbalanced. Transactions that are not fraudulent are labeled 0, and fraudulent transactions are labeled 1.

There are 284,315 non-fraudulent transactions and only 492 fraudulent ones.

If we oversampled instead, we would be adding almost 284,000 dummy minority samples, and that much synthetic data would heavily bias the outcome; undersampling is therefore a better approach for obtaining an optimal result.
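As a sketch, undersampling can be done with pandas as follows (assuming the Kaggle Class column, where 1 marks a fraud):

import pandas as pd

# Keep all 492 frauds and an equally sized random sample of non-frauds
fraud = df[df['Class'] == 1]
non_fraud = df[df['Class'] == 0].sample(n=len(fraud), random_state=42)

# Shuffle into a balanced DataFrame with 492 samples of each class
balanced = pd.concat([fraud, non_fraud]).sample(frac=1, random_state=42)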

Confusion Matrix

We create a user-defined function for the confusion matrix, or we can use confusion_matrix from the sklearn.metrics library.
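A minimal sketch of such a user-defined function for binary 0/1 labels (my_confusion_matrix is a hypothetical name; sklearn's confusion_matrix produces the same layout):

import numpy as np

# Hypothetical helper: rows are actual classes, columns are predicted classes
def my_confusion_matrix(y_true, y_pred):
    cm = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm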

Applying Logistic Regression

We train our model using LogisticRegression from sklearn.linear_model. The syntax is as follows:

from sklearn.linear_model import LogisticRegression

# Fit a logistic regression classifier on the training data
classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Predict on the training set and report the training accuracy
pred = classifier.predict(X_train)
print(classifier.score(X_train, y_train))

With random samples, the accuracy of our training model is above 95% most of the time. The confusion matrix is as follows:

Fig 3: Confusion matrix of the training model

Precision, Recall, F1-Score, Mean Absolute Error, Mean Absolute Percentage Error, Mean Squared Error and R² Score

We compute the precision, recall, F1-score, mean absolute error, mean absolute percentage error, mean squared error and R² score using the following syntax:

import numpy as np
from sklearn.metrics import classification_report, mean_absolute_error, mean_squared_error, r2_score

# Precision, recall and F1-score for each class
report = classification_report(y_train, pred)
print(report)

# Regression-style error metrics on the 0/1 labels
mean_abs_error = mean_absolute_error(y_train, pred)
# Note: MAPE is ill-defined for samples where y_train is 0
mean_abs_percentage_error = np.mean(np.abs((y_train - pred) / y_train))
mse = mean_squared_error(y_train, pred)
r_squared_error = r2_score(y_train, pred)
print("Mean Absolute Error : {}\nMean Absolute Percentage Error : {}\nMean Squared Error : {}\nR Squared Error : {}".format(mean_abs_error, mean_abs_percentage_error, mse, r_squared_error))

Undersampling and Synthetic Minority Oversampling Technique (SMOTE) approach

To improve our performance, we use a combination of undersampling and SMOTE on our dataset. The syntax is as follows:

from imblearn.over_sampling import SMOTE

# Generate synthetic minority-class (fraud) samples to balance the training set
oversample = SMOTE()
X_train, y_train = oversample.fit_resample(X_train, y_train)

Applying Logistic Regression on the Training Model with Undersampling and SMOTE

We apply logistic regression to the resampled dataset as usual. In most cases, we observe that the accuracy improves. The confusion matrix is as follows:

Fig 4: Confusion matrix after Undersampling and SMOTE

Hyperparameter Tuning

To improve our accuracy further, we tune the hyperparameters. The syntax is as follows:

# Weight the classes to shift the decision balance between non-fraud (0) and fraud (1)
classifier_b = LogisticRegression(class_weight={0: 0.6, 1: 0.4})
classifier_b.fit(X_train, y_train)

# Evaluate on the held-out test set
pred_b = classifier_b.predict(X_test_all)
print(classifier_b.score(X_test_all, y_test_all))

The confusion matrix of the testing model is as follows:

Fig 5: Confusion Matrix after Hyperparameter tuning on the testing model