Fraud Detection using Logistic Regression and Random Forest

This script demonstrates how to build and evaluate two machine learning models to detect fraudulent activity in a dataset. It uses logistic regression and random forest classifiers to predict whether a given transaction is fraudulent, based on the features provided in the input data.

The script uses the popular pandas and numpy libraries for data manipulation, and scikit-learn for model building and evaluation.

Prerequisites

Python 3.x
pandas, numpy, and scikit-learn libraries

Usage

Download the data.csv file and place it in the same directory as the script.
Open a terminal or command prompt and navigate to the directory where the script is located.
Run the script using the command python script.py.

Description

Load the data from the data.csv file using pandas.
Randomly sample 10% of the data for faster processing.
Separate the target variable and features.
Split the data into training and testing sets.
Build and train a logistic regression model on the training data.
Generate predictions using the logistic regression model on the testing data and evaluate the performance using the area under the receiver operating characteristic curve (ROC AUC).
Build and train a random forest classification model on the training data.
Generate predictions using the random forest model on the testing data and evaluate the performance using the ROC AUC.
Perform feature selection using the random forest model and build a new logistic regression model with the selected features.
Generate predictions using the new logistic regression model on the testing data with selected features and evaluate the performance using the ROC AUC.
Print the ROC AUC scores for all three models, and the list of selected features.
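
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the target column name `is_fraud`, the 70/30 split, the "importance at or above the mean" selection rule, and the synthetic data generated by `make_demo_frame` are all assumptions made for the example, since the layout of data.csv is not described here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def make_demo_frame(n_rows=5000, seed=0):
    """Build a synthetic, imbalanced stand-in for data.csv (hypothetical layout)."""
    X, y = make_classification(
        n_samples=n_rows, n_features=10, n_informative=4,
        weights=[0.9, 0.1], random_state=seed,
    )
    df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
    df["is_fraud"] = y  # assumed target column name
    return df


def run_pipeline(df, target_col="is_fraud", sample_frac=0.1, seed=0):
    # Randomly sample a fraction of the rows for faster processing.
    df = df.sample(frac=sample_frac, random_state=seed)

    # Separate the target variable from the features.
    X = df.drop(columns=[target_col])
    y = df[target_col]

    # Split into training and testing sets (70/30 is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    scores = {}

    # Logistic regression on all features, scored by ROC AUC on the test set.
    logreg = LogisticRegression(max_iter=1000)
    logreg.fit(X_train, y_train)
    scores["logistic_regression"] = roc_auc_score(
        y_test, logreg.predict_proba(X_test)[:, 1]
    )

    # Random forest on all features.
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train, y_train)
    scores["random_forest"] = roc_auc_score(
        y_test, forest.predict_proba(X_test)[:, 1]
    )

    # Feature selection: keep features whose forest importance is at or
    # above the mean importance (an assumed rule for this sketch).
    importances = pd.Series(forest.feature_importances_, index=X_train.columns)
    selected = list(importances[importances >= importances.mean()].index)

    # Logistic regression retrained on the selected features only.
    logreg_sel = LogisticRegression(max_iter=1000)
    logreg_sel.fit(X_train[selected], y_train)
    scores["logistic_regression_selected"] = roc_auc_score(
        y_test, logreg_sel.predict_proba(X_test[selected])[:, 1]
    )
    return scores, selected


if __name__ == "__main__":
    scores, selected = run_pipeline(make_demo_frame())
    for name, auc in scores.items():
        print(f"{name}: ROC AUC = {auc:.3f}")
    print("Selected features:", selected)
```

To try this on real data, replace `make_demo_frame()` with `pd.read_csv("data.csv")` and pass the actual name of the target column as `target_col`. ROC AUC is computed from predicted probabilities rather than hard class labels, which tends to be more informative when fraudulent transactions are rare.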

Conclusion

This script demonstrates how to build and evaluate two machine learning models for fraud detection. The logistic regression model achieves reasonable performance on its own, and retraining it on only the features selected by the random forest model can improve its ROC AUC further.
