Fraud-Detection-AWS-Spark

Group project on Fraud Detection

The full final report is in the file Fraud Detection based on Synthetic Financial Datasets.pdf

The project uses the synthetic PaySim dataset from Kaggle (about 6 million rows), available at https://www.kaggle.com/ntnu-testimon/paysim1

Full research paper on data generation can be found here: https://www.researchgate.net/publication/313138956_PAYSIM_A_FINANCIAL_MOBILE_MONEY_SIMULATOR_FOR_FRAUD_DETECTION

We used AWS SageMaker and PySpark to train Logistic Regression, Neural Network, and XGBoost models

The final Logistic Regression model achieved an AUC of 0.9993 and an F1 score of 0.9963
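
For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how AUC and F1 are defined. It is not the project's evaluation code (the project used PySpark and SageMaker); the function names `roc_auc` and `f1_score`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 as the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical, not taken from the dataset): 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.6, 0.9]

# Hypothetical decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(f"AUC = {auc:.4f}, F1 = {f1:.4f}")
```

AUC is computed from the raw scores and does not depend on a threshold, while F1 depends on the threshold used to turn scores into fraud / not-fraud labels, so the two numbers can move independently.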

An HTML export of the code used is also available.