The full final report is in the file Fraud Detection based on Synthetic Financial Datasets.pdf
The data is the synthetic PaySim dataset from Kaggle (about 6 million rows), available at https://www.kaggle.com/ntnu-testimon/paysim1
The full research paper on how the data was generated is available at https://www.researchgate.net/publication/313138956_PAYSIM_A_FINANCIAL_MOBILE_MONEY_SIMULATOR_FOR_FRAUD_DETECTION
Logistic regression, neural network, and XGBoost models were run using AWS SageMaker and PySpark
The final logistic regression model achieved an AUC of 0.9993 and an F1 score of 0.9963
An HTML export of the code used is also available.
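
For readers unfamiliar with the two metrics reported for the logistic regression model, here is a minimal sketch of how AUC and F1 can be computed from true labels and predicted scores. It is not the code used in this project (that ran on PySpark and SageMaker); the function names `roc_auc` and `f1_score`, the 0.5 threshold, and the toy data are assumptions made for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sort indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 for the positive class at a given threshold (0.5 is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, not taken from the PaySim dataset.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", roc_auc(y_true, y_score))
    print("F1 :", f1_score(y_true, y_score))
```

AUC is independent of any decision threshold, while F1 depends on the threshold chosen and on which class is treated as positive, so the two figures are best read together with the details in the full report.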