IEEE-CIS_Fraud_Detection

This project predicts the probability that an online transaction is fraudulent, as denoted by the binary target isFraud.
The dataset is from Kaggle and contains 590,540 rows and 433 features.

Validation Strategy

The full data set spans 365 days: the training set covers 182 days and the test set covers the following 183 days. Because the test set lies entirely after the training set in time, this project uses time-based validation: the model is trained on the first 5 months of the training set and validated on the last month.
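A minimal sketch of such a time-based split follows. It assumes that TransactionDT is an offset in seconds and that "5 months" is approximated as 150 days; the helper name time_based_split and the toy data are hypothetical and not taken from the notebooks.

```python
import numpy as np
import pandas as pd

SECONDS_PER_DAY = 24 * 60 * 60

def time_based_split(df, time_col="TransactionDT", train_days=150):
    """Split rows into an earlier training part and a later validation part."""
    # Measure elapsed days from the earliest transaction in the frame.
    elapsed_days = (df[time_col] - df[time_col].min()) / SECONDS_PER_DAY
    train_mask = elapsed_days < train_days
    return df[train_mask], df[~train_mask]

# Toy data: one transaction per day for 182 days (assumption, not real data).
toy = pd.DataFrame({
    "TransactionDT": np.arange(182) * SECONDS_PER_DAY,
    "isFraud": np.zeros(182, dtype=int),
})
train_part, valid_part = time_based_split(toy)
print(len(train_part), len(valid_part))
```

Every validation row is later than every training row, which mirrors how the real test set follows the training set.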

EDA

Data Processing

Dimension Reduction

Apply PCA to the V1-V339 features, which are highly correlated and redundant.
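One way this step could look with scikit-learn is sketched below. The median imputation, the standardization, the number of components, and the helper name reduce_v_columns are all assumptions made for illustration; the notebook may handle these differently.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_v_columns(df, n_components=5, prefix="V"):
    """Replace the V-prefixed columns with their first principal components."""
    v_cols = [c for c in df.columns
              if c.startswith(prefix) and c[len(prefix):].isdigit()]
    # PCA cannot handle missing values; median imputation is an assumption.
    filled = df[v_cols].fillna(df[v_cols].median())
    # Standardize so that no single column dominates the components.
    scaled = StandardScaler().fit_transform(filled)
    components = PCA(n_components=n_components, random_state=0).fit_transform(scaled)
    pca_df = pd.DataFrame(
        components,
        columns=[f"V_pca_{i}" for i in range(n_components)],
        index=df.index,
    )
    return pd.concat([df.drop(columns=v_cols), pca_df], axis=1)

# Toy data: 12 correlated V columns built from 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixed = latent @ rng.normal(size=(3, 12)) + 0.01 * rng.normal(size=(200, 12))
toy = pd.DataFrame(mixed, columns=[f"V{i}" for i in range(1, 13)])
toy["TransactionAmt"] = rng.uniform(1, 100, size=200)

reduced = reduce_v_columns(toy, n_components=3)
print(reduced.columns.tolist())
```

In practice the scaler and the PCA would be fitted on the training rows only and then applied to the validation and test rows.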

Feature Selection

Perform adversarial validation to find the features that matter most in telling training cards apart from test cards, since the training set and the test set contain different sets of cards.
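Adversarial validation can be sketched as training a classifier to predict whether a row comes from the training set or the test set, then reading its feature importances. The random forest, the helper name adversarial_validation, and the toy columns "shifted" and "stable" are assumptions for illustration; how the flagged features are then used is not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def adversarial_validation(train_X, test_X, seed=0):
    """Return the train-vs-test AUC and the features driving the separation."""
    combined = pd.concat([train_X, test_X], ignore_index=True)
    # Label 0 for training rows and 1 for test rows.
    is_test = np.concatenate([
        np.zeros(len(train_X), dtype=int),
        np.ones(len(test_X), dtype=int),
    ])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold probabilities give an honest estimate of separability.
    proba = cross_val_predict(clf, combined, is_test, cv=3,
                              method="predict_proba")[:, 1]
    auc = roc_auc_score(is_test, proba)
    clf.fit(combined, is_test)
    importances = pd.Series(clf.feature_importances_,
                            index=combined.columns).sort_values(ascending=False)
    return auc, importances

# Toy data: "shifted" differs between the two sets, "stable" does not.
rng = np.random.default_rng(0)
train_X = pd.DataFrame({"shifted": rng.normal(0, 1, 300),
                        "stable": rng.normal(0, 1, 300)})
test_X = pd.DataFrame({"shifted": rng.normal(3, 1, 300),
                       "stable": rng.normal(0, 1, 300)})
auc, importances = adversarial_validation(train_X, test_X)
print(round(auc, 3), importances.index[0])
```

An AUC close to 0.5 would mean the two sets look alike; an AUC close to 1 means some features make them easy to tell apart.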

Feature Engineering

Combining features to generate new features: features A and B may not correlate with the target variable on their own, but their combination A+B may.
Frequency encoding for categorical features: replace each categorical value with its frequency.
Group statistics: for example, group by card1 and compute the mean or standard deviation of TransactionAmt for each group. This lets the model know whether a row's TransactionAmt is abnormal for its group.
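The three ideas above can be sketched with pandas as follows. The choice of addr1 as the second feature in the combination, the use of raw counts as the frequency, and all output column names are assumptions for illustration.

```python
import pandas as pd

def add_engineered_features(df):
    """Add a combined feature, a frequency encoding, and group statistics."""
    out = df.copy()
    # Combination: join two features into one new categorical feature.
    out["card1_addr1"] = out["card1"].astype(str) + "_" + out["addr1"].astype(str)
    # Frequency encoding: replace each value with how often it occurs.
    out["card1_freq"] = out["card1"].map(out["card1"].value_counts())
    # Group statistics: mean and standard deviation of the amount per card1.
    grouped = out.groupby("card1")["TransactionAmt"]
    out["TransactionAmt_card1_mean"] = grouped.transform("mean")
    out["TransactionAmt_card1_std"] = grouped.transform("std")
    # Ratio to the group mean flags amounts that are unusual for the group.
    out["TransactionAmt_to_card1_mean"] = (
        out["TransactionAmt"] / out["TransactionAmt_card1_mean"]
    )
    return out

# Toy data (assumption, not real transactions).
toy = pd.DataFrame({
    "card1": [1000, 1000, 1000, 2000],
    "addr1": [315, 315, 204, 315],
    "TransactionAmt": [10.0, 20.0, 90.0, 50.0],
})
features = add_engineered_features(toy)
print(features)
```

In the toy data, the third row has an amount of 90 against a group mean of 40, so its ratio of 2.25 stands out from the other rows of the same card1. A group with a single row gets a missing standard deviation.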

Process TimeDelta Features

Normalize the D columns so that their values do not grow with time.
Convert TransactionDT into a datetime by adding it to a reference datetime.
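A possible sketch of both steps is shown below. It assumes that TransactionDT is an offset in seconds, that the D columns are measured in days, and that normalization means subtracting the transaction day from each D value. The reference date 2017-12-01 is a placeholder assumption, not a value taken from this project, and the helper name process_time_features is hypothetical.

```python
import pandas as pd

SECONDS_PER_DAY = 24 * 60 * 60

def process_time_features(df, reference="2017-12-01", d_cols=("D1",)):
    """Add a datetime column and time-normalized copies of the D columns."""
    out = df.copy()
    # Datetime = assumed reference point + offset in seconds.
    out["TransactionDatetime"] = (
        pd.to_timedelta(out["TransactionDT"], unit="s") + pd.Timestamp(reference)
    )
    # Day index of each transaction, measured from the reference point.
    day = out["TransactionDT"] / SECONDS_PER_DAY
    for col in d_cols:
        # Subtracting the day index removes the drift of a growing day count.
        out[f"{col}_norm"] = out[col] - day
    return out

# Toy data: one card seen on days 1, 10 and 40, with D1 = days since first use.
toy = pd.DataFrame({
    "TransactionDT": [1 * SECONDS_PER_DAY, 10 * SECONDS_PER_DAY, 40 * SECONDS_PER_DAY],
    "D1": [0.0, 9.0, 39.0],
})
processed = process_time_features(toy)
print(processed)
```

In the toy data the raw D1 values grow from 0 to 39, while the normalized values are identical for all three rows, so they behave like a fixed property of the card rather than a function of time.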

Mapping Emails to keep only Email Domain
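One reading of this step is that values such as "yahoo.com.mx" are reduced to a single provider name. The sketch below keeps the text before the first dot; that rule, the column name P_emaildomain, and the helper name keep_email_provider are assumptions, and the notebook may use an explicit mapping instead.

```python
import pandas as pd

def keep_email_provider(series):
    """Keep the text before the first dot; missing values stay missing."""
    return series.str.split(".").str[0]

# Toy values (assumption) in the style of an email domain column.
toy = pd.Series(["gmail.com", "yahoo.com.mx", "hotmail.co.uk", None],
                name="P_emaildomain")
mapped = keep_email_provider(toy)
print(mapped.tolist())
```

This collapses regional variants of the same provider into one category, which reduces the number of distinct values the model has to learn.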

Model Training

Parameter Tuning with Hyperopt
XGBoost (ROC_AUC: 0.9280)
CatBoost (ROC_AUC: 0.9146)

xianchen2/Online_Transaction_Fraud_Detection

Folders and files

Latest commit

History

Repository files navigation

IEEE-CIS_Fraud_Detection

Validation Strategy

EDA

Data Processing

Dimension Reduction

Feature Selection

Feature Engineering

Process TimeDelta Features

Mapping Emails to keep only Email Domain

Model Training

About

Topics

Resources

Stars

Watchers

Forks

Languages