CCFraudDetectionSystem

#Develop a pipeline using Hadoop ecosystem to detect credit card fraud transactions

Input data card_member - card-id, member_id, member_joining_dt, card_purchase_dt, country, city member_score - member_id, score card_transactions - card-id, member_id, amount, postcode, pos_id, transaction_dt, status Transactional payload(data) - Sent from pos terminals to kafka topic - Real time streaming data card-id, member_id, amount, pos_id, postcode, transaction_dt Architecture and approach Ingest data from aws RDS using sqoop - card_member and member_score tables a. Sqoop commands to ingest to hadoop - done b. Incremental ingest - using jobs may be

Load card_transactions table into NoSql db i.e., Hbase a. Copy card_transactions from local to hdfs b. Create hive table and load data c. Create hive-hbase integrated table using rowkey defined d. Create a lookup table in hbase - Use a NoSQL database which gives schema evolution, schema versioning, row-level lookups (efficient reads), and tunable consistency.

Calculate three parameters a. Upper Control Limit (UCL) - Moving average + 3*Standard Deviation Moving average and standard deviation is calculated based on last 10 amount credited with Genuine ->card_transactions - for last 10 transactions ->At regular intervals- lookup table needs to be updated with ucl values for each card-id

Steps Create raw table in hive containing card_transactions data Calculate moving average and standard deviation in hive and store in next staging table Calculate ucl value in hive and store in next final table Then store ucl value in hbase lookup table b. Credit score - If score < 200 , transaction is fraudulent Need to be updated every 4 hours

c. zip code distance - distance of current and last transaction wrt time Need to use post code library provided Store post_code and transaction_dt of last transaction in lookup table

Use SparkStreaming Connect to kakfa topic to get the streaming data i.e., transactional payload (data)

Query lookup table to check the rules to find transaction is fraudelent or not in seconds SLA a. Retrieve postcode and transaction_dt from lookup and compare with current postcode and transaction_dt. Use postcode library provided to calculate the speed. If its ahead of imaginable speed it is fraud. b. Writes record with status whether Genuine or Fraudulent to card_transactions table of hive/hbase c. Update zip code and date time in lookup table only if current transaction if genuine

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Logic Explanation for Data Loading and Batch processing		Logic Explanation for Data Loading and Batch processing
README.md		README.md
RemovalScript.txt		RemovalScript.txt
Script for Python code.txt		Script for Python code.txt
Scripts_for_CreditCardFraud.txt		Scripts_for_CreditCardFraud.txt
python.zip		python.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logic Explanation for Data Loading and Batch processing

Logic Explanation for Data Loading and Batch processing

README.md

README.md

RemovalScript.txt

RemovalScript.txt

Script for Python code.txt

Script for Python code.txt

Scripts_for_CreditCardFraud.txt

Scripts_for_CreditCardFraud.txt

python.zip

python.zip

Repository files navigation

CCFraudDetectionSystem

About

Releases

Packages

sudhirtelkar1/CCFraudDetectionSystem

Folders and files

Latest commit

History

Repository files navigation

CCFraudDetectionSystem

About

Topics

Resources

Stars

Watchers

Forks