Input data

#Develop a pipeline using Hadoop ecosystem to detect credit card fraud transactions

Input data

card_member - card-id, member_id, member_joining_dt, card_purchase_dt, country, city
member_score - member_id, score
card_transactions - card-id, member_id, amount, postcode, pos_id, transaction_dt, status
Transactional payload(data) - Sent from pos terminals to kafka topic - Real time streaming data card-id, member_id, amount, pos_id, postcode, transaction_dt

Architecture and approach

Ingest data from aws RDS using sqoop - card_member and member_score tables a. Sqoop commands to ingest to hadoop - done b. Incremental ingest - using jobs may be
Load card_transactions table into NoSql db i.e., Hbase a. Copy card_transactions from local to hdfs b. Create hive table and load data c. Create hive-hbase integrated table using rowkey defined d. Create a lookup table in hbase - Use a NoSQL database which gives schema evolution, schema versioning, row-level lookups (efficient reads), and tunable consistency.
Calculate three parameters a. Upper Control Limit (UCL) - Moving average + 3*Standard Deviation Moving average and standard deviation is calculated based on last 10 amount credited with Genuine ->card_transactions - for last 10 transactions ->At regular intervals- lookup table needs to be updated with ucl values for each card-id
```
 Steps
 Create raw table in hive containing card_transactions data
 Calculate moving average and standard deviation in hive and store in next staging table
 Calculate ucl value in hive and store in next final table
 Then store ucl value in hbase lookup table
```
b. Credit score - If score < 200 , transaction is fraudulent Need to be updated every 4 hours

c. zip code distance - distance of current and last transaction wrt time Need to use post code library provided Store post_code and transaction_dt of last transaction in lookup table
Use SparkStreaming Connect to kakfa topic to get the streaming data i.e., transactional payload (data)
Query lookup table to check the rules to find transaction is fraudelent or not in seconds SLA a. Retrieve postcode and transaction_dt from lookup and compare with current postcode and transaction_dt. Use postcode library provided to calculate the speed. If its ahead of imaginable speed it is fraud. b. Writes record with status whether Genuine or Fraudulent to card_transactions table of hive/hbase c. Update zip code and date time in lookup table only if current transaction if genuine

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
ccfrauddetectionsystem		ccfrauddetectionsystem
BatchLayerProcessing.txt		BatchLayerProcessing.txt
README.md		README.md
StreamingLayerProcessingAndFinalLogic.txt		StreamingLayerProcessingAndFinalLogic.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ccfrauddetectionsystem

ccfrauddetectionsystem

BatchLayerProcessing.txt

BatchLayerProcessing.txt

README.md

README.md

StreamingLayerProcessingAndFinalLogic.txt

StreamingLayerProcessingAndFinalLogic.txt

Repository files navigation

Input data

Architecture and approach

About

Releases

Packages

Languages

vinaymadupathi/CCFraudDetectionSystem

Folders and files

Latest commit

History

Repository files navigation

Input data

Architecture and approach

About

Topics

Resources

Stars

Watchers

Forks

Languages