In this notebook, we combine SQL and Python tools to analyze historical credit card transactions and consumption patterns in order to identify possible fraudulent transactions. We will accomplish the following tasks:
- Data Modeling: Define a database model to store the credit card transaction data, and create a new PostgreSQL database using the model.
- Data Engineering: Create a database schema on PostgreSQL and populate the database from the CSV files provided in the Data folder.
- Data Analysis: Analyze the data to identify possible fraudulent transaction trends and develop a report.
Download PostgreSQL, "a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads."
Using Conda as the package manager, install the following libraries from a Git Bash terminal:
pip install SQLAlchemy
pip install python-dotenv
Follow the provided PyViz installation guide to complete the installations.
Inspect the CSV files to create an entity relationship diagram (ERD). For example, a card holder ID can have a one-to-many relationship to the credit card table, meaning one person can have multiple credit cards.
Use the ERD as a guide to develop a database in PostgreSQL. Then, using the query tool and the CSV files, create tables and import the data.
For example:
CREATE TABLE card_holder(
id INT PRIMARY KEY,
name VARCHAR(50)
);
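Continuing the example, the credit card table on the "many" side of the one-to-many relationship could be sketched as follows (the column names here are illustrative; match them to your ERD):

```sql
CREATE TABLE credit_card(
    card VARCHAR(20) PRIMARY KEY,
    id_card_holder INT NOT NULL,
    FOREIGN KEY (id_card_holder) REFERENCES card_holder(id)
);
```

The foreign key constraint enforces the relationship: every credit card row must point to an existing card holder.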
We can also use the all_tables_seed.sql file located inside the Data folder to insert values after all the tables have been created.
Using the sqlalchemy module, we can query the local PostgreSQL database from the Jupyter notebook. With pandas, plotly, and hvplot, we can then build a DataFrame to run data analysis and create visualizations of customer spending trends.
This box plot helps us visualize anomalies in the customer's spending habits for the first six months of the year. The outliers in the box plot may be the result of fraudulent transactions.
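The same fences a box plot draws can be computed directly with pandas. This is a minimal sketch with made-up amounts (the real values would come from the transaction table); it applies the standard 1.5 × IQR rule a box plot uses to flag outliers.

```python
import pandas as pd

# Hypothetical spending amounts for one card holder over a period.
amounts = pd.Series([3.2, 10.1, 5.7, 8.4, 2.9, 1500.0, 6.3, 9.8])

# Box-plot outlier rule: points beyond 1.5 * IQR from the quartiles.
q1, q3 = amounts.quantile(0.25), amounts.quantile(0.75)
iqr = q3 - q1
outliers = amounts[(amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)]
print(outliers)
```

Here the single large charge falls far outside the upper fence, the kind of anomaly worth investigating as possible fraud.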