Skip to content

This repository contains OLTP, ETL process (using Pentaho Data Integration), and OLAP of credit card dataset. The dataset is taken from Kaggle (https://www.kaggle.com/rikdifos/credit-card-approval-prediction) and part of author Capstone Project.

Notifications You must be signed in to change notification settings

caesarmario/data-warehouse-credit-card-applicant-using-pentaho

Repository files navigation

💳 Data Warehouse Credit Card Applicant 💳

using Pentaho Data Integration (PDI)/Kettle and Microsoft SQL Server 18 ⚙

.: 📄 Dataset taken from Kaggle :.


Star Badge


📃 Table of Contents:


🖋 About Project

  • This repository contains files to create data warehouse such as:

    • ETL files using Pentaho Data Integration (PDI)
    • Codes to create OLAP (SQL)
    • Codes to select data from OLTP (SQL)
    • Codes to perform random testing (SQL)

    for credit card applicant. The dataset is provided by Seanny (rikdifos).

  • This project will also create:

    • 2 dimension tables (Applicant_Dimension and CreditRecord_Dimension),
    • Time dimension (Time_Dimension), and
    • 1 fact table (CreditCard_Fact).

    using PDI and Microsoft SQL Server 18.

📌 Objectives

  • Perform ETL using PDI for both datasets.
  • Create time dimension using PDI.
  • Create fact table using PDI.

🧾 Data Set Description

  • The dataset description can be seen here.

🔌 Connection Configuration

username: 	sa
pass: 		qwer

📀🔌 OLTP Configuration

OLTP Config

💿🔌 OLAP Configuration

OLAP Config



⚙ ETL Process

👨‍💼 Application Record

Application

▶ Table Input Configuration

Table Input - Application

  • Importing application table from OLTP.

▶ Sort Rows Configuration

Sort Rows - Application

  • Sort data based on applicant ID.

▶ Unique Rows Configuration

Unique Rows - Application

  • Filter duplicate applicant ID.

▶ Replace in String Configuration

Replace in String - Application

  • Replace some values to make it easier to understand.

▶ Add Constants Configuration

Add Constants - Application

  • Add new columns with constant date (October 1, 2021).

▶ Calculator Configuration

Calculator - Application

  • Calculate DOB and date of applicant start working based on current date (October 1, 2021).
  • Calculate age of applicant based on current year (2021).

▶ Filter Rows Configuration

Filter Rows - Application

  • Filter applicant data which has null values.
  • Filter applicant data who is less than 21 y.o.

▶ Add Sequence Configuration

Add Sequence - Application

  • Adding Index Applicant (to replace ID as primary key).

▶ Select Values Configuration

Select Values - Application

  • Select columns that will entered OLAP.

▶ Table Output Configuration

Table Output - Application

  • Exporting application table to OLAP (Application Dimension).



💶 Credit Record

Credit Record

▶ Table Input Configuration

Table Input - Credit Record

  • Importing credit record table from OLTP.

▶ Sort Rows Configuration

Sort Rows - Credit Record

  • Sort data based on applicant ID.

▶ Add Constants Configuration

Add Constants - Credit Record

  • Add new columns with constant date (October 1, 2021).

▶ Calculator Configuration

Calculator - Credit Record

  • Calculate loan payment's month based on current date (October 1, 2021).

▶ Add Sequence Configuration

Add Sequence - Credit Record

  • Adding CreditRecord_ID (to replace Applicant ID as primary key).

▶ Select Values Configuration

Select Values - Credit Record

  • Select columns that will entered OLAP.

▶ Table Output Configuration

Table Output - Credit Record

  • Exporting application table to OLAP (Credit Record Dimension).



⌚ Time Dimension

Time

▶ Generate Rows Configuration

Generate Rows - Time

  • Generate a column with specific date (January 1, 2016).

▶ Add Sequence Configuration

Add Sequence - Time

  • Add row with sequence from 1 to 99999.

▶ Calculator Configuration

Calculator - Time

  • Caluclating start date with sequence data to make next date (ex: January 2, 2016; January 3, 2016)
  • Creating new columns (Day, Months, and Year).

▶ Data Grid Configuration

Data Grid - Time_1
Data Grid - Time_2

  • Creating month number and month name.

▶ Stream Lookup Configuration

Stream Lookup - Time

  • Combine 'Month' from Calculator node to 'No_Month' from Data Grid node.

▶ Modified JavaScript Value Configuration

Modified JavaScript - Time

  • Creating time ID using JavaScript code.

▶ Select Values Configuration

Select Values - Time

  • Select columns that will entered OLAP.

▶ Table Output Configuration

Table Output - Time

  • Exporting time dimension to OLAP.



💳 Credit Card Fact

Credit Fact

▶ Table Input (Credit Record) Configuration

Table Input CR - Fact

  • Importing Credit Record dimension from OLAP.

▶ Table Input (Application) Configuration

Table Input Application - Fact

  • Importing Application dimension from OLAP.

▶ Stream Lookup 1 Configuration

Stream Lookup 1 - Fact

  • Join both dimension tables based on applicant ID.

▶ Filter Rows Configuration

Filter Rows - Fact

  • Filter applicant ID that doesn't exists in both tables.

▶ Table Input (Time) Configuration

Table Input Time - Fact

  • Importing Time dimension from OLAP.

▶ Stream Lookup 2 Configuration

Stream Lookup 2 - Fact

  • Join application & credit record dimension with time dimension.

▶ Replace in String 1 Configuration

Replace in String 1 - Fact

  • Replace C, X, 0 with 'Good Debt' (C: loan for that month is already paid; X: no loan for that month; 0: loan is 1 to 29 days overdue).
  • Replace 1, 2, 3, 4, 5 with 'Bad Debt' (1: loan is 30 to 59 days overdue; 2: loan is 60 to 89 days overdue; 3: loan is 90 to 119 days overdue; 4: loan is 120 to 149 days overdue; 5: loan is more than 150 days overdue)

▶ Calculator Configuration

Calculator - Fact

  • Creating 2 copies from 'Status' column ('Good_Debt' and 'Bad_Debt').

▶ Replace in String Configuration

Replace in String - Fact

  • Good_Debt: Good Debt will be change to 1, while Bad Debt will be change to 0
  • Bad_Debt: Good Debt will be change to 0, while Bad Debt will be change to 1

▶ Get System Info Configuration

Get System Info - Fact

  • To create date & time when ETL was performed.

▶ Select Values Configuration

Select Values - Fact

  • Select columns that will entered OLAP.

▶ Table Output Configuration

Table Output - Fact

  • Exporting fact table to OLAP.



⭐ Star Schema

  • Star schema generated using Power BI Star Schema



👀 Before & After ETL Comparison

  • This section will show the data structure before & after ETL.

👨‍💼 Application Record

Applicant_1
Applicant_2

💶 Credit Record

Credit Record

⌚ Time Dimension

Time Dimension

💳 Credit Card Fact

Credit Card Fact



🙌 Support me!

👉 If you find this project useful, please ⭐ this repository 😆!


👉 More about myself: here

About

This repository contains OLTP, ETL process (using Pentaho Data Integration), and OLAP of credit card dataset. The dataset is taken from Kaggle (https://www.kaggle.com/rikdifos/credit-card-approval-prediction) and part of author Capstone Project.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published