xiangivyli/data-science-portfolio

Data Science Projects Portfolio

This portfolio contains the projects listed below.

Each project has its own folder for data, code, and key takeaways. Some projects cover several technical areas; for example, a dashboarding project may also include the data engineering process behind it.

Table of Contents

Part A Data Engineering

Tools:

  • Python with Jupyter Notebook
  • Data Transformation: dbt
  • Data Loading: Airflow (Astro CLI)
  • Data Visualisation: Power BI
  • Data Quality Testing: Soda
  • Data Lake: Google Cloud Storage
  • Data Warehouse: BigQuery
  • Data Orchestration: Airflow

Objectives:

  • extract raw data from Kaggle and process it into a ready-to-use dataset
  • reduce file size and preserve the schema by using Parquet files
  • achieve automation and monitoring with Airflow and dbt
  • visualize data for insights with Power BI

Tool: MySQL

Objectives:

  • identify how diseases begin and progress
  • integrate genetics and healthcare data
  • provide research-ready, well-curated and well-documented data

Tool: SQL Server

Objectives:

  • Split a table into a fact table and dimension tables
  • Set data types, primary keys, foreign keys and referential integrity
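
As a rough illustration of the two objectives above, the sketch below splits a flat table into one fact table and two dimension tables. It uses Python's built-in sqlite3 module in place of SQL Server so that it runs anywhere; the table names, column names, and rows are hypothetical and are not taken from the project.

```python
import sqlite3

# Hypothetical flat table: customer and product details repeat on every row.
flat_rows = [
    ("Alice", "London", "Laptop", 1200.0),
    ("Bob", "Leeds", "Phone", 600.0),
    ("Alice", "London", "Phone", 600.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Dimension tables hold each distinct entity once; the fact table references them by key.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL,
    UNIQUE (name, city)
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer (customer_id),
    product_id INTEGER NOT NULL REFERENCES dim_product (product_id),
    amount REAL NOT NULL
);
""")

for name, city, product, amount in flat_rows:
    # Insert each dimension member once; duplicates are skipped by the UNIQUE constraints.
    conn.execute("INSERT OR IGNORE INTO dim_customer (name, city) VALUES (?, ?)", (name, city))
    conn.execute("INSERT OR IGNORE INTO dim_product (product_name) VALUES (?)", (product,))
    # Look up the surrogate keys and store only keys plus the measure in the fact table.
    conn.execute(
        """
        INSERT INTO fact_sales (customer_id, product_id, amount)
        SELECT c.customer_id, p.product_id, ?
        FROM dim_customer AS c, dim_product AS p
        WHERE c.name = ? AND c.city = ? AND p.product_name = ?
        """,
        (amount, name, city, product),
    )
conn.commit()

customer_count = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
product_count = conn.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0]
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

# Referential integrity: a fact row that points at a missing customer is rejected.
try:
    conn.execute("INSERT INTO fact_sales (customer_id, product_id, amount) VALUES (999, 1, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In SQL Server the same idea would be written in T-SQL, where foreign key constraints are enforced by default and the syntax differs slightly.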

Part B Exploratory Data Analysis and Data Modelling

Tool: Python

Objectives:

  • identify Pfizer's position in the pharmaceutical industry
  • visualise the development of Pfizer from 2016 to 2018
  • fit a linear regression between ESG score and total assets

Tool: Python and Power BI

Objectives:

  • build a logistic regression model
  • identify which features influence customer churn
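
To illustrate how a logistic regression model can indicate which features push churn up or down, here is a small sketch that fits the model with plain gradient descent. The two features (a scaled monthly charge and a scaled tenure), the data, and the function names are all hypothetical; the project may well use a library implementation instead.

```python
import math

# Hypothetical toy data: (monthly_charge_scaled, tenure_scaled) -> churned (1) or stayed (0).
X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.2, 0.9), (0.1, 0.8), (0.3, 0.7)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Predicted churn probability for one customer."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


weights, bias = fit_logistic(X, y)
predictions = [1 if predict_proba(weights, bias, f) >= 0.5 else 0 for f in X]
```

A positive weight means that higher values of the feature raise the predicted churn probability, and a negative weight means they lower it. Comparing weight magnitudes is only meaningful when the features share a common scale.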

Tool: Google Analytics

Objectives:

  • map the persona of customers
  • identify the performance of products
  • identify the pattern of activity
  • show the buyer's journey with funnel diagrams

Part C Data Visualization and Dashboarding

Tool: Power BI

Objectives:

  • map the persona of customers
  • analyse the features of customers based on the loan status variable

Tool: Python and Power BI

Objectives:

  • Prepare a cleansed dataset for analysis
  • Tell a logical story that explains why the mix and weighting of assessment types changed the final result

Tool: Tableau

Objectives:

  • Provide users with a platform to retrieve information about GDP, Life Satisfaction, and Education Level for countries in different years
  • Give a general overview of this information by region
  • Check the relationship between education level and GDP per capita