
COVID-19 Data Pipeline - Azure Data Engineering

Project Requirement

Develop an end-to-end data pipeline in Azure Data Factory that builds a data warehouse of COVID-19 daily and weekly data covering testing, cases, deaths, hospital admissions, and related indicators.

Tasks

  • Create a data model based on the project requirements and the available ECDC API endpoints
  • Develop the data pipeline in Azure Data Factory
    • Fetch data from the ECDC API and Azure Blob Storage
    • Transform the data as required using
      • Data Factory Data Flows
      • PySpark in Azure Databricks (a transformation sketch follows this list)
    • Create a data lake that stores raw, processed and lookup files in separate containers
    • Load the processed data into Azure SQL Database
  • Develop a data warehouse in Azure SQL Database
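
As a rough illustration of the Databricks step, the sketch below reads a raw ECDC extract from the data lake, applies a few example transformations, writes the result to the processed container, and loads it into Azure SQL Database over JDBC. The storage account, container paths, column names, indicators, and table name are placeholders for illustration, not values taken from this repository.

```python
# Illustrative PySpark notebook cell for Azure Databricks. All names below
# (<storage_account>, containers, columns, indicators, table) are assumptions
# used only to show the shape of the raw -> processed -> SQL flow.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read a raw ECDC extract from the "raw" container of ADLS Gen2.
raw_path = "abfss://raw@<storage_account>.dfs.core.windows.net/ecdc/hospital_admissions.csv"
raw_df = spark.read.option("header", True).option("inferSchema", True).csv(raw_path)

# Example transformations: rename columns, parse the reporting date,
# and keep only the indicators needed downstream.
processed_df = (
    raw_df
    .withColumnRenamed("country", "country_name")
    .withColumn("reported_date", F.to_date("date", "yyyy-MM-dd"))
    .filter(F.col("indicator").isin(
        "Daily hospital occupancy",
        "Weekly new hospital admissions per 100k",
    ))
)

# Persist the curated output to the "processed" container as Parquet.
processed_path = "abfss://processed@<storage_account>.dfs.core.windows.net/ecdc/hospital_admissions/"
processed_df.write.mode("overwrite").parquet(processed_path)

# Load the same data into Azure SQL Database over JDBC. Credentials would come
# from Key Vault rather than being hard-coded (see the sketch under "Services Used").
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<database>;encrypt=true;"
)
(
    processed_df.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.hospital_admissions")
    .option("user", "<sql_user>")
    .option("password", "<sql_password>")
    .mode("append")
    .save()
)
```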

Services Used

  • Azure Data Factory (Data Flows, Linked Services, Triggers, Azure Databricks activities)
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • Azure Key Vault (a secret-retrieval sketch follows this list)
  • Azure SQL Database
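
Key Vault typically backs a Databricks secret scope so notebooks never hold credentials in plain text. The snippet below is a minimal sketch of that pattern; the scope and key names are hypothetical, and `dbutils` is only available inside a Databricks workspace.

```python
# Minimal sketch: fetch SQL credentials from an Azure Key Vault-backed
# Databricks secret scope (scope and key names are placeholders).
sql_user = dbutils.secrets.get(scope="covid-kv", key="sql-username")
sql_password = dbutils.secrets.get(scope="covid-kv", key="sql-password")

# These values can then be passed to the JDBC writer shown earlier instead of
# hard-coding <sql_user> / <sql_password> in the notebook.
```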

Solution Architecture

[Solution architecture diagram]

Future Work

  • Load data from Azure SQL Database into Power BI Desktop
  • Visualize COVID-19 trends in Power BI to derive insights

About

A complete end-to-end ETL pipeline that fetches COVID-19 daily and weekly data from the ECDC API, transforms it, and loads it into an Azure SQL database, using Azure Data Factory, Databricks, Key Vault, and Data Lake Storage.
