waqarg2001/Earthquake-ETL-Pipeline-DE

Leveraging Azure Cloud Services, a daily data pipeline is in action, fetching and transforming earthquake data with a magnitude of 4.5 or higher. The pipeline taps into the United States Geological Survey (USGS) as its source, delivering real-time insights into earthquake activity over the past 7 days.


Overview · Tools · Architecture · Dashboard · Support · License

Overview

Earthquake Data Analysis is a data engineering project that uses Azure services and GitHub to collect, process, and analyze earthquake data. The pipeline ingests data from the United States Geological Survey (USGS) with Azure Data Factory, which runs the ETL (Extract, Transform, Load) operations and lands the raw data in Azure Data Lake Storage Gen2 for centralized storage. Azure Databricks then performs transformations and exploratory analysis, with Azure Key Vault securing credentials and other secrets. Processed data is written to an Azure SQL Database for efficient querying, while Azure Data Lake Storage Gen2 also serves as the repository for refined datasets. Finally, Tableau dashboards provide insights into recent seismic activity.
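In the deployed pipeline the ingestion is orchestrated by Azure Data Factory, but the source itself is the public USGS GeoJSON summary feed for magnitude 4.5+ events over the past 7 days. The following is a minimal Python sketch of that request, not the ADF configuration used in this repo; the printed fields are only illustrative:

```python
import requests

# Public USGS GeoJSON summary feed: magnitude 4.5+ events from the past 7 days.
FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.geojson"

def fetch_quakes():
    """Return the list of earthquake feature dicts from the USGS feed."""
    response = requests.get(FEED_URL, timeout=30)
    response.raise_for_status()
    return response.json()["features"]

if __name__ == "__main__":
    for feature in fetch_quakes()[:5]:
        props = feature["properties"]
        print(props["mag"], props["place"], props["time"])
```

Each feature carries the event magnitude, location description, epoch-millisecond timestamp, and coordinates, which is the shape the downstream transformations work with.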

The repository directory structure is as follows:

├── Dashboards               <- Tableau dashboards
│   ├── historical_dashboard.twb   <- historical earthquakes dashboard
│   └── usgs_dashboard.twb         <- past 7 days earthquakes dashboard
│
├── Resources                <- Resources used by this README
│
├── databricks notebooks     <- Notebooks that aggregate and transform data
│   ├── Configurations       <- Configurations for mounting ADLS and accessing Key Vault (see the mount sketch below)
│   └── Transformations      <- Transformation notebooks
│
├── dataset                  <- Datasets created in ADF
│
├── linkedService            <- Linked services created in ADF
│
├── pipeline                 <- Pipelines created in ADF
│
├── trigger                  <- Scheduled (daily) trigger created in ADF
│
└── README.md                <- The top-level README for developers using this project
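The Configurations notebooks mount ADLS Gen2 in Databricks using a service principal whose credentials are read from Azure Key Vault. The snippet below is a minimal sketch of that standard mount pattern, assuming a Key Vault-backed secret scope; the scope name, secret keys, storage account, and container are placeholders and may not match what the repo actually uses:

```python
# Runs inside a Databricks notebook, where `dbutils` is predefined.
# Secret scope, key names, storage account, and container are placeholders.
storage_account = "<storage-account-name>"
container = "<container-name>"

client_id = dbutils.secrets.get(scope="key-vault-scope", key="sp-client-id")
client_secret = dbutils.secrets.get(scope="key-vault-scope", key="sp-client-secret")
tenant_id = dbutils.secrets.get(scope="key-vault-scope", key="sp-tenant-id")

# OAuth configuration for ADLS Gen2 access via a service principal.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": client_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint":
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}

dbutils.fs.mount(
    source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
    mount_point=f"/mnt/{container}",
    extra_configs=configs,
)
```

Keeping the client secret in a Key Vault-backed scope means no credential ever appears in the notebook source, which is the role Azure Key Vault plays throughout this project.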

Tools

To build this project, the following tools were used:

  • Azure Databricks
  • Azure Key Vault
  • Azure Active Directory
  • Azure Data Lake Storage Gen2
  • Azure Blob Storage
  • Azure Data Factory
  • Azure SQL Database
  • Azure Monitor
  • Azure Cost Management + Billing
  • Tableau
  • PySpark
  • SQL
  • Git

Architecture

The following diagram shows the architecture of the project.
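To make the Databricks-to-Azure-SQL leg of this architecture concrete, here is a hedged PySpark sketch in the spirit of the Transformations notebooks: it flattens the raw USGS GeoJSON landed in the lake and appends it to a SQL table. The mount path, column selection, JDBC URL, table name, and secret names are all illustrative, and the cluster is assumed to have a SQL Server JDBC driver available:

```python
from pyspark.sql import functions as F

# Read the raw GeoJSON landed in the ADLS mount (path is illustrative).
raw = spark.read.option("multiline", "true").json("/mnt/raw/usgs/4.5_week.geojson")

# Flatten the GeoJSON feature collection into a tabular shape.
quakes = raw.select(F.explode("features").alias("f")).select(
    F.col("f.id").alias("event_id"),
    F.col("f.properties.mag").alias("magnitude"),
    F.col("f.properties.place").alias("place"),
    # USGS timestamps are epoch milliseconds; convert to seconds for from_unixtime.
    F.from_unixtime(F.col("f.properties.time") / 1000).alias("event_time"),
    F.col("f.geometry.coordinates")[0].alias("longitude"),
    F.col("f.geometry.coordinates")[1].alias("latitude"),
    F.col("f.geometry.coordinates")[2].alias("depth_km"),
)

# Append to Azure SQL Database over JDBC; credentials come from Key Vault-backed secrets.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"
(quakes.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.earthquakes")
    .option("user", dbutils.secrets.get("key-vault-scope", "sql-user"))
    .option("password", dbutils.secrets.get("key-vault-scope", "sql-password"))
    .mode("append")
    .save())
```

Writing the refined table to Azure SQL keeps Tableau queries fast, while the lake retains the raw and intermediate datasets.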

Dashboard

The dashboards below were built in Tableau.

Tableau Public Links:

USGS Earthquake Dashboard

Historical Earthquake Dashboard

Support

If you have any doubts, queries, or suggestions, please connect with me on any of the following platforms:

LinkedIn · Gmail

License

CC BY-NC-SA

This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.