Data Pipeline


Introduction: Three Experiments with Big Data

In this project, we develop a data pipeline to ingest, process, and store storm events datasets so they can be accessed through different means.

Data Explanation

SEVIR: The Storm EVent ImagRy (SEVIR) dataset is a collection of temporally and spatially aligned images containing weather events captured by satellite and radar.

The dataset contains thousands of samples of 4-hour events captured by one or more of these weather sensors. The loop below shows one such event:

[sevir_sample: animated loop of one SEVIR weather event]
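
For a first look at the data, the sketch below opens a single SEVIR HDF5 file with h5py. It is a minimal sketch assuming a locally downloaded file; the file name and the "vil" dataset key follow the public SEVIR documentation but are assumptions here, not part of this project's code.

```python
import h5py

# Open one SEVIR HDF5 file (illustrative file name; the actual files are
# downloaded from the SEVIR storage on the Registry of Open Data on AWS).
with h5py.File("SEVIR_VIL_STORMEVENTS_2018_0101_0630.h5", "r") as hf:
    vil = hf["vil"][:]  # assumed dataset key for the VIL image type
    # Expected layout per the SEVIR docs: (events, height, width, frames)
    print(vil.shape)
```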

Storm Events Database: The database currently contains data from January 1950 to November 2020, as entered by NOAA's National Weather Service (NWS). The data are available on the Registry of Open Data on AWS.
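
The yearly "details" files are gzipped CSVs, so they can be loaded directly with pandas. This is a minimal sketch assuming a local copy; the file name below is hypothetical, since NOAA appends a compile-date suffix to each file, and the column names are taken from the published details-file format.

```python
import pandas as pd

# Hypothetical local copy of one yearly Storm Events "details" file;
# NOAA publishes these as gzipped CSVs with a compile-date suffix.
df = pd.read_csv(
    "StormEvents_details-ftp_v1.0_d2020_c20210101.csv.gz",
    compression="gzip",
)
print(df[["BEGIN_DATE_TIME", "EVENT_TYPE", "STATE"]].head())
```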


Setup

  • Python 3.7+
  • Python IDE or code editor
  • Amazon S3 buckets
  • AWS Glue
  • Amazon Athena
  • Amazon QuickSight
  • Google Cloud Storage buckets
  • Google Cloud Dataflow
  • Google BigQuery
  • Google Data Studio
  • Snowflake
  • SQLAlchemy (see the sketch after this list)
  • Apache Superset
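
As one example of the access layer, the sketch below queries Snowflake through SQLAlchemy using the snowflake-sqlalchemy dialect (installed separately). The connection placeholders and the storm_events table name are assumptions for illustration, not this project's actual schema.

```python
from sqlalchemy import create_engine, text

# Connection URL format used by the snowflake-sqlalchemy dialect;
# replace the placeholders with your own account details.
engine = create_engine(
    "snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>"
)

with engine.connect() as conn:
    # storm_events is a hypothetical table name used for illustration.
    result = conn.execute(text(
        "SELECT event_type, COUNT(*) AS n FROM storm_events GROUP BY event_type"
    ))
    for event_type, n in result:
        print(event_type, n)
```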

Clone

Clone this repo to your local machine with git clone https://github.com/goyal07nidhi/Data-Pipeline.git

Folder Contents

Refer to the README.md inside the respective directories for setup instructions.

  • ✅ AWS S3: AWS directory
  • ✅ GCP - Dataflow, Datalab: GCP directory
  • ✅ Snowflake: SNOWFLAKE directory

Team Members:

  1. Nidhi Goyal
  2. Kanika Damodarsingh Negi
  3. Rishvita Reddy Bhumireddy