
Quickstart ETL in PySpark for large scale data


mbsuraj/ETL


ETL

General Info

This is a work in progress. The aim of this project is to build a data pipeline that transforms very large data sets and loads them into databases.
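
To make the transform-and-load idea concrete, here is a minimal sketch. It is not this project's code. It uses only the Python standard library (`csv` and `sqlite3`) instead of PySpark so that it runs without a cluster, and every name in it (the `sales` table, the `name` and `amount` columns, and all function names) is hypothetical. Rows are streamed and written in small batches, so memory use stays bounded however large the input is.

```python
import csv
import io
import sqlite3
from itertools import islice


def extract(lines):
    """Lazily yield one dict per CSV row."""
    yield from csv.DictReader(lines)


def transform(row):
    """Normalize one record: trim and lowercase the name, cast the amount."""
    return (row["name"].strip().lower(), float(row["amount"]))


def batched(iterable, size):
    """Yield lists of at most `size` items until the input is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load(conn, rows, batch_size=2):
    """Insert rows batch by batch and return the number of rows written."""
    # Hypothetical target table, created on first use.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    total = 0
    for batch in batched(rows, batch_size):
        conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


def run_pipeline(lines, conn, batch_size=2):
    """Chain extract, transform, and load without materializing all rows."""
    transformed = (transform(row) for row in extract(lines))
    return load(conn, transformed, batch_size)


if __name__ == "__main__":
    # Made-up demo input; a real run would pass an open file instead.
    demo = io.StringIO("name,amount\n Alice ,10.5\nBOB,4\ncarol,7.25\n")
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(demo, connection))
```

In a PySpark version the same three stages would typically map onto reading a DataFrame, applying column transformations, and writing to the database, with Spark handling the partitioning that `batched` does here by hand.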

Technology

  1. Python.
  2. Bash scripts.

Setup

Coming soon :)

Status

The project is still being organized. It currently uses a demo data_lake from a case study.
