sagar118/DataTalksClub-Data-Engineering

DataTalksClub-Data-Engineering

This repo contains homework and code for the Data Engineering Zoomcamp by DataTalks.Club.

During the course, we will replicate the following architecture:

[Figure: course architecture diagram]

Week 1: Introduction

Week 1 covers the following topics:

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

Week 2: Workflow Orchestration

Week 2 covers the following topics:

  • Introduction to Prefect
  • ETL with GCP & Prefect
  • From Google Cloud Storage to BigQuery
  • Parametrizing Flow & Deployments
  • Schedules & Docker Storage with Infrastructure
  • Prefect Cloud and Additional Resources

Week 3: Data Warehouse

Week 3 covers the following topics:

  • Data Warehouse
  • BigQuery
  • Partitioning and Clustering
  • BigQuery Best Practices
  • Internals of BigQuery
  • BigQuery for Machine Learning

Week 4: Analytics Engineering

Tech Stack (up to Week 4):

  • Docker
  • Google Cloud Platform (GCP): Google Cloud Storage and Google BigQuery
  • Postgres
  • Terraform
  • Prefect
  • dbt