Suprame4/Data_Engineering_Projects

Description


  • This repo contains projects that apply data engineering concepts
  • Background on some of these concepts can be found in the Intro to Basics folder

Projects


  1. Linux and Shell Scripting
     • This project applies my Linux and shell scripting skills to a fictional scenario in which I work as a Linux developer at a top tech company
  2. Building Data Pipelines with Airflow
     • Apache Airflow is an open-source workflow orchestration tool for building, scheduling, and running workflows
     • This project collects data available in different formats and consolidates it into a single file
  3. Building Data Pipelines with Kafka
     • Apache Kafka is a widely used open-source event streaming platform
     • This project creates a data pipeline that uses Kafka to collect streaming data and load it into a database
  4. Building Data Pipelines with Shell
     • Create shell scripts to extract, transform, and load data
     • Create and populate a PostgreSQL table
  5. Data Warehousing with Postgres
     • Design a data warehouse using fact and dimension tables and load data into it
     • Write aggregation queries using the CUBE and ROLLUP grouping options and create materialized query tables (materialized views)
  6. NoSQL with MongoDB, Cassandra and IBM Cloudant
     • This project applies my skills with several NoSQL databases to move and analyze data
     • Move data from one type of database to another and run basic queries on each database
  7. Data Engineering and Machine Learning with Spark
     • Use Apache Spark for data engineering and machine learning
     • Create an end-to-end Spark application that includes ETL and model training
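
As a rough illustration of the "collect data in different formats and consolidate it into a single file" idea from the Airflow project above, here is a minimal sketch in plain Python. It is not the project's actual code and does not use Airflow. The field names (`id`, `name`, `amount`), the sample records, and the helper functions are all hypothetical, chosen only to show the shape of the step.

```python
import csv
import io
import json

# Hypothetical shared schema; the real project's columns may differ.
FIELDS = ["id", "name", "amount"]


def read_csv_records(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def consolidate(record_sets):
    """Merge several record lists into one CSV string, keeping only FIELDS."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for records in record_sets:
        for row in records:
            # Missing fields become empty strings; extra fields are dropped.
            writer.writerow({key: row.get(key, "") for key in FIELDS})
    return out.getvalue()


# Made-up sample inputs standing in for source files in two formats.
csv_source = "id,name,amount\n1,alice,10\n"
json_source = '[{"id": 2, "name": "bob", "amount": 20}]'

combined = consolidate([
    read_csv_records(csv_source),
    read_json_records(json_source),
])
print(combined)
```

In an orchestrated pipeline, each of these functions would typically run as its own task, with the consolidation task depending on the extraction tasks.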