- This was my first try at building a pipeline with an Amazon S3 bucket. The script downloads data from an S3 bucket, transforms it, and loads it back into the bucket.
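A minimal sketch of what a download-transform-upload script like this might look like. The bucket name, object key, and the `name` column in the transform are hypothetical; the S3 calls use boto3's `get_object`/`put_object`, and the transform is kept as a pure function so it can be exercised without AWS credentials.

```python
import csv
import io


def transform(csv_text: str) -> str:
    """Example transform: uppercase a hypothetical 'name' column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{**row, "name": row["name"].upper()} for row in reader]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run(bucket: str, key: str) -> None:
    """Download the object, transform it, and upload it back to the same bucket."""
    import boto3  # imported here so transform() stays testable without boto3
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    s3.put_object(Bucket=bucket, Key=f"transformed/{key}", Body=transform(body))


# run("my-example-bucket", "data.csv")  # hypothetical bucket and key
```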
- Here I explore PySpark functions to build a basic ETL job that reads data from an Amazon S3 bucket and performs some text transformations.
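One way such a text-cleaning step might be wired up: a pure-Python cleaner registered as a PySpark UDF. The `review` column, bucket path, and session name are assumptions; the Spark-specific code sits inside a function with a local import so the cleaner itself stays testable without a Spark installation.

```python
import re


def clean_text(s: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    s = re.sub(r"[^\w\s]", "", s.lower())
    return re.sub(r"\s+", " ", s).strip()


def transform_reviews(df):
    """Apply clean_text to a hypothetical 'review' column of a Spark DataFrame."""
    from pyspark.sql import functions as F  # local import: no Spark needed to test clean_text
    from pyspark.sql.types import StringType
    clean_udf = F.udf(clean_text, StringType())
    return df.withColumn("review_clean", clean_udf(F.col("review")))


# Sketch of the surrounding job (assumes s3a credentials are configured):
# spark = SparkSession.builder.appName("text-etl").getOrCreate()
# df = spark.read.csv("s3a://my-example-bucket/reviews.csv", header=True)
# transform_reviews(df).write.csv("s3a://my-example-bucket/reviews_clean")
```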
- I explore creating a full end-to-end pipeline that extracts data from a Postgres database, transforms it, and loads it back into a Postgres database using PySpark.
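A sketch of the shape such a pipeline might take using Spark's JDBC reader and writer. The table names (`raw_orders`, `orders_enriched`), column names, and connection properties are all hypothetical; the Spark code lives inside a function with a local import, leaving the small URL helper testable on its own.

```python
def jdbc_url(host: str, port: int, db: str) -> str:
    """Build a Postgres JDBC connection URL."""
    return f"jdbc:postgresql://{host}:{port}/{db}"


def run_pipeline(spark, props: dict) -> None:
    """Extract from Postgres, derive a total, and load back (assumed table names)."""
    from pyspark.sql import functions as F  # local import: jdbc_url testable without Spark
    opts = {
        "url": jdbc_url(props["host"], props["port"], props["db"]),
        "user": props["user"],
        "password": props["password"],
        "driver": "org.postgresql.Driver",
    }
    df = spark.read.format("jdbc").options(dbtable="raw_orders", **opts).load()
    out = df.withColumn("total", F.col("quantity") * F.col("unit_price"))
    (out.write.format("jdbc")
        .options(dbtable="orders_enriched", **opts)
        .mode("overwrite")
        .save())
```

Running this requires the Postgres JDBC driver jar on the Spark classpath (e.g. via `spark.jars.packages`).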
- This script scrapes an e-commerce site that exposes no API and stores the results in a CSV file.
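The parsing-and-export half of such a scraper, sketched with the standard library's `html.parser` against an inline sample page (the real script would first fetch pages over HTTP). The `product`/`title`/`price` class names and the sample markup are hypothetical stand-ins for the target site's layout.

```python
import csv
import io
from html.parser import HTMLParser

# Stand-in for a fetched product-listing page; class names are hypothetical.
SAMPLE_PAGE = """
<div class="product"><span class="title">Blue Mug</span><span class="price">$9.99</span></div>
<div class="product"><span class="title">Red Mug</span><span class="price">$8.49</span></div>
"""


class ProductParser(HTMLParser):
    """Collect (title, price) pairs from span.title / span.price elements."""

    def __init__(self):
        super().__init__()
        self.current = None   # which field the next text chunk belongs to
        self.pending = {}     # fields seen so far for the current product
        self.rows = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "span" and cls in ("title", "price"):
            self.current = cls

    def handle_data(self, data):
        if self.current:
            self.pending[self.current] = data.strip()
            self.current = None
            if "title" in self.pending and "price" in self.pending:
                self.rows.append((self.pending["title"], self.pending["price"]))
                self.pending = {}


def to_csv(rows) -> str:
    """Serialize scraped rows to CSV text with a header."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["title", "price"])
    writer.writerows(rows)
    return out.getvalue()


parser = ProductParser()
parser.feed(SAMPLE_PAGE)
csv_text = to_csv(parser.rows)
```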