- This was my first try at building a pipeline with an Amazon S3 bucket. The script downloads data from an S3 bucket, transforms it, and loads it back into the bucket.
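A minimal sketch of what a download-transform-upload script like this might look like. The bucket name, object key, and the `name` column in the transform are hypothetical; the S3 calls use boto3's `get_object`/`put_object`, and the transform is kept as a pure function so it can be exercised without AWS credentials.

```python
import csv
import io


def transform(csv_text: str) -> str:
    """Example transform: uppercase a hypothetical 'name' column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{**row, "name": row["name"].upper()} for row in reader]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run(bucket: str, key: str) -> None:
    """Download the object, transform it, and upload it back to the same bucket."""
    import boto3  # imported here so transform() stays testable without boto3
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    s3.put_object(Bucket=bucket, Key=f"transformed/{key}", Body=transform(body))


# run("my-example-bucket", "data.csv")  # hypothetical bucket and key
```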
- Here I explore PySpark functions to build a basic ETL job that reads data from an Amazon S3 bucket and performs some text transformations.
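One way such a text-cleaning step might be wired up: a pure-Python cleaner registered as a PySpark UDF. The `review` column, bucket path, and session name are assumptions; the Spark-specific code sits inside a function with a local import so the cleaner itself stays testable without a Spark installation.

```python
import re


def clean_text(s: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    s = re.sub(r"[^\w\s]", "", s.lower())
    return re.sub(r"\s+", " ", s).strip()


def transform_reviews(df):
    """Apply clean_text to a hypothetical 'review' column of a Spark DataFrame."""
    from pyspark.sql import functions as F  # local import: no Spark needed to test clean_text
    from pyspark.sql.types import StringType
    clean_udf = F.udf(clean_text, StringType())
    return df.withColumn("review_clean", clean_udf(F.col("review")))


# Sketch of the surrounding job (assumes s3a credentials are configured):
# spark = SparkSession.builder.appName("text-etl").getOrCreate()
# df = spark.read.csv("s3a://my-example-bucket/reviews.csv", header=True)
# transform_reviews(df).write.csv("s3a://my-example-bucket/reviews_clean")
```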
- I explore creating a full end-to-end pipeline that extracts data from a Postgres database, transforms it, and loads it back into a Postgres database using PySpark.
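A sketch of the shape such a pipeline might take using Spark's JDBC reader and writer. The table names (`raw_orders`, `orders_enriched`), column names, and connection properties are all hypothetical; the Spark code lives inside a function with a local import, leaving the small URL helper testable on its own.

```python
def jdbc_url(host: str, port: int, db: str) -> str:
    """Build a Postgres JDBC connection URL."""
    return f"jdbc:postgresql://{host}:{port}/{db}"


def run_pipeline(spark, props: dict) -> None:
    """Extract from Postgres, derive a total, and load back (assumed table names)."""
    from pyspark.sql import functions as F  # local import: jdbc_url testable without Spark
    opts = {
        "url": jdbc_url(props["host"], props["port"], props["db"]),
        "user": props["user"],
        "password": props["password"],
        "driver": "org.postgresql.Driver",
    }
    df = spark.read.format("jdbc").options(dbtable="raw_orders", **opts).load()
    out = df.withColumn("total", F.col("quantity") * F.col("unit_price"))
    (out.write.format("jdbc")
        .options(dbtable="orders_enriched", **opts)
        .mode("overwrite")
        .save())
```

Running this requires the Postgres JDBC driver jar on the Spark classpath (e.g. via `spark.jars.packages`).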
- This script scrapes an e-commerce site that exposes no API and stores the results in a CSV file.
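The parsing-and-export half of such a scraper, sketched with the standard library's `html.parser` against an inline sample page (the real script would first fetch pages over HTTP). The `product`/`title`/`price` class names and the sample markup are hypothetical stand-ins for the target site's layout.

```python
import csv
import io
from html.parser import HTMLParser

# Stand-in for a fetched product-listing page; class names are hypothetical.
SAMPLE_PAGE = """
<div class="product"><span class="title">Blue Mug</span><span class="price">$9.99</span></div>
<div class="product"><span class="title">Red Mug</span><span class="price">$8.49</span></div>
"""


class ProductParser(HTMLParser):
    """Collect (title, price) pairs from span.title / span.price elements."""

    def __init__(self):
        super().__init__()
        self.current = None   # which field the next text chunk belongs to
        self.pending = {}     # fields seen so far for the current product
        self.rows = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "span" and cls in ("title", "price"):
            self.current = cls

    def handle_data(self, data):
        if self.current:
            self.pending[self.current] = data.strip()
            self.current = None
            if "title" in self.pending and "price" in self.pending:
                self.rows.append((self.pending["title"], self.pending["price"]))
                self.pending = {}


def to_csv(rows) -> str:
    """Serialize scraped rows to CSV text with a header."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["title", "price"])
    writer.writerows(rows)
    return out.getvalue()


parser = ProductParser()
parser.feed(SAMPLE_PAGE)
csv_text = to_csv(parser.rows)
```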