3NF-normalize Yelp data on S3 with Spark and load it into Redshift - automate the whole thing with Apache Airflow
Updated Aug 17, 2019 (Jupyter Notebook)
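The sketch below shows roughly what 3NF normalization means for review data. It splits flat rows into separate tables in plain Python. It does not use Spark, S3, Redshift, or Airflow, and the field names and sample rows are hypothetical rather than taken from the Yelp dataset schema.

```python
# Hypothetical flat rows: each review repeats its business's and user's attributes.
flat_reviews = [
    {"review_id": "r1", "business_id": "b1", "business_name": "Cafe Uno",
     "city": "Phoenix", "user_id": "u1", "user_name": "Ana", "stars": 5},
    {"review_id": "r2", "business_id": "b1", "business_name": "Cafe Uno",
     "city": "Phoenix", "user_id": "u2", "user_name": "Ben", "stars": 3},
    {"review_id": "r3", "business_id": "b2", "business_name": "Taco Dos",
     "city": "Tempe", "user_id": "u1", "user_name": "Ana", "stars": 4},
]


def normalize_3nf(rows):
    """Split flat review rows into business, user, and review tables.

    business_name and city depend on business_id, and user_name depends on
    user_id, not on review_id. Moving them into their own tables removes those
    transitive dependencies. Assumes attributes are consistent for each key;
    a later row silently overwrites an earlier one with the same key.
    """
    businesses, users, reviews = {}, {}, []
    for row in rows:
        businesses[row["business_id"]] = {"name": row["business_name"],
                                          "city": row["city"]}
        users[row["user_id"]] = {"name": row["user_name"]}
        reviews.append({
            "review_id": row["review_id"],
            "business_id": row["business_id"],  # foreign key into businesses
            "user_id": row["user_id"],          # foreign key into users
            "stars": row["stars"],
        })
    return businesses, users, reviews


businesses, users, reviews = normalize_3nf(flat_reviews)
print(len(businesses), len(users), len(reviews))  # 2 2 3
```

In a Spark job, the same split would typically be written as a column selection per entity followed by `dropDuplicates()`, with the results then loaded into the corresponding Redshift tables.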
Losing customers is not an option. Today, a vast number of devices gather and send data. The benefit of using a NoSQL document-store database is that developers don’t need to maintain or adjust entities, migrations, and changes on existing products. Companies and products move in an agile environment, where requi…
An ETL process for a fictitious streaming service, Amazing Prime, was developed in Jupyter Notebook. The code was then refactored into a Python script to automate the ETL process.
A project exploring the extract, transform, load (ETL) process using Python, MongoDB, and Flask. Data sets included cryptocurrency prices and COVID case counts.
Team project performing ETL on 2020 U.S. election data, using Jupyter Notebook, PostgreSQL, and QuickDBD.
Synonymy-detection algorithm, built for the Data Structures course.
For this project, I am creating an ETL (extract, transform, load) pipeline using Python, RegEx, and a SQL database. The goal is to retrieve data from different sources, clean it and transform it into a useful format, and finally load it into a SQL database where it is ready for further analysis. The result is an established automated p…
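A minimal sketch of the extract, RegEx-transform, load pattern described above. It assumes a hypothetical line format (title, four-digit year in parentheses, dollar budget) and uses Python's built-in `sqlite3` in-memory database as a stand-in for whatever SQL database the project actually targets.

```python
import re
import sqlite3

# Hypothetical raw input with inconsistent spacing and one unparseable row.
RAW_RECORDS = [
    "  The Matrix (1999) | $63,000,000 ",
    "Inception (2010)|$160,000,000",
    "Untitled draft | n/a",
]

# Assumed format: title, (YYYY), "|", then a dollar amount with comma separators.
PATTERN = re.compile(
    r"^\s*(?P<title>.+?)\s*\((?P<year>\d{4})\)\s*\|\s*\$(?P<budget>[\d,]+)\s*$"
)


def transform(lines):
    """Yield cleaned (title, year, budget) tuples, skipping lines that do not match."""
    for line in lines:
        match = PATTERN.match(line)
        if match is None:
            continue  # drop rows that cannot be parsed
        yield (
            match["title"],
            int(match["year"]),
            int(match["budget"].replace(",", "")),  # "63,000,000" -> 63000000
        )


def load(rows, conn):
    """Create the target table if needed and bulk-insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies (title TEXT, year INTEGER, budget INTEGER)"
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(RAW_RECORDS), conn)
print(conn.execute("SELECT title, year, budget FROM movies ORDER BY year").fetchall())
```

Keeping `transform` as a generator means rows stream straight into `executemany` without building an intermediate list.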
For this project, I performed ETL on several movie datasets to predict popular films for a streaming service.
Amazing Prime loves the dataset and wants to keep it updated on a daily basis. The purpose of the analysis is to clean and merge data using an ETL process.
Using the ETL process to clean and merge data.
We examine two data sets related to the music industry. We extract, transform, and load the data sets to create a database and identify insights and trends in the music industry.
Udacity nd027 Data Modeling with Postgres
Data engineering project in Python that performs ETL and CRUD operations on 2M+ Yelp reviews and on housing data for two cities, using Pandas, SQLAlchemy, NumPy, and primary keys.
An ETL group project investigating esports earnings.
This repository contains code for building a data warehouse from scratch. I started with the requirements-elicitation process, then used functional dependencies to convert the requirements to a GOM4DW schema, followed by a conversion to a star schema to identify the facts and dimensions, and lastly I implemented the ETL process. I have used HTML and Flask to provide a use…
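To make "facts and dimensions" concrete, here is a small star-schema sketch using Python's built-in `sqlite3`. The tables (`fact_sales`, `dim_product`, `dim_date`) and rows are hypothetical and not taken from the repository; the elicitation and GOM4DW steps are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    revenue REAL
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, 2024, 1), (20240201, 2024, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 20240101, 2, 2000.0),
                  (2, 20240101, 1, 300.0),
                  (1, 20240201, 1, 1000.0)])

# Typical star-schema query: aggregate a fact measure, sliced by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)  # [('Electronics', 1, 2000.0), ('Electronics', 2, 1000.0), ('Furniture', 1, 300.0)]
```

Measures such as `quantity` and `revenue` live in the fact table, while descriptive attributes such as `category` and `month` live in the dimensions. This is what lets a single join-and-group query slice the facts along any dimension.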
I performed various data normalization operations with Python scripts. The target data is in CSV format.
We are going to examine two data sets related to the music industry. We want to extract, transform, and load them in order to identify insights and trends in the music industry.
A case study of extract, transform, load (ETL). Documentation includes the data sources, the types of data wrangling performed (data cleaning, joining, filtering, and aggregating), and the schemata used in the final production database. Technologies used include Pandas, PostgreSQL, and Jupyter Notebook.
Python ETL framework.