POC in Apache Kafka and Spark Streaming using Avro serialization.
Updated Sep 6, 2018 - Scala
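The Kafka/Spark Streaming entry above centers on Avro serialization. As a hedged illustration of what Avro does on the wire (not code from that repo), the sketch below implements Avro's zigzag varint encoding, which is how Avro binary-encodes `int` and `long` values:

```python
def zigzag_encode(n: int) -> bytes:
    """Encode a signed long as Avro's zigzag varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small magnitudes to small codes
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def zigzag_decode(data: bytes) -> int:
    """Decode an Avro zigzag varint back to a signed long."""
    z, shift = 0, 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (z >> 1) ^ -(z & 1)
```

This covers only the integer primitive; a full Avro record is the concatenation of its fields' encodings, with strings written as a length varint followed by UTF-8 bytes.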
ETL pipeline with AWS Redshift orchestrated with Airflow
Code for data flow between models, data post-processing, and visualization
Udacity Data Engineering Nanodegree - Project #2
Short course: Introduction to Machine Learning
Transformation of an Airbnb data set using dbt and Snowflake, then visualizing the data using Preset
Data pipeline to gather data from chess website APIs using Airflow.
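In a pipeline like the chess-API one above, the transform step typically flattens a nested JSON response before loading. The function and sample payload below are assumptions for illustration, not taken from the repo; in the repo this logic would presumably run inside an Airflow task after the HTTP fetch:

```python
import json

# Illustrative payload shaped like a player-stats response; the real
# schema returned by the chess site's API may differ.
SAMPLE_RESPONSE = json.dumps({
    "chess_rapid": {"last": {"rating": 1512, "date": 1662422400}},
    "chess_blitz": {"last": {"rating": 1433, "date": 1662422400}},
})

def extract_ratings(raw: str) -> dict:
    """Flatten the nested stats payload into {game_mode: rating}."""
    payload = json.loads(raw)
    return {mode: stats["last"]["rating"]
            for mode, stats in payload.items()
            if "last" in stats}

ratings = extract_ratings(SAMPLE_RESPONSE)
```

Keeping the parse step as a pure function like this makes it unit-testable outside the scheduler.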
Analysis of computer game sales
An end-to-end data pipeline deployed on GCP that extracts cryptocurrency data for analytics.
Convolutional Neural Network capable of detecting brain tumors and their locations from 5712 MRI brain scans
A cutting-edge big data initiative aimed at creating a real-time data pipeline to analyze the popularity and sentiments of trending topics on Twitter.
The mini-project for the Database Technologies course. The task is to ingest data via a pipeline built with Spark Streaming and Kafka, and store the processed data in a SQLite database for further manipulation
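The sink side of a pipeline like the one above usually writes each processed micro-batch into SQLite in a single transaction. The sketch below stands in the stream with a plain list of batches (the table name and tuple shape are assumptions, not from the project):

```python
import sqlite3

def write_batch(conn: sqlite3.Connection, batch: list[tuple[str, int]]) -> None:
    """Persist one processed micro-batch of (word, count) rows into SQLite."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO word_counts (word, count) VALUES (?, ?)", batch
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_counts (word TEXT, count INTEGER)")

# Stand-in for micro-batches arriving from Spark Streaming + Kafka.
for batch in [[("kafka", 3), ("spark", 5)], [("sqlite", 2)]]:
    write_batch(conn, batch)

total = conn.execute("SELECT SUM(count) FROM word_counts").fetchone()[0]
```

Wrapping each batch in one transaction (`with conn:`) keeps the table consistent if a batch fails midway.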
A data pipeline project that leverages Docker and PostgreSQL for efficient data processing and analysis tasks. Uses containerization to ensure portability and reproducibility of the data pipeline.
Deployable AWS data platform to process powerlifting data extracted from openpowerlifting.org.
💸 A Python module for building a portfolio assessment pipeline
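A portfolio assessment pipeline like the module above typically starts from per-asset returns and a weighted aggregate. The two helpers below are a minimal sketch of that idea; the function names and inputs are illustrative assumptions, not the module's actual API:

```python
def simple_returns(prices: list[float]) -> list[float]:
    """Period-over-period simple returns from a price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def portfolio_return(weights: dict[str, float],
                     asset_returns: dict[str, float]) -> float:
    """Weighted sum of per-asset returns for one period."""
    return sum(w * asset_returns[name] for name, w in weights.items())

# Example: 60/40 split across two hypothetical tickers.
rets = simple_returns([100.0, 110.0, 99.0])
combined = portfolio_return({"AAA": 0.6, "BBB": 0.4},
                            {"AAA": 0.05, "BBB": 0.01})
```

A real assessment module would layer risk metrics (volatility, drawdown) on top of the same return series.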
This is a basic example of using a pipeline in data science.
This is the data pipeline for the url-shortner application. Deprecated in favor of https://github.com/Dukes-Wine-Co/request-parsing-api
ETL pipeline with PySpark on Dataproc for data lake on Google Cloud Storage
An easy-to-use, reliable, and well-designed Python module that domain experts and data scientists can use to fetch, visualise, and transform publicly available satellite and LIDAR data.