Demonstrates how to set up a robust data engineering environment by containerizing Hadoop ecosystem components and other essential services with Docker. The setup includes Hadoop (HDFS, YARN), Apache Hive, PostgreSQL, and Apache Airflow, all configured to work together seamlessly.
Updated May 28, 2024 - Shell
An open-source project dedicated to constructing robust data pipelines and scalable software infrastructure. We leverage industry-standard tools favored by developers to enhance efficiency and reliability. Uniquely, these pipelines are field-tested on farms across Sumatra, Indonesia, ensuring real-world applicability and resilience.
This repository includes data engineering projects using Apache Airflow. I hope to add more projects using different technologies soon!
Airflow pipeline to finetune LLM on Kubernetes
A simple Python script for easy local Airflow deployment with Docker, packed with additional components. More will be added going forward.
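A wrapper script like the one described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch — the compose-file path and service flags are illustrative assumptions, not the repository's actual interface:

```python
# Hypothetical sketch of a local-deployment wrapper: assemble and run a
# `docker compose up` command from Python. The compose-file name and the
# detach flag are illustrative assumptions.
import shutil
import subprocess


def compose_up_command(compose_file="docker-compose.yaml", detach=True):
    """Assemble the `docker compose up` invocation as an argument list."""
    cmd = ["docker", "compose", "-f", compose_file, "up"]
    if detach:
        cmd.append("-d")
    return cmd


def deploy(compose_file="docker-compose.yaml"):
    """Run the deployment, but only if the docker CLI is actually on PATH."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    subprocess.run(compose_up_command(compose_file), check=True)


# The command can be inspected without touching Docker at all:
# compose_up_command() → ["docker", "compose", "-f", "docker-compose.yaml", "up", "-d"]
```

Keeping command construction separate from execution makes the script easy to test without a Docker daemon present.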
Sync DAG changes from Git to Airflow
MLOps: a simple ETL built with Docker, Airflow, and Google Cloud
Apache Airflow For Data Engineers Tutorial
🎵 LyricWave - Your AI Music Composer 🎶 Compose Unique MP4 Songs Effortlessly! LyricWave uses AI to create personalized music by harmonizing lyrics with captivating melodies and synthetic vocals. Unleash your musical creativity today! 🚀🎶
ETL (Extract, Transform, Load) pipeline to integrate sales data from various sources into a central data warehouse
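The extract-transform-load pattern behind a pipeline like this can be sketched in plain Python. The CSV columns and the in-memory SQLite "warehouse" below are illustrative assumptions, not the repository's actual schema:

```python
# Minimal ETL sketch: extract sales rows from CSV text, transform them
# (normalize dollar amounts to integer cents), and load them into a
# SQLite "warehouse". Column names and schema are illustrative assumptions.
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: convert dollar amounts to integer cents."""
    return [(r["order_id"], r["region"], int(round(float(r["amount"]) * 100)))
            for r in rows]


def load(conn, records):
    """Load: upsert the cleaned records into the central sales table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id TEXT PRIMARY KEY, region TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


raw = "order_id,region,amount\nA1,EU,19.99\nA2,US,5.00\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
total = conn.execute("SELECT SUM(amount_cents) FROM sales").fetchone()[0]
# total == 2499
```

In an Airflow deployment, each of the three functions would typically become its own task so failures can be retried independently.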
Welcome to my Apache Airflow learning journey repository! 🚀 This repository serves as a comprehensive documentation of my exploration and understanding of Apache Airflow, an open-source platform for orchestrating complex workflows.
Source code of the Apache Airflow Tutorial for Beginners on YouTube Channel Coder2j (https://www.youtube.com/c/coder2j)
This repository contains some DAGs created while following a data engineering training program, in order to learn and practice Airflow.
Apache Airflow packaged with Docker Compose
Automating Data Scrapers With Python and Airflow
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.
An end-to-end pipeline that ingests raw data from CSV files through Airflow DAGs into BigQuery. From there, it uses dbt to normalize and clean the data, and afterwards to apply transformations and produce relevant reports.
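The "normalize and clean" step that the repository delegates to dbt can be illustrated in plain Python: trim fields, standardize casing, and drop duplicate keys. The field names are illustrative assumptions:

```python
# Sketch of a normalize-and-clean step (handled by dbt in the repo above)
# written as plain Python: trim whitespace, standardize casing, and
# deduplicate on the primary key. Field names are illustrative assumptions.
def clean_rows(rows):
    """Return cleaned rows, keeping only the first occurrence of each id."""
    seen = set()
    out = []
    for r in rows:
        key = r["id"].strip()
        if key in seen:
            continue  # drop duplicate primary keys
        seen.add(key)
        out.append({
            "id": key,
            "email": r["email"].strip().lower(),
            "country": r["country"].strip().upper(),
        })
    return out


raw = [{"id": " 1 ", "email": "A@X.COM ", "country": "us"},
       {"id": "1", "email": "dup@x.com", "country": "us"}]
# clean_rows(raw) → [{"id": "1", "email": "a@x.com", "country": "US"}]
```

In dbt the same logic would live in a staging model as SQL; the Python version just makes the cleaning rules explicit.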
GlassdoorETL automates ETL job for company data from Glassdoor into a PostgreSQL database. Utilizing Airflow and Docker, it ensures timely updates and consistency. A flexible tool for data engineers, it provides easy deployment and management for insightful perspectives from Glassdoor data.