Apache Airflow

Cloud Composer is a managed workflow orchestration service built on Apache Airflow.

Create Cloud Composer

Data Lake

Google Cloud Storage is a managed service for storing unstructured data and serves as the data lake.

Create Google Cloud Storage

Data Warehouse

The data warehouse is built with BigQuery.

Create Cloud Composer

1. Select your project
2. Create the environment: "Composer 1"


3. Set up Cloud Composer

  • Name: your project name
  • Location: asia-east2 (Composer will run in Hong Kong)
  • Image version: composer-1.20.12-airflow-2.4.3
  • Node count: 3
  • Zone: asia-east2-b
  • Machine type: n1-standard-2
  • Disk size (GB): 30 (the minimum)
  • Number of schedulers: 1


Install the Python Packages

Specify the libraries the environment needs as PyPI packages:

  • pymysql
  • requests
  • pandas


Connect MySQL to Airflow

In the Airflow UI, click Admin and select Connections, then create a connection with the following fields (a usage sketch follows the list):

  • Host: the host to connect to
  • Schema: the schema (database) name to use
  • Login: the user name to connect with
  • Password: the password for that user
  • Port: the MySQL port (3306 by default)

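A minimal sketch (not the repository's exact code) of how a PythonOperator task could read from this connection using the pymysql and pandas packages installed above; the connection ID "my_mysql_db" and the query are hypothetical placeholders:

```python
import pandas as pd
import pymysql
from airflow.hooks.base import BaseHook


def get_data_from_db():
    # Look up the connection saved under Admin > Connections
    # ("my_mysql_db" is a hypothetical connection ID).
    conn = BaseHook.get_connection("my_mysql_db")
    db = pymysql.connect(
        host=conn.host,
        user=conn.login,
        password=conn.password,
        database=conn.schema,
        port=conn.port or 3306,
    )
    # Hypothetical query; replace with your own.
    df = pd.read_sql("SELECT * FROM transactions", con=db)
    db.close()
    return df
```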

Create Google Cloud Storage

When you create a Cloud Composer environment, a Google Cloud Storage bucket is automatically created and connected to the environment.

Create a BigQuery Dataset

  1. Open Google BigQuery
  2. Create a dataset (a scripted alternative is sketched below)
  • Select: the project
  • Fill in: the dataset ID
  • Location type: Region
  • Region: asia-east2 (Hong Kong)

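If you prefer to script this step instead of using the console, a minimal sketch with the google-cloud-bigquery Python client (the project and dataset IDs are hypothetical placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")        # hypothetical project ID
dataset = bigquery.Dataset("my-project.my_dataset")   # hypothetical dataset ID
dataset.location = "asia-east2"                       # Hong Kong, matching the Composer region

# Create the dataset if it does not already exist.
client.create_dataset(dataset, exists_ok=True)
```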

Upload files to GCS

  1. Open Cloud Shell and upload the files to Google Cloud Storage
  2. Files: the Airflow DAG definition file


  1. My Airflow DAG file defines four tasks (a sketch follows this list):
  • Task 1: PythonOperator - get data from the database
  • Task 2: PythonOperator - get data from the REST API
  • Task 3: PythonOperator - merge the data from the transaction path and the conversion path
  • Task 4: GCSToBigQueryOperator - load the output file into the data warehouse (BigQuery)

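A minimal sketch of this four-task structure, assuming Airflow 2.4 with the Google provider installed; the function bodies, DAG id, file paths, bucket name, and destination table are hypothetical placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)


def get_data_from_db():
    ...  # query MySQL (see the pymysql sketch above) and write a CSV


def get_data_from_api():
    ...  # call the REST API with requests and write a CSV


def merge_data():
    ...  # join the two CSVs with pandas and write the output file


with DAG(
    "my_pipeline",                     # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="get_data_from_db", python_callable=get_data_from_db)
    t2 = PythonOperator(task_id="get_data_from_api", python_callable=get_data_from_api)
    t3 = PythonOperator(task_id="merge_data", python_callable=merge_data)
    t4 = GCSToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="MY_BUCKET",                                     # hypothetical bucket
        source_objects=["data/output.csv"],                     # hypothetical output path
        destination_project_dataset_table="my_dataset.output",  # hypothetical table
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
        autodetect=True,
    )

    # Task 1 and Task 2 run first, then the merge, then the load to BigQuery.
    [t1, t2] >> t3 >> t4
```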

  1. Upload the DAG files to GCS using the gsutil command (a Python alternative is sketched below): `$ gsutil cp <DAG files> gs://<BUCKET>/<folder>`

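Equivalently, the upload can be scripted with the google-cloud-storage Python client (the bucket name, object path, and file name are hypothetical placeholders):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("MY_BUCKET")           # the bucket Composer created
blob = bucket.blob("dags/my_pipeline.py")     # hypothetical destination path
blob.upload_from_filename("my_pipeline.py")   # local DAG file to upload
```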

Automate Tasks with Airflow

Open Cloud Composer; it shows your environment.

1. Click "OPEN AIRFLOW UI"
2. Select the DAG in the Airflow UI


3. Open Google BigQuery
4. Select the dataset; it will show the table created by Task 4.


This table has 1.9 million rows.