Cloud Composer : a managed workflow-orchestration service built on Apache Airflow
Google Cloud Storage : a managed service for storing unstructured data
BigQuery : used to build the data warehouse
1. Select your project
2. Create the environment : "Composer 1"
3. Set up Cloud Composer
- Name : Your Project
- Location : asia-east2 (the environment will run in Hong Kong)
- Image version : composer-1.20.12-airflow-2.4.3
- Node count : 3
- Zone : asia-east2-b
- Machine type : n1-standard-2
- Disk size (GB) : 30 (the minimum)
- Number of schedulers : 1
Specify libraries under PyPI Packages :
- pymysql
- requests
- pandas
In the Airflow UI, click Admin and select Connections, then fill in the MySQL connection :
- Host : the host to connect to
- Schema : the schema (database) name to use
- Login : the user name to connect with
- Password : the password for that user
- Port : the MySQL port (default 3306)
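As a rough sketch, the connection fields above map onto `pymysql.connect()` keyword arguments. The values and the helper name here are placeholders, and the actual connect call is only shown in a comment since it needs a live database:

```python
def mysql_connect_kwargs(host, schema, login, password, port=3306):
    """Map the Airflow connection fields above to pymysql.connect() kwargs.

    Note: Airflow's "Schema" field holds the database name, which
    pymysql takes as the `database` argument.
    """
    return {
        "host": host,
        "user": login,
        "password": password,
        "database": schema,
        "port": int(port),
    }

# Inside a task you would then connect with (requires the pymysql package):
#   import pymysql
#   conn = pymysql.connect(**mysql_connect_kwargs("db-host", "sales", "etl", "secret"))
```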
When you create a Cloud Composer environment, Google Cloud Storage will automatically create a bucket that is connected to the environment.
- Open Google BigQuery
- Create data set
- Select : the project
- Fill : the data set ID
- Location type : Region
- Region : asia-east2 (Hong Kong)
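The project, data set ID, and a table name chosen above are what Task4 will later point at: GCSToBigQueryOperator identifies its destination as a dotted `project.dataset.table` string. A small helper (all names here are illustrative placeholders) makes that format explicit:

```python
def destination_table(project_id: str, dataset_id: str, table_id: str) -> str:
    """Build the dotted destination string that GCSToBigQueryOperator's
    `destination_project_dataset_table` parameter expects:
    '<project>.<dataset>.<table>'."""
    return f"{project_id}.{dataset_id}.{table_id}"

# e.g. destination_table("my-project", "my_dataset", "transactions")
#   -> "my-project.my_dataset.transactions"
```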
- Open Cloud Shell and upload the files to Google Cloud Storage
- Files : Airflow DAG definition file
- Task1 : PythonOperator - get data from database
- Task2 : PythonOperator - get REST API
- Task3 : PythonOperator - merge data from the transaction path and the conversion path
- Task4 : GCSToBigQueryOperator - upload the output path to the data warehouse (BigQuery)
- Upload to GCS using the gsutil command : `gsutil cp DAG_FILE gs://BUCKET/FOLDER`
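Task3's merge step can be sketched with pandas (already in the package list above). The join key `user_id` and the column names are assumptions; in the real task the two DataFrames would first be read from the transaction and conversion paths:

```python
import pandas as pd

def merge_paths(transactions: pd.DataFrame, conversions: pd.DataFrame) -> pd.DataFrame:
    """Left-join conversion rows onto transaction rows by user,
    keeping every transaction even when no conversion row exists."""
    return transactions.merge(conversions, on="user_id", how="left")

# Tiny illustrative inputs:
transactions = pd.DataFrame({"user_id": [1, 2], "amount": [100, 250]})
conversions = pd.DataFrame({"user_id": [1], "rate": [0.9]})
merged = merge_paths(transactions, conversions)
# user 2 has no conversion row, so its rate is NaN after the left join
```

The merged output is what Task3 would write to the output path for Task4 to load into BigQuery.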
Open Cloud Composer; it shows your environment.
1. Click "OPEN AIRFLOW UI"
2. Select the DAG in Airflow
3. Open Google BigQuery
4. Select the data set; it will show the table created by Task4.
This table has 1.9 million rows.