feat!: Pipeline YAML template using Airflow 2 operators (#138)
* new DAG operator imports for Airflow 2.0+

* support Airflow 2 operators when generating DAGs

* version compatibility checks for DAGs and Airflow environments

* tests to copy pipeline.yaml to dot folder (for airflow version spec)

* tests for version compatibility checks

* revised pipeline YAML template for Airflow 1 compat

* feat!: Upgrade dependencies to Airflow 2.1.1 (#135)

* revised dependencies for Airflow 2.1.1

* use proper variables file as required by Airflow 2

* changed README to use Airflow 2 requirement

* uncommented deploy_dag_versioning check

* feat: Upgrade `usa_names` pipeline to use Airflow 2 operators and environment (#136)

* default to Airflow 2

* README revisions for old and new YAML references

* sample pipeline YAML template for Airflow 2 operators

* add license header to unit-tests

* require airflow_version to be explicitly stated

* set default Cloud Composer version to Airflow 2

* tests to verify correct Airflow operators based on version specified

* additional GitHub check for unit testing Airflow 1 pipelines

* set default Airflow version to 2

* retrigger checks
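The operator-import work described in the bullets above can be illustrated with a small sketch. This is a hypothetical mapping, not the repository's actual generator code; only the Airflow import paths themselves are real module paths:

```python
# Hypothetical sketch of how a DAG generator might pick operator import
# paths per Airflow major version. The mapping structure and function are
# illustrative; the module paths are real Airflow 1.10 / Airflow 2 paths.
OPERATOR_IMPORTS = {
    "1": {
        "BashOperator": "airflow.operators.bash_operator",
        "GKEPodOperator": "airflow.contrib.operators.gcp_container_operator",
    },
    "2": {
        "BashOperator": "airflow.operators.bash",
        "GKEStartPodOperator": "airflow.providers.google.cloud.operators.kubernetes_engine",
    },
}

def import_line(operator: str, airflow_version: str) -> str:
    """Return the import statement for an operator under a given Airflow version."""
    try:
        module = OPERATOR_IMPORTS[airflow_version][operator]
    except KeyError:
        raise ValueError(
            f"{operator!r} is not available for Airflow {airflow_version}"
        )
    return f"from {module} import {operator}"
```

For example, `import_line("BashOperator", "2")` yields the Airflow 2 form `from airflow.operators.bash import BashOperator`, while the same operator under version `"1"` resolves to the legacy `airflow.operators.bash_operator` module.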
adlersantos committed Aug 11, 2021
1 parent b2749c6 commit 90ae7cd
Showing 10 changed files with 627 additions and 118 deletions.
41 changes: 41 additions & 0 deletions .github/workflows/unit-tests-airflow1.yaml
@@ -0,0 +1,41 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Run unit tests for Airflow 1.10 operators
on: [pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.8]
    steps:
      - uses: actions/checkout@v2
      - uses: hashicorp/setup-terraform@v1
        with:
          terraform_version: 0.15.1
      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install pipenv
        run: pip install pipenv
      - name: Install dependencies
        run: pipenv install --ignore-pipfile --dev
      - name: Initialize Airflow
        run: pipenv run airflow db init
      - name: Setup Airflow 1.10 pipeline YAML config
        run: cp samples/pipeline.airflow1.yaml samples/pipeline.yaml
      - name: Run tests
        run: pipenv run python -m pytest -v
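The workflow above swaps the Airflow 1.10 sample config in place before running the test suite. The commit also requires `airflow_version` to be stated explicitly in each pipeline config; a minimal sketch of such a check (hypothetical function name and shape, not the repository's actual implementation) might look like:

```python
# Hypothetical sketch of the "airflow_version must be explicitly stated"
# check described in this commit; the real logic lives in the repo's
# generator/validation code and may differ.
def validate_airflow_version(pipeline_config: dict) -> str:
    """Return the declared Airflow major version, or raise if absent/unsupported."""
    version = pipeline_config.get("airflow_version")
    if version is None:
        raise KeyError("pipeline.yaml must explicitly set `airflow_version`")
    if str(version) not in ("1", "2"):
        raise ValueError(f"unsupported airflow_version: {version!r}")
    return str(version)
```

A config declaring `airflow_version: 2` would pass, while one omitting the key would fail fast rather than silently defaulting.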
14 changes: 14 additions & 0 deletions .github/workflows/unit-tests.yaml
@@ -1,3 +1,17 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Run unit tests
on: [pull_request]
jobs:
14 changes: 10 additions & 4 deletions README.md
@@ -58,7 +58,9 @@ Use only underscores and alpha-numeric characters for the names.

If you created a new dataset directory above, you need to create a `datasets/DATASET/dataset.yaml` config file. See this [section](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/README.md#yaml-config-reference) for the `dataset.yaml` reference.

Create a `datasets/DATASET/PIPELINE/pipeline.yaml` config file for your pipeline. See [here](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/pipeline.yaml) for the `pipeline.yaml` reference.
Create a `datasets/DATASET/PIPELINE/pipeline.yaml` config file for your pipeline. See [here](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/) for the `pipeline.yaml` references.

For a YAML config template using Airflow 1.10 operators, see [`samples/pipeline.airflow1.yaml`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/pipeline.airflow1.yaml).

If you'd like to get started faster, you can inspect config files that already exist in the repository and infer the patterns from there:

@@ -219,10 +221,14 @@ $ pipenv run python -m pytest -v

# YAML Config Reference

Every dataset and pipeline folder must contain a `dataset.yaml` and a `pipeline.yaml` configuration file, respectively:
Every dataset and pipeline folder must contain a `dataset.yaml` and a `pipeline.yaml` configuration file, respectively.

The `samples` folder contains references for the YAML config files, complete with descriptions for config blocks and Airflow operators and parameters. When creating a new dataset or pipeline, you can copy them to your specific dataset/pipeline paths to be used as templates.

- For dataset configuration syntax, see [`samples/dataset.yaml`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/dataset.yaml) as a reference.
- For pipeline configuration syntax, see [`samples/pipeline.yaml`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/pipeline.yaml) as a reference.
- For dataset configuration syntax, see the [`samples/dataset.yaml`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/dataset.yaml) reference.
- For pipeline configuration syntax:
  - For the default Airflow 2 operators, see the [`samples/pipeline.yaml`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/pipeline.yaml) reference.
  - If you'd like to use Airflow 1.10 operators, see the [`samples/pipeline.airflow1.yaml`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/samples/pipeline.airflow1.yaml) reference.
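The copy-as-template workflow described in the README diff above can be sketched as follows. `my_dataset` and `my_pipeline` are hypothetical placeholder names, and `touch` stands in for the real sample files so the snippet runs anywhere:

```shell
# Sketch only: in the real repo you would already have the files under
# samples/ and pick your own dataset/pipeline names.
mkdir -p samples datasets/my_dataset/my_pipeline
touch samples/dataset.yaml samples/pipeline.yaml   # placeholder stand-ins

# Copy the sample references into your dataset/pipeline paths as templates.
cp samples/dataset.yaml datasets/my_dataset/dataset.yaml
cp samples/pipeline.yaml datasets/my_dataset/my_pipeline/pipeline.yaml
```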


# Best Practices
