YAML config reference

Adler Santos edited this page Jun 28, 2022 · 2 revisions

YAML config files (or simply "config files") are where you define your DAGs (pipelines), along with the GCP resources and container images that those pipelines require. These files must be located at the following paths:

  • datasets/DATASET_NAME/pipelines/dataset.yaml
  • datasets/DATASET_NAME/pipelines/PIPELINE_NAME/pipeline.yaml
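As a rough illustration of the layout above, the following Python sketch builds the expected config paths for one dataset and reports any that are missing. The function names, the `root` argument, and the example dataset and pipeline names are hypothetical and are not part of this repository's tooling.

```python
from pathlib import Path


def expected_config_paths(root, dataset_name, pipeline_names):
    """Return the config file paths that the documented layout implies.

    Hypothetical helper: `root` is assumed to be the repository root.
    """
    pipelines_dir = Path(root) / "datasets" / dataset_name / "pipelines"
    # One dataset.yaml per dataset...
    paths = [pipelines_dir / "dataset.yaml"]
    # ...and one pipeline.yaml per pipeline folder.
    for name in pipeline_names:
        paths.append(pipelines_dir / name / "pipeline.yaml")
    return paths


def missing_config_files(root, dataset_name, pipeline_names):
    """Return the expected config files that do not exist on disk."""
    expected = expected_config_paths(root, dataset_name, pipeline_names)
    return [path for path in expected if not path.is_file()]
```

For example, `missing_config_files(".", "my_dataset", ["my_pipeline"])` returns an empty list only when both `dataset.yaml` and `my_pipeline/pipeline.yaml` are in place.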

The samples folder contains comprehensive reference files for what can be included in the YAML config files. They explain how to configure every GCP resource, Airflow operator, Airflow variable, and container image for your data pipeline.

When building a new pipeline, you will probably use only a subset of what's currently supported, so feel free to copy the sample files into your working directories and modify them as needed.
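One way to do that copy step is sketched below in Python. It assumes the samples folder sits at the repository root and holds `dataset.yaml` and `pipeline.yaml`. The function name and its arguments are hypothetical, and existing config files are deliberately left untouched.

```python
import shutil
from pathlib import Path


def copy_sample_configs(repo_root, dataset_name, pipeline_name):
    """Copy the sample config files into a dataset's working directories.

    Hypothetical helper: assumes `repo_root/samples/` contains
    dataset.yaml and pipeline.yaml. Returns the files it created.
    """
    root = Path(repo_root)
    pipelines_dir = root / "datasets" / dataset_name / "pipelines"
    targets = {
        root / "samples" / "dataset.yaml": pipelines_dir / "dataset.yaml",
        root / "samples" / "pipeline.yaml": pipelines_dir / pipeline_name / "pipeline.yaml",
    }
    copied = []
    for source, target in targets.items():
        if target.exists():
            continue  # never overwrite a config that is already there
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        copied.append(target)
    return copied
```

After copying, trim each file down to the resources and operators your pipeline actually needs.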

dataset.yaml config

For the dataset configuration reference, see samples/dataset.yaml.

pipeline.yaml config

For the pipeline configuration reference, see samples/pipeline.yaml, which uses Airflow 2.

We also support Airflow 1.10 operators; see samples/pipeline.airflow1.yaml for reference. Note that we will deprecate Airflow 1 support in Q3 2022.