Data pipelines

Adler Santos edited this page Jul 1, 2022 · 1 revision

We use a cloud-native data pipeline architecture to onboard public datasets to Google Cloud.

For a conceptual overview, please see our blog post.

The architecture uses YAML files to represent both infrastructure components (GCP resources and IAM policies) and data pipelines (Airflow DAGs and variables) in one place. From this YAML configuration, we can bootstrap DAGs and Google Cloud resources, and deploy the pipelines to an existing Cloud Composer environment.
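To make the idea of a single configuration driving both infrastructure and pipelines more concrete, here is a minimal sketch in Python. It is not the project's actual generator: the key names (`resources`, `dag`, `tasks`, `depends_on`), the resource types, and the `build_plan` helper are all assumptions made for illustration. The dictionary stands in for what a YAML parser would return after reading a pipeline file.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any

# Hypothetical configuration, shown as the dict a YAML parser would produce.
# The schema here is an assumption for illustration, not the real file format.
PIPELINE_CONFIG: dict[str, Any] = {
    "resources": [
        {"type": "bigquery_table", "name": "example_table"},
        {"type": "storage_bucket", "name": "example-bucket"},
    ],
    "dag": {
        "dag_id": "example_pipeline",
        "tasks": [
            {"task_id": "download", "depends_on": []},
            {"task_id": "load_to_bq", "depends_on": ["download"]},
        ],
    },
}


@dataclass
class Plan:
    """What one config file yields: resources to create and a DAG to generate."""

    resources: list[str] = field(default_factory=list)
    dag_id: str = ""
    task_order: list[str] = field(default_factory=list)


def build_plan(config: dict[str, Any]) -> Plan:
    """Split one config into an infrastructure part and a pipeline part."""
    plan = Plan()

    # Infrastructure side: list each declared resource as "type:name".
    for resource in config.get("resources", []):
        plan.resources.append(f"{resource['type']}:{resource['name']}")

    # Pipeline side: order tasks so that each runs after its dependencies.
    dag = config["dag"]
    plan.dag_id = dag["dag_id"]
    graph = {task["task_id"]: task["depends_on"] for task in dag["tasks"]}
    plan.task_order = list(TopologicalSorter(graph).static_order())
    return plan


plan = build_plan(PIPELINE_CONFIG)
print(plan)
```

In a real setup, the resource list would feed an infrastructure provisioning step and the task order would become an Airflow DAG file; this sketch only shows that both can be derived from the same source of truth.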

Architecture overview

[Figure: public-datasets-pipelines architecture diagram]