
Orchestration

Joshua Essex edited this page Oct 21, 2020 · 2 revisions

Cloud Composer

Cloud Composer is a fully managed workflow orchestration service on Google Cloud Platform, built on the open source Apache Airflow project. In Composer, operations are orchestrated, scheduled, and run through Directed Acyclic Graphs (DAGs). A DAG is a collection of tasks, organized by their dependencies, that you want to schedule and run. Each task is defined by an operator, which describes a single unit of work.
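The DAG idea can be illustrated without Airflow. The sketch below is not Airflow's API. It uses Python's standard `graphlib` module (3.9+) to model a handful of hypothetical tasks and their dependencies, then derives the batches of tasks that could run in parallel. In Airflow itself, the same dependencies would be declared between operator instances, for example with the `>>` operator (`extract >> transform_a`).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key lists the tasks it depends on (its upstream tasks).
dag = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def is_acyclic(graph):
    """Return True if the graph has no cycles, i.e. it is a valid DAG."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def parallel_batches(graph):
    """Group tasks into batches; tasks in the same batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(parallel_batches(dag))
    # [['extract'], ['transform_a', 'transform_b'], ['load']]
```

The two transform tasks share a batch because neither depends on the other, which is the property a scheduler exploits to run them in parallel.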

Cloud Composer at Recidiviz

After the ingest pipelines complete, a message is published to a Pub/Sub topic. A Cloud Function listens to this topic and triggers the DAG to run. The DAG orchestrates the calculation pipelines so that they run in parallel. Once all the pipelines for a particular state have finished, a state-specific HTTP request triggers the export of that state's files from BigQuery to GCS. Because each state's export waits only on that state's pipelines, a pipeline failure in one state does not block the data export for other states.
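The per-state fan-out described above can be sketched in plain Python. This is not the production DAG: the state codes, pipeline names, and the `run_pipeline` and `export_state` functions are hypothetical stand-ins, and the sketch assumes (the text does not say) that a state with a failed pipeline skips its own export. All pipelines run in parallel, and each state's export fires as soon as the last of that state's pipelines finishes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical (state_code, pipeline_name) pairs; not the real pipeline list.
PIPELINES = [
    ("US_XX", "recidivism"),
    ("US_XX", "supervision"),
    ("US_YY", "recidivism"),
    ("US_YY", "supervision"),
]


def run_pipeline(state, name, failing):
    """Hypothetical stand-in for a calculation pipeline; fails if marked failing."""
    if (state, name) in failing:
        raise RuntimeError(f"{state}/{name} failed")
    return f"{state}/{name}"


def export_state(state):
    """Hypothetical stand-in for the state-specific BigQuery -> GCS export request."""
    return f"exported {state}"


def orchestrate(pipelines, failing=frozenset()):
    # Count outstanding pipelines per state so we know when a state is finished.
    remaining = defaultdict(int)
    for state, _ in pipelines:
        remaining[state] += 1
    failed_states = set()
    exports = {}
    with ThreadPoolExecutor() as pool:
        future_to_state = {
            pool.submit(run_pipeline, state, name, failing): state
            for state, name in pipelines
        }
        for future in as_completed(future_to_state):
            state = future_to_state[future]
            if future.exception() is not None:
                failed_states.add(state)
            remaining[state] -= 1
            # Export once this state's last pipeline is done, unless one of them failed.
            if remaining[state] == 0 and state not in failed_states:
                exports[state] = export_state(state)
    return exports
```

For example, `orchestrate(PIPELINES, failing={("US_XX", "supervision")})` still exports `US_YY`, because that state's export depends only on its own pipelines. In Airflow, the same effect comes from wiring each export task downstream of only its own state's pipeline tasks.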

The DAG deployed in both production and staging is called calculation_pipeline_dag, and an Airflow UI is available for administration in each environment.