This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

mixed concerns and dependency management issues #2

Open
georgim0 opened this issue Mar 16, 2020 · 1 comment

Comments

@georgim0

georgim0 commented Mar 16, 2020

Hi there,

Excited to see this tutorial, as it's something we've struggled with in the past.

I've tried mixing Airflow, dbt and GE in the past. This approach has 2 issues:

  • dependency management nightmare: too many transitive dependencies
  • mixed concerns:
    • GE performs data quality checks
    • dbt creates/updates tables in your DW
    • Airflow triggers jobs

Here's the approach we've taken:

  • have 2 repos:
    • model definitions along with their expectations in 1 repo that spits out 2 dockerised containers:
      • one for dbt runs
      • one for GE runs
    • airflow dag definitions
      • triggers dbt container
      • triggers GE container
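
To make the two-step flow above concrete, here is a minimal sketch of what the orchestration boils down to: run the dbt container, then the GE container, and stop if the first one fails. The image names and the `DW_PASSWORD` variable are hypothetical placeholders, not our real ones, and in Airflow each command would normally be its own task (for example via a Docker operator) with the GE task downstream of the dbt task.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical image names; the real ones would come from the models repo's build.
DBT_IMAGE = "example/models-dbt:latest"
GE_IMAGE = "example/models-ge:latest"


def docker_run_cmd(image: str, passthrough_env: Sequence[str] = ()) -> List[str]:
    """Build a `docker run` argv for a one-off container.

    `-e NAME` (without a value) forwards NAME from the caller's environment,
    so secrets never appear on the command line.
    """
    cmd = ["docker", "run", "--rm"]
    for name in passthrough_env:
        cmd += ["-e", name]
    cmd.append(image)
    return cmd


def pipeline_cmds(passthrough_env: Sequence[str] = ()) -> List[List[str]]:
    """dbt first (create/update tables), then GE (validate them)."""
    return [
        docker_run_cmd(DBT_IMAGE, passthrough_env),
        docker_run_cmd(GE_IMAGE, passthrough_env),
    ]


def run_pipeline(
    passthrough_env: Sequence[str] = (),
    runner: Callable = subprocess.run,
) -> None:
    """Run both containers in order; check=True raises on a non-zero exit,
    so the GE step is skipped when the dbt step fails."""
    for cmd in pipeline_cmds(passthrough_env):
        runner(cmd, check=True)
```

Calling `run_pipeline(["DW_PASSWORD"])` would forward that variable into both containers. Neither image needs Airflow installed, which is what keeps the dependency sets apart.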

Could you validate our approach, please? Is GE being used the way it was designed to be used?

@eugmandel
Contributor

The approach sounds good to me. You should create one GE DataContext (project), probably in the repo with the dbt models.
Then the airflow task that invokes validation loads the project's config from the models repo.
You can parameterize the credentials of the database (your datasource) using env variables.
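
As a rough illustration of the env-variable idea, the sketch below assembles a database connection URL from environment variables instead of hard-coding credentials in the project. The variable names (`DW_USER`, `DW_PASSWORD`, and so on) and the PostgreSQL URL scheme are assumptions made for the example; substitute whatever your warehouse and deployment actually use.

```python
import os
from typing import Mapping
from urllib.parse import quote

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ("DW_USER", "DW_PASSWORD", "DW_HOST", "DW_PORT", "DW_NAME")


def warehouse_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a connection URL from environment variables.

    Assumes a PostgreSQL warehouse; the scheme is illustrative only.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Percent-encode user and password so characters like '@' or '/'
    # cannot break the URL structure.
    user = quote(env["DW_USER"], safe="")
    password = quote(env["DW_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DW_HOST']}:{env['DW_PORT']}/{env['DW_NAME']}"
    )
```

The Airflow task (or the container it launches) would set these variables at run time, so the same project config can point at different databases per environment.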
