
Schema-level tests #197

Open

wasilaq opened this issue Sep 21, 2022 · 1 comment

Comments

wasilaq commented Sep 21, 2022

I originally posted this as a discussion in dbt-utils but it was suggested I move it here!

dbt tests currently tend to be at the model or column level. This issue poses the question of whether it would be useful to introduce testing at the schema level as well (e.g. testing that all models in the same schema share a particular characteristic). I think it would be interesting to add certain schema tests to this package, but I wanted to get others' thoughts first before adding anything.

One way to do this is by querying the information schema in Snowflake. Below is an example of a test I created for a project to check the marts schema for views:

-- There should be no views in the marts schemas.
-- tests/check_for_marts.sql

select * from {{ target.database }}.information_schema.tables
where table_schema = '{{ (target.schema | upper) + '_MARTS' }}'
and table_type != 'BASE TABLE'
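
Since this lives in the tests/ directory, dbt treats it as a singular test: dbt test (or dbt test --select check_for_marts) runs the query and fails if any rows come back, i.e. if anything other than a base table shows up in the marts schema.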

Benefits

The main benefits here are performance and ease of use. Instead of applying a test to each individual model, only one test needs to be configured. In the above example, only one query is run against the data warehouse.

Considerations

The above example isn't customizable: the schema name and suffix are hard-coded. The code can be copy-pasted and adjusted, but that's not a friendly format for inclusion in a package. If this type of test is something other folks are interested in, I'd love to hear ideas on how we can make schema tests more dynamic.
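
One possible direction, as a minimal sketch: pull the query into a macro that takes the schema suffix and expected table type as arguments, so each project only writes a one-line singular test. The macro name and signature below are hypothetical, just to show the shape:

-- macros/expect_only_type_in_schema.sql (hypothetical macro name)
{% macro expect_only_type_in_schema(schema_suffix, expected_type='BASE TABLE') %}

select *
from {{ target.database }}.information_schema.tables
where table_schema = '{{ (target.schema ~ schema_suffix) | upper }}'
and table_type != '{{ expected_type }}'

{% endmacro %}

-- tests/check_for_marts.sql then reduces to:
{{ expect_only_type_in_schema('_MARTS') }}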

samLozier commented

I've been thinking about a similar concept for the last few months. I've wanted to check what dbt's output will be, ideally without actually executing dbt to do it. My strategy has been to intercept the compiled SQL just before it gets sent over the wire to Snowflake, parse that SQL with another tool like sqlparse, and then check for allowed schemas/materializations. This would allow testing what the prod config is going to run without risking anything in prod.

My approach seems harder to implement, so I really like your strategy, but I'd personally want to change it slightly. Maybe a test interface along the lines of the list below (a rough sketch of the materialization check follows it):

- check_that_materialization_db_in_allowed_list
- check_that_materialization_schema_in_allowed_list
- check_that_materialization_is_table 
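
For what it's worth, here's a minimal sketch of the materialization check. It leans on dbt's graph context variable to inspect model configs rather than intercepting compiled SQL, and the macro name and allowed-list default are made up:

-- macros/check_materializations_in_allowed_list.sql (hypothetical name)
-- Fails loudly if any model is configured with a materialization
-- outside the allowed list.
{% macro check_materializations_in_allowed_list(allowed=['table', 'incremental']) %}
{% if execute %}
{% for node in graph.nodes.values() %}
{% if node.resource_type == 'model' and node.config.materialized not in allowed %}
{{ exceptions.raise_compiler_error(
node.name ~ ' is materialized as ' ~ node.config.materialized
~ ', expected one of: ' ~ (allowed | join(', '))
) }}
{% endif %}
{% endfor %}
{% endif %}
{% endmacro %}

This could be invoked with dbt run-operation in CI, or reshaped to return failing rows so it works as a singular test.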
