Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Support for large deviation tests for partitioned bigquery tables #237

Open
praveen-prashant opened this issue Jan 10, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@praveen-prashant
Copy link

praveen-prashant commented Jan 10, 2023

Is your feature request related to a problem? Please describe.
We use weekly partitioned bigquery tables for our mart layer. We use dbt to create and update them. We often need to ensure that table row counts do not vary too much week by week, but there is no inbuilt test currently in dbt-expectations that allows to check this

Describe the solution you'd like
A new test that checks table row count deviations of partitions over a threshold. Something like below that throws a warning if counts deviate by more than 20% for a table my_mart_table when compared to the directly preceding partitions

version: 2

models:
  - name: my_mart_table
    tests:
      - not_large_deviation_count:
          severity: warn
          threshold: 0.2
          partition_column: partition_date          

Describe alternatives you've considered
We have implemented a custom test like below:

{% macro test_not_large_deviation_count(model, threshold, partition_column='partition_date',  row_condition='1=1') %}

WITH counts AS (
  SELECT
        {{ partition_column }} AS partition_date,
        COUNT(*) AS observation_count,
  FROM {{ model }}
  WHERE {{ row_condition }}
  GROUP BY 1
),

compare AS (
  SELECT
      partition_date,
      observation_count AS current_value,
      LEAD (observation_count) OVER (ORDER BY partition_date DESC) AS preceding_value
  FROM counts
)

SELECT
      *
FROM compare
WHERE ABS(SAFE_DIVIDE(current_value - preceding_value, preceding_value)) > {{ threshold }}

{% endmacro %}

Additional context
It would be nice to have other column level comparison based tests to check for partition-on-partition deviations in values like max, min, avg etc for columns in a table just like row counts

@praveen-prashant praveen-prashant changed the title [Feature Request] [Feature Request] Support for large deviation tests for partitioned bigquery tables Jan 10, 2023
@clausherther clausherther added the enhancement New feature or request label Jan 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants