[Feature Request] Improving Data Validation Output #289

Open
oksanagorbachenko opened this issue Oct 31, 2023 · 0 comments

Is your feature request related to a problem? Please describe.

The dbt-expectations library offers a comprehensive suite of data validation tests, but it presents results in a rigid template that gives no detailed context when a test fails. For instance, in tests like expect_column_values_not_to_be_null, the output simply states expression-false. To address this limitation, I propose an enhancement that lets users pass a database column (for example a unique identifier such as 'id') as an input parameter to the test. This would make it much easier to identify the specific rows causing a failure and to troubleshoot them.

Describe the solution you'd like

The solution I seek involves extending the capabilities of the dbt-expectations library. Specifically, I recommend introducing a parameter that enables users to specify a database column as part of the validation process. When applied, this feature would result in output that includes the designated column's values for rows with failed tests. This enriched output would provide essential context, making it significantly easier to pinpoint the exact rows causing data discrepancies.
Since the main query is generated in the file expression_is_true.sql, the change would look something like this:

{% macro expression_is_true(model, expression, test_condition="= true", group_by_columns=None, row_condition=None, select_columns=None) %}
  with grouped_expression as (
    select
      {% if group_by_columns is not none %}
        {% for group_by_column in group_by_columns %}
          {{ group_by_column }} as col_{{ loop.index }},
        {% endfor %}
      {% endif %}
      {{ dbt_expectations.truth_expression(expression) }}
      {#- append the extra columns only when provided, so the default does not render "None" -#}
      {%- if select_columns is not none %}, {{ select_columns }}{% endif %}
...
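
For discussion, here is a minimal sketch of how the proposal could behave end to end. It is written as a standalone generic test rather than a patch to the library, so it makes no claim about how dbt-expectations is or would be implemented. The test name `expression_is_true_with_context` and the `select_columns` parameter are hypothetical, and `select_columns` is assumed to be a list of column names such as `['id']`.

```sql
{# Hypothetical sketch, not the library's implementation.
   `select_columns` is the assumed new parameter: a list of column names. #}
{% test expression_is_true_with_context(model, expression, test_condition="= true", row_condition=None, select_columns=None) %}
with evaluated as (
    select
        {# Emit context columns only when the caller supplied some,
           so an omitted argument never renders the literal text "None". #}
        {% if select_columns %}
        {% for select_column in select_columns %}
        {{ select_column }},
        {% endfor %}
        {% endif %}
        {{ expression }} as expression
    from {{ model }}
    {% if row_condition %}
    where {{ row_condition }}
    {% endif %}
)
-- Return the failing rows together with their context columns.
select *
from evaluated
where not (expression {{ test_condition }})
{% endtest %}
```

The sketch deliberately leaves out `group_by_columns`: when the query is grouped, any extra column would also have to appear in the `group by` clause, which is a design decision the real change would need to make. If dbt's `store_failures` config is enabled, the rows returned by the test, including the context columns, would be what gets written to the failures table.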

Describe alternatives you've considered

One potential alternative would involve manual post-processing of the dbt-expectations output to associate failed tests with the corresponding data. However, this manual approach is not only time-consuming but also prone to errors. The optimal solution is to incorporate this feature directly into the library, ensuring a streamlined validation process and providing a more accurate identification of data issues.

Additional context

Furthermore, it would be highly beneficial to enhance the customization capabilities of the validation reports even further. This could include the ability to dynamically add fields from other related tables to the output. Such advanced customization would enable us to create more informative and context-rich reports, facilitating in-depth data analysis and troubleshooting in complex data projects. This level of flexibility and detail in report generation would significantly elevate the utility of the dbt-expectations library and contribute to comprehensive data quality assurance in our project.

@oksanagorbachenko oksanagorbachenko changed the title [Feature Request] [Feature Request] Improving Data Validation Output Oct 31, 2023