[Feature Request] Improving Data Validation Output #289

oksanagorbachenko · 2023-10-31T10:23:21Z

Is your feature request related to a problem? Please describe.

The dbt-expectations library offers a comprehensive suite of data validation tests, but it consistently follows a rigid template for presenting results. The core issue lies in the absence of detailed context when a test fails. For instance, in tests like expect_column_values_not_to_be_null, the output simply states expression-false.To address this limitation, I propose an enhancement that allows users to specify a database column (for example, the unique identifier, such as 'id') as an input parameter for data validation. This addition would substantially improve our ability to identify the specific rows causing the issues, facilitating more effective troubleshooting.

Describe the solution you'd like

The solution I seek involves extending the capabilities of the dbt-expectations library. Specifically, I recommend introducing a parameter that enables users to specify a database column as part of the validation process. When applied, this feature would result in output that includes the designated column's values for rows with failed tests. This enriched output would provide essential context, making it significantly easier to pinpoint the exact rows causing data discrepancies.
Since the main request is generated in the file expression_is_true.sq it will look something like this:

{% macro expression_is_true(model, expression, test_condition="= true", group_by_columns=None, row_condition=None, **select_columns=None**) %}
  with grouped_expression as (
    select
      {% if group_by_columns is not none %}
        {% for group_by_column in group_by_columns %}
          {{ group_by_column }} as col_{{ loop.index }},
        {% endfor %}
      {% endif %}
      {{ dbt_expectations.truth_expression(expression)  ~ ", " ~ **select_columns**}}
...

Describe alternatives you've considered

One potential alternative would involve manual post-processing of the dbt-expectations output to associate failed tests with the corresponding data. However, this manual approach is not only time-consuming but also prone to errors. The optimal solution is to incorporate this feature directly into the library, ensuring a streamlined validation process and providing a more accurate identification of data issues.

Additional context

Furthermore, it would be highly beneficial to enhance the customization capabilities of the validation reports even further. This could include the ability to dynamically add fields from other related tables to the output. Such advanced customization would enable us to create more informative and context-rich reports, facilitating in-depth data analysis and troubleshooting in complex data projects. This level of flexibility and detail in report generation would significantly elevate the utility of the dbt-expectations library and contribute to comprehensive data quality assurance in our project.

The text was updated successfully, but these errors were encountered:

oksanagorbachenko changed the title ~~[Feature Request]~~ [Feature Request] Improving Data Validation Output Oct 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] Improving Data Validation Output #289

[Feature Request] Improving Data Validation Output #289

oksanagorbachenko commented Oct 31, 2023

[Feature Request] Improving Data Validation Output #289

[Feature Request] Improving Data Validation Output #289

Comments

oksanagorbachenko commented Oct 31, 2023