Expect_Column_Values_To_Match_Regex Test is Failing with an argument error #274

Open
brian-custer opened this issue Aug 16, 2023 · 12 comments
Labels
unsupported_platform This issue is for a platform we don't support yet

Comments

@brian-custer

Is this a new bug in dbt-expectations?

  • I believe this is a new bug

  • I have searched the existing issues, and I could not find an existing issue for this bug

I have defined the test using the following code in my models yaml file:

  - dbt_expectations.expect_column_values_to_match_regex:
      regex: "^[0-9]{2}/[0-9]{2}$"
      is_raw: true
      row_condition: "CreditCardExpirationDate is not null"

Expected Behavior

I expect the test to work.

Steps To Reproduce

Configure your test like the above in a models yaml file.

Relevant log output

The log output is: Error executing test: regexp_instr requires 2 arguments but 4 were given.

Environment

The environment is VS Code and dbt Core.

- OS: Windows 11
- Python: 3.10
- dbt: dbt-core 1.5.4
- adapter: dbt-databricks 1.5.5
- dbt-expectations:

Which database adapter are you using with dbt?

dbt-databricks 1.5.5
Note: dbt-expectations currently does not support database adapters other than the ones listed below.

  • Postgres
  • Snowflake
  • BigQuery

Additional Context

I am using the dbt-sparkutils shim to compensate for the fact that dbt-expectations doesn't run on Databricks.

@clausherther
Contributor

Hi @brian-custer! Doesn't look like dbt-sparkutils overrides any dbt-expectations macros, and the one that's causing your issue is dbt_expectations.regexp_instr, which expects 4 parameters for the default implementation.
We don't have a spark implementation for this at the moment since we don't have a CI/CD environment for spark set up.
Best bet at the moment is to add a shim for dbt_expectations to dbt-sparkutils.

@brian-custer
Author

I've done that and it is still failing with the error I gave you. Any ideas how we can coax the test into working?

@clausherther
Contributor

Sorry, not sure I'm following. What exactly have you already done?

@brian-custer
Author

brian-custer commented Aug 16, 2023 via email

@clausherther
Contributor

Right, your issue is that the dbt-sparkutils package does nothing to help you run dbt-expectations on Databricks, since it doesn't implement any shims for it. Unless you or someone else adds Spark support for regexp_instr to dbt-sparkutils, you're going to continue getting this error. Your other option is to implement the shim locally in your project.

@clausherther clausherther added the unsupported_platform This issue is for a platform we don't support yet label Aug 16, 2023
@brian-custer
Author

brian-custer commented Aug 16, 2023 via email

@clausherther
Contributor

clausherther commented Aug 17, 2023

We actually removed the reference to spark-utils in the README when we deprecated support for dbt-utils back in Nov '22 (#217 https://github.com/calogica/dbt-expectations/pull/217/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L44)

@bry890

bry890 commented Aug 17, 2023

Hi y'all

I had the same issue with the dbt_expectations.expect_column_values_to_match_regex test on Databricks. As @clausherther mentioned, the problem seems to be that Databricks' regexp_instr function only accepts two arguments, whereas the default implementation passes in four.

As a quick fix, I added the following macro in my project:

-- myproject/macros/databricks__regexp_instr.sql

{% macro databricks__regexp_instr(source_value, regexp, position, occurrence, is_raw, flags) %}
    {# Databricks' regexp_instr takes only two arguments (string, pattern), so position, #}
    {# occurrence, is_raw and flags are accepted to match the dispatched call but are ignored. #}
    {# This is just an example; adapt it to your needs and to whether your regexp is raw. #}
    {# https://docs.databricks.com/en/sql/language-manual/functions/regexp_instr.html #}
    {# https://docs.databricks.com/en/sql/language-manual/data-types/string-type.html #}
    regexp_instr({{ source_value }}, '{{ regexp }}')
{% endmacro %}
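
If dbt doesn't seem to pick up a project-level macro like the one above, it may help to tell dispatch explicitly to search your own project before the package. A minimal sketch of the relevant dbt_project.yml fragment, assuming your project is named my_project (a placeholder, not something from this thread); whether you need this at all may depend on your dbt version:

```yaml
# dbt_project.yml (fragment)
# "my_project" is a hypothetical project name; use the value of `name:` in your own dbt_project.yml.
dispatch:
  - macro_namespace: dbt_expectations
    # Look for adapter-specific overrides (e.g. databricks__regexp_instr) in the
    # root project first, then fall back to the package's own implementations.
    search_order: ['my_project', 'dbt_expectations']
```

With this in place, macros dispatched under the dbt_expectations namespace should resolve to your project's databricks__ versions where they exist, and to the package's own macros otherwise.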

@brian-custer
Author

brian-custer commented Aug 17, 2023 via email

@clausherther
Contributor

FYI, support for Spark in dbt-date was released today, and I'm working on Spark support for dbt-expectations. See https://getdbt.slack.com/archives/CU4MRJ7QB/p1692723790034329.

@clausherther
Contributor

If anyone has experience with Regex parsing in dbt-spark, I'd appreciate the assist here: https://getdbt.slack.com/archives/CNGCW8HKL/p1692733472369839

@brian-custer
Author

brian-custer commented Aug 22, 2023 via email
