[Bug] Clone materialization raises an error when cloning Python models #645

Closed
2 tasks done
jeancochrane opened this issue May 10, 2024 · 2 comments · Fixed by #651
Labels
bug Something isn't working

Comments

@jeancochrane
Contributor

jeancochrane commented May 10, 2024

Is this a new bug in dbt-athena?

  • I believe this is a new bug in dbt-athena
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running dbt clone on a Python model raises the following error:

$ dbt clone --select reporting.ratio_stats --state master-cache
16:46:38  Running with dbt=1.7.11
16:46:38  Registered adapter: athena=1.7.1
16:46:39  Found 82 models, 5 seeds, 415 tests, 136 sources, 10 exposures, 0 metrics, 595 macros, 0 groups, 0 semantic models
16:46:39
16:46:44  Concurrency: 5 threads (target='dev')
16:46:44
Failed to execute query.
Traceback (most recent call last):
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/pyathena/common.py", line 522, in _execute
    query_id = retry_api_call(
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/pyathena/util.py", line 85, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/jecochr/data-architecture/dbt/venv/lib/python3.10/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 5:5: mismatched input 'None'. Expecting: <query>
Failed to execute query.
16:46:48
16:46:48  Completed with 1 error and 0 warnings:
16:46:48
16:46:48    Runtime Error in model reporting.ratio_stats (models/reporting/reporting.ratio_stats.py)
  An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 5:5: mismatched input 'None'. Expecting: <query>
16:46:48
16:46:48  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The root cause is that dbt's built-in clone materialization macro calls the dbt-athena view materialization macro, which in turn calls create_or_replace_view. That macro references the sql context object rather than compiled_code, and sql returns None for Python models. The resulting clone view query has the form create or replace view <clone_view> as None, which raises the error above. Here's the line in create_or_replace_view that causes the error:

And here are the definitions of compiled_code and sql in dbt-core (source):

    @contextproperty()
    def compiled_code(self) -> Optional[str]:
        # TODO: avoid routing on args.which if possible
        if getattr(self.model, "defer_relation", None) and self.config.args.which == "clone":
            # TODO https://github.com/dbt-labs/dbt-core/issues/7976
            return f"select * from {self.model.defer_relation.relation_name or str(self.defer_relation)}"  # type: ignore[union-attr]
        elif getattr(self.model, "extra_ctes_injected", None):
            # TODO CT-211
            return self.model.compiled_code  # type: ignore[union-attr]
        else:
            return None


    @contextproperty()
    def sql(self) -> Optional[str]:
        # only set this for sql models, for backward compatibility
        if self.model.language == ModelLanguage.sql:  # type: ignore[union-attr]
            return self.compiled_code
        else:
            return None
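
To make the failure mode concrete, here is a minimal sketch that mimics the two properties above outside of dbt. FakeModel, FakeContext, render_view_ddl, and the relation names are hypothetical stand-ins for illustration, not dbt-core or dbt-athena APIs, and the extra_ctes_injected branch of compiled_code is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModel:
    """Hypothetical stand-in for a dbt model node; not a dbt-core class."""
    language: str                        # "sql" or "python"
    defer_relation_name: Optional[str]   # relation resolved from the --state manifest


class FakeContext:
    """Simplified mirror of the two context properties quoted above."""

    def __init__(self, model: FakeModel, which: str) -> None:
        self.model = model
        self.which = which  # the dbt subcommand being run, e.g. "clone"

    @property
    def compiled_code(self) -> Optional[str]:
        # During a clone, compiled_code is a select from the deferred relation,
        # regardless of the model's language.
        if self.model.defer_relation_name and self.which == "clone":
            return f"select * from {self.model.defer_relation_name}"
        return None

    @property
    def sql(self) -> Optional[str]:
        # sql is only populated for SQL models.
        return self.compiled_code if self.model.language == "sql" else None


def render_view_ddl(view_name: str, body: Optional[str]) -> str:
    # Mimics template interpolation, where None is rendered as the text "None".
    return f"create or replace view {view_name} as {body}"


ctx = FakeContext(FakeModel("python", "prod.ratio_stats"), which="clone")
broken = render_view_ddl("dev.ratio_stats", ctx.sql)
fixed = render_view_ddl("dev.ratio_stats", ctx.compiled_code)
print(broken)
print(fixed)
```

With a Python model, the statement built from sql ends in "as None", while the one built from compiled_code selects from the deferred relation.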

Expected Behavior

dbt clone should not raise an error when cloning Python models. It should support clone materialization by referencing the compiled_code context object when generating the clone view query rather than the sql context object.

Steps To Reproduce

  1. Set up a dbt config with two targets, dev and prod
  2. Define and build a dummy Python model that just runs print("hello world") in the prod target
  3. Rename the target/ directory to prod-state/
  4. Run dbt clone --state prod-state
  5. Confirm you see the same error as listed above
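
For step 2, a dummy model could look roughly like the sketch below. It assumes the dbt-athena convention of a model(dbt, spark_session) function that returns a Spark DataFrame. The file name, the materialized config, and the id column are placeholders, not taken from the reporter's project.

```python
# models/dummy_python_model.py (hypothetical file name)


def model(dbt, spark_session):
    # Assumed config for illustration: build the model as a table.
    dbt.config(materialized="table")

    print("hello world")

    # dbt Python models are expected to return a DataFrame,
    # so return a trivial one-row frame.
    return spark_session.createDataFrame([(1,)], ["id"])
```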

Environment

- OS: Ubuntu 22.04.4
- Python: Python 3.10.12
- dbt: 1.7.11
- dbt-athena-community: 1.7.2

Additional Context

This particular bug is blocking us on our use of clone materialization, but I think it also implicates the create_table_csv_upload macro and a few snapshot macros like hive_snapshot_merge_sql that also reference the sql context variable instead of compiled_code. I see that the docs for Python models explicitly list the lack of snapshot materialization support as a limitation, so I'm wondering if there's a deeper reason why these parts of the codebase haven't yet been transitioned from sql to compiled_code?

Either way, we've tested out switching to compiled_code for clone materialization in our environment and it seems to work, so I'm happy to put up a patch with our changes. I just want to make sure that I'm not barging into a discussion that has been put on the backburner for a good reason.

@jeancochrane jeancochrane added the bug Something isn't working label May 10, 2024
@nicor88
Member

nicor88 commented May 11, 2024

@jeancochrane it seems you've found the likely root cause. Feel free to propose a bug fix, ideally covered by functional tests (I can help you with that part).

@jeancochrane
Contributor Author

Bugfix PR open in #651 @nicor88! Let me know if you need anything else from me ahead of the review process.
