
Expecting all-purpose cluster to stop being utilised when job cluster details are configured #576

Open
tade0726 opened this issue Feb 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@tade0726

tade0726 commented Feb 5, 2024

Describe the bug

When I try the feature of running jobs on a job cluster instead of the all-purpose cluster, both clusters are triggered: the all-purpose cluster starts first, and the job cluster follows.

Steps To Reproduce

  1. Configure the all-purpose cluster details in profiles.yaml
  2. Configure the job cluster in dbt_project.yml, following the instructions (https://docs.getdbt.com/docs/build/python-models#specific-data-platforms)
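
For reference, the job cluster configuration from the linked dbt docs looks roughly like this in dbt_project.yml (the project and model names here are placeholders, and the cluster spec values are illustrative, not the exact ones I used):

```yaml
# dbt_project.yml (sketch)
models:
  my_project:                      # placeholder project name
    my_python_model:               # placeholder python model name
      +submission_method: job_cluster
      +job_cluster_config:
        spark_version: "13.3.x-scala2.12"
        node_type_id: "i3.xlarge"
        num_workers: 2
```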

Expected behavior

Once the job cluster details are configured, the all-purpose cluster should not be triggered.

Screenshots and log output

System information

The output of dbt --version:

Core:
  - installed: 1.7.3
  - latest:    1.7.7 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.7.2 - Update available!
  - spark:      1.7.1 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using:

macOS (should be irrelevant)

The output of python --version:

Python 3.10.13

Additional context

None

@tade0726 tade0726 added the bug Something isn't working label Feb 5, 2024
@benc-db
Collaborator

benc-db commented Feb 5, 2024

This is expected behavior, as python models are integrated into the rest of your dbt project using SQL (for example, on an incremental model, the merge behavior is conducted in SQL), and that SQL would be executed on the AP Cluster. We are investigating ways for python model behavior to be more 'spark-like', but for now I would say this is an enhancement request, rather than a bug, as it is consistent with the structure imposed by dbt-core.

@benc-db benc-db added enhancement New feature or request and removed bug Something isn't working labels Feb 5, 2024
@tade0726
Author

tade0726 commented Feb 5, 2024

Thanks, @benc-db. That clears up my doubts.

@leo-schick

@benc-db Would it be possible to use a simpler approach when running a python model on a job cluster, like the following:

  1. dbt creates a new notebook for the python model
  2. the new notebook is executed within dbt using the python command dbutils.notebook.run("....") (see Run a Databricks notebook from another notebook) in its own process

I am not sure, but it looks to me that the strict separation between the execution layer (dbt python code) and the model execution (putting the model into an isolated space) is a bit oversized on Databricks job clusters, because the job will run on Spark on the driver node anyway. But maybe I am not getting the full picture of this issue...
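
A rough sketch of what that could look like. Everything here is hypothetical (the notebook path, the helper name, and the argument names are not dbt-databricks code), and dbutils is only available inside a Databricks notebook/job context, so the actual call is shown as a comment:

```python
def build_run_arguments(model_name, materialization):
    """Assemble the arguments dict for dbutils.notebook.run.

    dbutils.notebook.run only accepts string-valued arguments, so
    anything richer would need to be serialized first.
    """
    return {"model": model_name, "materialization": materialization}


# Inside a Databricks notebook context, step 2 would then be roughly:
#
# result = dbutils.notebook.run(
#     "/Shared/dbt_models/my_python_model",   # hypothetical generated-notebook path
#     timeout_seconds=600,
#     arguments=build_run_arguments("my_python_model", "table"),
# )
```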
