
Expecting all-purpose cluster to stop being utilised when job cluster details are configured #576

Open
tade0726 opened this issue Feb 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@tade0726

tade0726 commented Feb 5, 2024

Describe the bug

When I try the feature of running jobs on a job cluster instead of the all-purpose cluster, both clusters are triggered: the all-purpose cluster starts first, and the job cluster follows.

Steps To Reproduce

  1. Configure the all-purpose cluster details in profiles.yaml
  2. Configure the job cluster in dbt_project.yml, following the instructions (https://docs.getdbt.com/docs/build/python-models#specific-data-platforms)
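
For reference, the job cluster configuration from the linked dbt docs looks roughly like this in dbt_project.yml (the project and model names here are placeholders, and the cluster spec values are illustrative, not the exact ones I used):

```yaml
# dbt_project.yml (sketch)
models:
  my_project:                      # placeholder project name
    my_python_model:               # placeholder python model name
      +submission_method: job_cluster
      +job_cluster_config:
        spark_version: "13.3.x-scala2.12"
        node_type_id: "i3.xlarge"
        num_workers: 2
```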

Expected behavior

Once the job cluster details are configured, the all-purpose cluster should not be triggered.

Screenshots and log output

System information

The output of dbt --version:

Core:
  - installed: 1.7.3
  - latest:    1.7.7 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.7.2 - Update available!
  - spark:      1.7.1 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using:

macOS (should be irrelevant)

The output of python --version:

Python 3.10.13

Additional context

None

@tade0726 tade0726 added the bug Something isn't working label Feb 5, 2024
@benc-db
Collaborator

benc-db commented Feb 5, 2024

This is expected behavior, as python models are integrated into the rest of your dbt project using SQL (for example, on an incremental model, the merge behavior is conducted in SQL), and that SQL would be executed on the AP Cluster. We are investigating ways for python model behavior to be more 'spark-like', but for now I would say this is an enhancement request, rather than a bug, as it is consistent with the structure imposed by dbt-core.

@benc-db benc-db added enhancement New feature or request and removed bug Something isn't working labels Feb 5, 2024
@tade0726
Author

tade0726 commented Feb 5, 2024

Thanks, @benc-db. That clears up my doubts.

@leo-schick

@benc-db Would it be possible to use a simpler approach when running a python model on a job cluster, like the following:

  1. dbt creates a new notebook for the python model
  2. the new notebook is executed within dbt using the python command dbutils.notebook.run("....") (see Run a Databricks notebook from another notebook) in its own process

I am not sure, but it looks to me that the strict separation between the execution layer (dbt python code) and the model execution (putting the model into an isolated space) is a bit oversized on Databricks job clusters, because the job will run on Spark on the driver node anyway. But maybe I am not getting the full picture of this issue...
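
A rough sketch of what that could look like. Everything here is hypothetical (the notebook path, the helper name, and the argument names are not dbt-databricks code), and dbutils is only available inside a Databricks notebook/job context, so the actual call is shown as a comment:

```python
def build_run_arguments(model_name, materialization):
    """Assemble the arguments dict for dbutils.notebook.run.

    dbutils.notebook.run only accepts string-valued arguments, so
    anything richer would need to be serialized first.
    """
    return {"model": model_name, "materialization": materialization}


# Inside a Databricks notebook context, step 2 would then be roughly:
#
# result = dbutils.notebook.run(
#     "/Shared/dbt_models/my_python_model",   # hypothetical generated-notebook path
#     timeout_seconds=600,
#     arguments=build_run_arguments("my_python_model", "table"),
# )
```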
