Feature request: Sync dbt model tags with Databricks table/view tags #606

Open
AliAl-Gburi opened this issue Mar 6, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@AliAl-Gburi

Description

dbt allows you to tag your models, and Databricks also lets you tag tables and views. There should be a way to apply a model's dbt tags to the corresponding table or view in Databricks.

Alternative

I have considered implementing a custom solution using:

  1. Python to read the tags from the dbt models
  2. databricks-sql-cli to read the existing tags from Databricks
  3. databricks-sql-cli again to update the tags in Databricks so that they match the dbt tags
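Step 1 could look roughly like the Python below. This is a minimal sketch, not an adapter implementation: it assumes the tags are read from the `target/manifest.json` artifact that dbt writes, where each model node carries `resource_type`, `tags` and `relation_name` (the quoted warehouse name, e.g. `` `catalog`.`schema`.`table` ``). The helper names `load_manifest` and `dbt_model_tags` are hypothetical. For step 2, Unity Catalog also exposes existing tags through the `information_schema.table_tags` view, which could be queried instead of going through the CLI.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest artifact that dbt writes to the target/ directory."""
    return json.loads(Path(path).read_text())


def dbt_model_tags(manifest: dict) -> dict[str, set[str]]:
    """Map each model's warehouse relation name to the dbt tags declared on it."""
    tags_by_relation: dict[str, set[str]] = {}
    for node in manifest.get("nodes", {}).values():
        # Keep only models; tests, seeds, snapshots, etc. are skipped in this sketch.
        if node.get("resource_type") != "model":
            continue
        # Ephemeral models are never built as tables/views, so they have no relation name.
        relation = node.get("relation_name")
        if not relation:
            continue
        tags_by_relation[relation] = set(node.get("tags", []))
    return tags_by_relation
```

After a `dbt compile` or `dbt run`, calling `dbt_model_tags(load_manifest())` gives one entry per built model, which can then be compared with the tags currently set in Databricks.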

Benefits

This would make it easier to find and filter tables in Databricks by tag.

I would like to contribute to creating a seamless syncing process.

AliAl-Gburi added the enhancement (New feature or request) label Mar 6, 2024
@benc-db
Collaborator

benc-db commented Mar 6, 2024

@AliAl-Gburi this is something we are thinking about as well. I think where things are a little complicated is that some dbt tags are strictly for dbt operations (e.g. I tag some tables with 'daily' for scheduling daily runs), but maybe it doesn't matter if those get synced to Databricks.

What are your thoughts on tags in Databricks that are not found in the dbt project? Specifically, how can a dbt project indicate that it wants to remove a tag? We have a similar issue with materialized views, where a tblproperty gets set by Databricks and we have to figure out the meaning of the absence of that tblproperty in the dbt project.

@AliAl-Gburi
Author

Hey @benc-db, thanks for the answer :D. For those who want dbt tags to stay in dbt and not overwrite whatever tags they've set in Databricks, tag syncing could be an option that they turn off.

I suppose "syncing" was not the correct term to use here. The idea is to have dbt tags be your single source of truth and then Databricks tables and views would match the tags defined in dbt.
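The semantics discussed in this thread (dbt as the single source of truth, dbt-only tags such as 'daily' kept out of Databricks, and removal of Databricks tags that dbt does not declare being optional) can be sketched as a small planning function in Python. This is an illustration under assumptions, not the design in the eventual PR: `plan_tag_changes`, `plan_to_sql` and `TagPlan` are hypothetical names; each key-less dbt tag is assumed to map to a Databricks tag with an empty value; tag names are restricted to simple characters so no SQL escaping is needed. The code only builds `ALTER TABLE` / `ALTER VIEW ... SET TAGS` / `UNSET TAGS` statements as strings and does not execute them.

```python
import re
from dataclasses import dataclass, field

# Assumption for this sketch: only simple tag names are allowed, so they can be
# placed inside a SQL string literal without any escaping.
_SAFE_TAG = re.compile(r"^[A-Za-z0-9_\-]+$")


@dataclass
class TagPlan:
    to_set: set[str] = field(default_factory=set)
    to_unset: set[str] = field(default_factory=set)


def plan_tag_changes(
    dbt_tags: set[str],
    databricks_tags: set[str],
    dbt_only: frozenset[str] = frozenset(),
    remove_extra: bool = True,
) -> TagPlan:
    """Plan tag changes for one relation, treating dbt as the source of truth."""
    # Tags used purely for dbt operations (e.g. selecting 'daily' runs) stay out of Databricks.
    desired = set(dbt_tags) - set(dbt_only)
    current = set(databricks_tags)
    return TagPlan(
        to_set=desired - current,
        # Tags present in Databricks but absent from dbt are removed only when requested.
        to_unset=(current - desired) if remove_extra else set(),
    )


def _literal(tag: str) -> str:
    if not _SAFE_TAG.match(tag):
        raise ValueError(f"unsupported tag name for this sketch: {tag!r}")
    return f"'{tag}'"


def plan_to_sql(relation: str, plan: TagPlan, kind: str = "TABLE") -> list[str]:
    """Render a TagPlan as Databricks SQL statements (strings only, not executed)."""
    if kind not in ("TABLE", "VIEW"):
        raise ValueError("kind must be 'TABLE' or 'VIEW'")
    statements = []
    if plan.to_set:
        # Assumption: a key-less dbt tag becomes a Databricks tag with an empty value.
        pairs = ", ".join(f"{_literal(t)} = ''" for t in sorted(plan.to_set))
        statements.append(f"ALTER {kind} {relation} SET TAGS ({pairs})")
    if plan.to_unset:
        names = ", ".join(_literal(t) for t in sorted(plan.to_unset))
        statements.append(f"ALTER {kind} {relation} UNSET TAGS ({names})")
    return statements


if __name__ == "__main__":
    plan = plan_tag_changes(
        {"pii", "finance", "daily"}, {"finance", "legacy"}, dbt_only=frozenset({"daily"})
    )
    for stmt in plan_to_sql("main.sales.orders", plan):
        print(stmt)
    # ALTER TABLE main.sales.orders SET TAGS ('pii' = '')
    # ALTER TABLE main.sales.orders UNSET TAGS ('legacy')
```

Passing `remove_extra=False` corresponds to the opt-out case above: dbt tags are added, but tags that were set directly in Databricks are left alone.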

@benc-db
Collaborator

benc-db commented Apr 2, 2024

Hi @AliAl-Gburi. I've got a PR for this now, for our 2.0.0 release (aligning with the dbt-core 1.8.0 release).

benc-db mentioned this issue Apr 2, 2024
@AliAl-Gburi
Author

That's great to hear, thanks a lot :D
