Describe the bug
Bug reported by someone on Slack. They are not able to use the post-hook macro to load seeds from S3 into their Databricks warehouse.
> Struggling a bit to load terminologies to DBX w/ Unity (deets here). If I were to just load the terminologies manually, what should I comment out in the DBT project to make sure the terminology tables are not overwritten?
This turned out to be simple. For posterity:
If using a "shared access" cluster or a SQL Serverless warehouse (same idea), the S3 copies fail because it's not possible to set the environment variables that hold the Tuva bucket keys.
Everything works fine on a "single user cluster", where the user can set environment variables.
I think any config changes would likely have to occur on the DBX side, e.g. registering the S3 keys up front in the cluster configuration or registering an "external volume".
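For the second route, a hedged sketch of what that registration might look like in Databricks SQL. The location name `tuva_seeds`, the bucket URL, and the `my_aws_credential` storage credential are all placeholders (the credential would have to exist already); this is a sketch of the Unity Catalog mechanism, not a tested Tuva setup:

```sql
-- Sketch only: register the Tuva bucket as a Unity Catalog external location
-- so shared-access clusters and SQL warehouses are allowed to read from it.
CREATE EXTERNAL LOCATION IF NOT EXISTS tuva_seeds
  URL 's3://<tuva-bucket>/'
  WITH (STORAGE CREDENTIAL my_aws_credential);

-- Let users read files at that location.
GRANT READ FILES ON EXTERNAL LOCATION tuva_seeds TO `account users`;
```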
For me, the confusion originated from the dbt-databricks docs, which recommend running on the SQL Warehouse product because it's easy to monitor and debug the queries dbt generates. But I'm doubtful you can connect directly to S3 on the warehouse product without mediating via "Unity Catalog" (by design).
Still no luck here, but it's fine. We can load the seeds on a single-user cluster, then schedule all the SQL against the warehouse (you can run tasks on separate clusters in a Databricks job).
But it did occur to me...
The common denominator for everyone running this project is Python: you need it to set up dbt, and therefore you need Python to run Tuva.
Perhaps there is an approach where we pull the S3 seeds down via boto3, then stream the INSERT statements into whatever warehouse dbt is connected to? I'm eyeing, but haven't explored, dbt's Python models: https://docs.getdbt.com/docs/build/python-models
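A minimal sketch of that boto3 idea, assuming the Tuva bucket allows anonymous (unsigned) reads and that the warehouse connection is exposed as a DB-API-style `cursor`. The helpers `rows_to_inserts` and `load_seed`, and the `bucket`/`key`/`table` arguments, are hypothetical names, not anything from the project:

```python
import csv
import io

def rows_to_inserts(table, header, rows):
    """Turn CSV rows into INSERT statements (every value treated as a string)."""
    cols = ", ".join(header)
    stmts = []
    for row in rows:
        vals = ", ".join(
            "NULL" if v == "" else "'" + v.replace("'", "''") + "'"
            for v in row
        )
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals})")
    return stmts

def load_seed(bucket, key, table, cursor):
    """Fetch one seed CSV from S3 anonymously and replay it through a cursor."""
    # boto3 is imported here so the statement builder above is usable without it.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    reader = csv.reader(io.StringIO(body))
    header = next(reader)  # first CSV row is assumed to be column names
    for stmt in rows_to_inserts(table, header, reader):
        cursor.execute(stmt)
```

Row-by-row INSERTs would be slow for the larger terminologies, so batching values into multi-row INSERT statements would be the obvious next step if this pans out.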