This repository contains an example project (Jaffle Shop) demonstrating integration between Superset and dbt, with BigQuery as the data warehouse.
- Python 3.11
- Poetry
- A Kubernetes cluster (with `helm` installed)
We'll be following the steps from Running Superset on Kubernetes to install Superset locally. If you prefer installing/running it via other methods (using pip or docker compose), you're free to do so.
- Add the Superset helm repository:

  ```
  $ helm repo add superset https://apache.github.io/superset
  ```
- Set `SECRET_KEY` in `helm/superset-values.yaml`. Use `openssl rand -base64 42` to generate a key.
- Install and run:

  ```
  $ helm upgrade --install --debug --values helm/superset-values.yaml superset superset/superset
  ```
- Visit http://localhost:8088 (depending on your cluster setup, you may first need to expose the Superset service locally, e.g. with `kubectl port-forward service/superset 8088:8088`)
- Run `poetry install` in the root directory.
- Use `poetry shell` to spawn a shell within the virtual environment.
- Run `dbt --version` to ensure the version is `1.5.x`.
- Run `dbt deps` to install the dbt dependencies declared in `dbt_jaffle_shop/packages.yml`:

  ```
  $ cd dbt_jaffle_shop
  $ dbt deps
  ```
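For reference, a `packages.yml` for a dbt 1.5 project using the `dbt_metrics` package (which the example `dbt run` logs further down show in use) would look roughly like the sketch below — the exact package list and version pins are assumptions, so treat the repo's own `dbt_jaffle_shop/packages.yml` as the source of truth:

```yaml
packages:
  # dbt_metrics powers the metric models; it is deprecated after dbt-core 1.5
  # in favor of MetricFlow. The version range here is an assumption — check
  # the repo's packages.yml (and hub.getdbt.com) for the pins actually used.
  - package: dbt-labs/metrics
    version: [">=1.5.0", "<1.6.0"]
```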
- Create a BigQuery service account (with BigQuery Admin access) and generate a JSON credentials key for it. Replace `PATH_TO_KEYFILE` in `dbt_jaffle_shop/profiles.yml` with the path to the downloaded JSON file.
Note
We'll be using the same credentials file for dbt (to run materialization jobs) and for Superset (to run queries against BQ tables). In the real world, these would likely be separate credentials, each scoped down to the specific permissions required.
- Create a BigQuery dataset with the default options (or use an existing one). Replace `BIGQUERY_DATASET` with its name.
- Replace `GCP_PROJECT` with the name of your GCP project.
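Once the placeholders are filled in, the profile should resemble the following. This is a minimal sketch of a `profiles.yml` for the dbt BigQuery adapter with service-account keyfile auth — the profile name and layout are assumptions; the repo's actual file is authoritative:

```yaml
jaffle_shop:                  # must match the `profile:` key in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account   # authenticate with a JSON keyfile
      project: GCP_PROJECT      # your GCP project name
      dataset: BIGQUERY_DATASET # the BQ dataset dbt materializes into
      keyfile: PATH_TO_KEYFILE  # path to the downloaded service account key
      threads: 1
```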
- Materialize the assets described in the `dbt_jaffle_shop/models/` directory:

  ```
  $ cd dbt_jaffle_shop
  $ dbt run
  ```
Example logs from `dbt run`:

```
$ dbt run
23:47:38  Running with dbt=1.5.9
23:47:38  Registered adapter: bigquery=1.5.7
23:47:38  Unable to do partial parsing because profile has changed
23:47:39  Found 12 models, 36 tests, 0 snapshots, 4 analyses, 587 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 4 metrics, 0 groups
23:47:39
23:47:40  Concurrency: 1 threads (target='dev')
23:47:40
23:47:40  1 of 12 START sql view model my_dataset.stg_customers .......................... [RUN]
23:47:41  1 of 12 OK created sql view model my_dataset.stg_customers ..................... [CREATE VIEW (0 processed) in 1.20s]
23:47:41  2 of 12 START sql view model my_dataset.stg_orders ............................. [RUN]
23:47:44  2 of 12 OK created sql view model my_dataset.stg_orders ........................ [CREATE VIEW (0 processed) in 2.42s]
23:47:44  3 of 12 START sql view model my_dataset.stg_payments ........................... [RUN]
23:47:45  3 of 12 OK created sql view model my_dataset.stg_payments ...................... [CREATE VIEW (0 processed) in 1.34s]
23:47:45  4 of 12 START sql table model my_dataset.dbt_metrics_default_calendar .......... [RUN]
23:47:51  4 of 12 OK created sql table model my_dataset.dbt_metrics_default_calendar ..... [CREATE TABLE (14.6k rows, 0 processed) in 6.40s]
23:47:51  5 of 12 START sql table model my_dataset.int_customer_order_history_joined ..... [RUN]
23:47:59  5 of 12 OK created sql table model my_dataset.int_customer_order_history_joined  [CREATE TABLE (100.0 rows, 4.3 KiB processed) in 7.65s]
23:47:59  6 of 12 START sql table model my_dataset.int_order_payments_pivoted ............ [RUN]
23:48:08  6 of 12 OK created sql table model my_dataset.int_order_payments_pivoted ....... [CREATE TABLE (99.0 rows, 6.7 KiB processed) in 8.80s]
23:48:08  7 of 12 START sql table model my_dataset.example_develop_metric ................ [RUN]
23:48:08  WARNING: dbt_metrics is going to be deprecated in dbt-core 1.6 in July 2023 as part of the migration to MetricFlow. This package will continue to work with dbt-core 1.5 but a 1.6 version will not be released. If you have any questions, please join us in the #dbt-core-metrics in the dbt Community Slack
23:48:12  7 of 12 OK created sql table model my_dataset.example_develop_metric ........... [CREATE TABLE (4.0 rows, 229.8 KiB processed) in 4.67s]
23:48:12  8 of 12 START sql table model my_dataset.orders ................................ [RUN]
23:48:18  8 of 12 OK created sql table model my_dataset.orders ........................... [CREATE TABLE (99.0 rows, 11.9 KiB processed) in 5.56s]
23:48:18  9 of 12 START sql table model my_dataset.average_order_amount_daily ............ [RUN]
23:48:18  WARNING: dbt_metrics is going to be deprecated in dbt-core 1.6 in July 2023 as part of the migration to MetricFlow. This package will continue to work with dbt-core 1.5 but a 1.6 version will not be released. If you have any questions, please join us in the #dbt-core-metrics in the dbt Community Slack
23:48:23  9 of 12 OK created sql table model my_dataset.average_order_amount_daily ....... [CREATE TABLE (77.0 rows, 116.7 KiB processed) in 5.38s]
23:48:23  10 of 12 START sql table model my_dataset.expenses_daily ....................... [RUN]
23:48:23  WARNING: dbt_metrics is going to be deprecated in dbt-core 1.6 in July 2023 as part of the migration to MetricFlow. This package will continue to work with dbt-core 1.5 but a 1.6 version will not be released. If you have any questions, please join us in the #dbt-core-metrics in the dbt Community Slack
23:48:28  10 of 12 OK created sql table model my_dataset.expenses_daily .................. [CREATE TABLE (55.0 rows, 117.7 KiB processed) in 5.06s]
23:48:28  11 of 12 START sql table model my_dataset.profit_daily ......................... [RUN]
23:48:28  WARNING: dbt_metrics is going to be deprecated in dbt-core 1.6 in July 2023 as part of the migration to MetricFlow. This package will continue to work with dbt-core 1.5 but a 1.6 version will not be released. If you have any questions, please join us in the #dbt-core-metrics in the dbt Community Slack
23:48:33  11 of 12 OK created sql table model my_dataset.profit_daily .................... [CREATE TABLE (55.0 rows, 117.7 KiB processed) in 4.54s]
23:48:33  12 of 12 START sql table model my_dataset.revenue_daily ........................ [RUN]
23:48:33  WARNING: dbt_metrics is going to be deprecated in dbt-core 1.6 in July 2023 as part of the migration to MetricFlow. This package will continue to work with dbt-core 1.5 but a 1.6 version will not be released. If you have any questions, please join us in the #dbt-core-metrics in the dbt Community Slack
23:48:37  12 of 12 OK created sql table model my_dataset.revenue_daily ................... [CREATE TABLE (55.0 rows, 117.7 KiB processed) in 4.48s]
23:48:37
23:48:37  Finished running 3 view models, 9 table models in 0 hours 0 minutes and 58.75 seconds (58.75s).
23:48:37
23:48:37  Completed successfully
23:48:37
23:48:37  Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12
```
- Verify that the tables/views have been populated in your BQ dataset. It should look like the screenshot below:
- Browse the generated dbt docs:

  ```
  $ dbt docs generate
  $ dbt docs serve
  ```
- Visit your running Superset instance -> Settings -> Database Connections -> Click "+ Database". This should open up a modal.
- Choose "Google BigQuery" in the list of supported databases.
- Upload the service account credentials JSON file generated earlier
- Hit "Connect"
Note
In Superset, "datasets" can be created from database tables or SQL queries. In BigQuery, a dataset is a collection of tables and views.
- Go to the "New dataset" page from the "Datasets" tab
- Choose "Google BigQuery" as the database (or whatever you named it)
- Choose the schema (the "dataset" in BigQuery, bear with me here) and table. Let's choose `orders` as the table to create the Superset dataset for.
- Hit "Create Dataset and Create Chart" (What if you don't want to create a chart? Well, too bad.)
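(Aside: if you'd rather script this than click through the UI, Superset also exposes a REST endpoint for creating datasets. Below is a minimal sketch using only the Python standard library — the database ID, host, and token handling are assumptions you'd adapt to your own instance:)

```python
import json
import urllib.request

SUPERSET_URL = "http://localhost:8088"  # adjust to your instance


def dataset_payload(database_id: int, schema: str, table_name: str) -> dict:
    # Request body for Superset's POST /api/v1/dataset/ endpoint:
    # the numeric ID of the registered database connection, the schema
    # (the BigQuery "dataset"), and the table to expose.
    return {"database": database_id, "schema": schema, "table_name": table_name}


def create_dataset(access_token: str, database_id: int, schema: str, table_name: str) -> dict:
    # Create a Superset dataset over an existing table/view.
    req = urllib.request.Request(
        f"{SUPERSET_URL}/api/v1/dataset/",
        data=json.dumps(dataset_payload(database_id, schema, table_name)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```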
Tip
At this point, you're free to create more datasets off of other BQ tables/views and play around with creating charts/dashboards. Continue on with the rest of this demo to see how you can sync dbt model/column descriptions into Superset.
We'll be using the handy dandy `dbt-superset-lineage`. Since using `dbt-superset-lineage` requires you to export a Superset access token (to interact with its APIs), a wrapper script is provided in `dbt_jaffle_shop/push_descriptions.py` that fetches the access token and pushes descriptions. Tweak the Superset host/port/protocol in the script as needed.
```
# Drop into the virtualenv if you haven't already
$ poetry shell

# Push model descriptions from dbt into Superset
# Will only push descriptions for the `orders` model since that's the only dataset present
# in both Superset and dbt (if you're following this demo as is).
$ python3 dbt_jaffle_shop/push_descriptions.py
```
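If you're curious what fetching an access token involves, it boils down to a POST against Superset's login endpoint. Here's a minimal sketch using only the Python standard library — the host, credentials, and `db` auth provider are assumptions matching a default local install, not necessarily what `push_descriptions.py` does verbatim:

```python
import json
import urllib.request

SUPERSET_URL = "http://localhost:8088"  # tweak host/port/protocol as needed


def login_payload(username: str, password: str) -> dict:
    # Request body for Superset's POST /api/v1/security/login endpoint;
    # "db" is the database-backed auth provider used by a default install.
    return {"username": username, "password": password, "provider": "db", "refresh": True}


def fetch_access_token(username: str = "admin", password: str = "admin") -> str:
    # Exchange credentials for a bearer token usable with the rest of the API.
    req = urllib.request.Request(
        f"{SUPERSET_URL}/api/v1/security/login",
        data=json.dumps(login_payload(username, password)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```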
If the script ran successfully, you should be able to see the `orders` model description in Superset by hovering over the now-present information tooltip. Click on the dataset, and hover over any of the column names to see their descriptions as well.
From their documentation,
Using dbt-superset-lineage, you can:
- Add dependencies of Superset dashboards to your dbt sources and models
- Sync model and column descriptions from dbt docs to Superset
We'll be limiting the scope of this demo to the latter feature.
Tip
Read this blogpost from the authors of `dbt-superset-lineage` for more on the motivation behind creating a dbt <-> Superset connector.
Here's a simple dashboard created from the various metric tables. Makes you go WoW, yeah?
- We set up the example Jaffle Shop dbt project to materialize tables/views into a BigQuery dataset.
- We installed Superset locally (via K8S/helm) and connected BigQuery as a data source.
- We were able to access dbt model/column descriptions in Superset.
With that, you've reached the end of this demonstration. Happy charting and dbt'ing!