A Column Level Lineage Graph for dbt
Have you ever wondered what is the column level relationship between the dbt models? Worry not, this tool is intended to help you by creating an interactive graph on a webpage to explore the column level lineage among your models(Currently only supports Postgres, other connection types are under development)!
dbt version >= 1.0.0
New to dbt packages? Read more about them here.
- Include this package in your
packages.yml
file.
packages:
- git: "https://github.com/zshandy/dbt-lineagex.git"
- Run
dbt deps
to install the package.
If you don't have your dbt already ran yet, change directory to your main folder and run:
dbt run
After you've finished running your dbt project, change directory to the "/dbt_packages/lineagex" folder and run:
Note Recommended to have a virtual environment setup for running the Python scripts, as it could fight with other dependencies(Currently it is more like a Python project, but will try to make changes to it so that only changes to the dbt_project.yml needs to be made for running)
./run.sh
Or if you want to run the commands separately, they are:
- pip install -r requirements.txt
- python main.py
This would create the index.html and an output.json in the folder, and just start a http server in the folder, and you get to see the graph
sample command: php -S localhost:8000
- Start by clicking the star on the right(search) and input a model name that you want to start with.
- It should show a table on the canvas with table names and its columns, by clicking the "explore" button on the top right, it will show all the downstream and upstream tables that are related to the columns.
- Hovering over a column will highlight its downstream and upstream columns as well.
- You can navigate through the canvas by clicking "explore" on other tables.
In the output.json file, it can be read by other programs and analyzed for other uses, the general format is as follows:
table1.sql - SELECT column1, column2 FROM schema1.other_table WHERE column3 IS NOT NULL;
table2.sql - SELECT column1 AS new_column1, column2 AS new_column2 from schema1.table1;
{
schema1.other_table: {
tables: [],
columns: {
column1: [], column2: [], column3: []
},
table_name: schema1.other_table
},
schema1.table1: {
tables: [schema1.other_table],
columns: {
column1: [schema1.other_table.columns1, schema1.other_table.columns3], column2: [schema1.other_table.columns2, schema1.other_table.columns3]
},
table_name: schema1.table1
},
schema1.table2: {
tables: [schema1.table1],
columns: {
new_column1: [schema1.table1.columns1], new_column2: [schema1.table1.column2]
},
table_name: schema1.table2
},
}
"not init data"
in the webpage: Possibly due to the content of the JSON in the index.html, please check if it is in valid JSON format, and that all keys are in string format.