You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
First off, thanks for putting this together! Great stuff.
In stg_snowflake_query_tables.sql you are getting all the tables accessed by a query by looking at the direct_objects_accessed column in snowflake.account_usage.access_history. I was reading through these docs to better understand the difference between direct_objects_accessed and base_objects_accessed.
Let's say you have the following query:
SELECT*FROM my_view
And my_view is defined as CREATE VIEW my_view AS ( SELECT * FROM my_table WHERE foo = 'bar'). Based on this example in the docs, it sounds like:
my_view will appear in direct objects accessed
my_table will appear in base objects accessed
Note, probably worth validating this is actually the case. Let's assume it is for now.
As a result, querying stg_snowflake_query_tables could lead some users to think that my_table is not being used, since the only thing getting queried is my_view.
Could be worth throwing a caveat in the README.md (again assuming this is true). As a fix, you could look at modifying the code to check both base_objects and direct_objects, then do a union and distinct across all those (since in most cases, the same table will appear in both direct and base), or keep both around and add some boolean like is_directly_accessed and is_indirectly_accessed -- in base but not in direct.
The text was updated successfully, but these errors were encountered:
Thanks for raising this issue. I originally made the decision to use direct and ignore base because I wanted to align the notion of "object" here with dbt's notion of "model". In the context of dbt I generally don't distinguish between views and tables.
To get at indirect usage, I have a model (not in this repo, yet) derived from the dbt manifest called node_descendents (expanded from the child_map), and I left join onto that, and aggregate both 'direct usage' and 'downstream usage'.
Is there another case where knowing about underlying views (inside or outside of dbt) would be valuable? I've considered making the table more polymorphic- like the modified_objects could be rows (though there seems to always be exactly 1) with an action label: modified, accessed, indirectly_accessed.
Is there another case where knowing about underlying views (inside or outside of dbt) would be valuable?
my current use case is trying to find tables I can remove to save storage costs. for that, i need to know what underlying tables (the things that incur the storage costs) are being used, either directly or indirectly through a view.
First off, thanks for putting this together! Great stuff.
In
stg_snowflake_query_tables.sql
you are getting all the tables accessed by a query by looking at thedirect_objects_accessed
column insnowflake.account_usage.access_history
. I was reading through these docs to better understand the difference betweendirect_objects_accessed
andbase_objects_accessed
.Let's say you have the following query:
And
my_view
is defined asCREATE VIEW my_view AS ( SELECT * FROM my_table WHERE foo = 'bar')
. Based on this example in the docs, it sounds like:my_view
will appear in direct objects accessedmy_table
will appear in base objects accessedNote, probably worth validating this is actually the case. Let's assume it is for now.
As a result, querying
stg_snowflake_query_tables
could lead some users to think thatmy_table
is not being used, since the only thing getting queried ismy_view
.Could be worth throwing a caveat in the
README.md
(again assuming this is true). As a fix, you could look at modifying the code to check bothbase_objects
anddirect_objects
, then do a union and distinct across all those (since in most cases, the same table will appear in bothdirect
andbase
), or keep both around and add some boolean likeis_directly_accessed
andis_indirectly_accessed -- in base but not in direct
.The text was updated successfully, but these errors were encountered: