persist_docs support for Glue #167

Open
1 task
klimelau opened this issue Oct 25, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@klimelau

Describe the feature

Currently, persist_docs works by running a COMMENT statement after each model is built.

This statement type is not supported by the Glue catalog, so running dbt run results in the following error: "TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Table comment is not yet supported by Glue service", query_id=.....)".

This could be mitigated if the comments are added directly to the CREATE statement as table and column comments before each model is created.
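To make the proposal concrete, here is a minimal sketch of what "comments in the CREATE statement" could look like. It only renders a Trino CREATE TABLE string with inline column COMMENT clauses and a trailing table COMMENT. The helper names (`sql_literal`, `render_create_table`) and the table and column names are hypothetical and not part of dbt-trino. How this would fit into dbt-trino's materializations, which build tables from a SELECT, is not covered here, and whether Glue accepts the comments at creation time is the assumption this proposal rests on.

```python
def sql_literal(text: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + text.replace("'", "''") + "'"


def render_create_table(table, columns, table_comment=None):
    """Render CREATE TABLE with inline column and table comments.

    columns: list of (name, data_type, comment_or_None) tuples.
    Hypothetical helper for illustration only.
    """
    col_lines = []
    for name, data_type, comment in columns:
        line = f"    {name} {data_type}"
        if comment:
            # Column comment goes right after the data type.
            line += f" COMMENT {sql_literal(comment)}"
        col_lines.append(line)
    sql = f"CREATE TABLE {table} (\n" + ",\n".join(col_lines) + "\n)"
    if table_comment:
        # Table comment follows the closing parenthesis.
        sql += f"\nCOMMENT {sql_literal(table_comment)}"
    return sql


# Example table and columns are made up for the sketch.
ddl = render_create_table(
    "hive.analytics.orders",
    [("order_id", "bigint", "Primary key"), ("note", "varchar", None)],
    table_comment="One row per order",
)
print(ddl)
```

With this shape, no separate COMMENT statement is needed after the table exists, which is the step Glue rejects.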

Describe alternatives you've considered

No response

Who will benefit?

This feature is mainly for dbt-trino users who use Glue as their metastore. Documentation is one of the most important aspects of dbt, and while you can still use the dbt-generated documentation, having the metadata propagated to Glue is extremely valuable for exploring data in Trino directly.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@klimelau klimelau added the enhancement New feature or request label Oct 25, 2022
@hovaesco
Member

Relevant issue from Trino: trinodb/trino#7487

@hovaesco
Member

This could be mitigated if the comments are added directly to the CREATE statement as table and column comments before each model is created.

The problem with this approach is that comments on an existing table cannot be updated: because the Glue metastore does not support the COMMENT statement, the only way to change them is to recreate the table. For incremental models, the table is not recreated on each run.

@wrb2

wrb2 commented Oct 25, 2022

That is the behavior of the Hive connector with Glue. How does it work with the Delta or Iceberg connectors? Do they handle comments on their own, or do they rely on Glue as well? I think in "normal Delta use" the comments end up in the Delta log?

@findinpath
Collaborator

How does it work with Delta or Iceberg connector?

Iceberg and Delta table formats save the comments internally and do not rely on the metastore for this kind of information.

I think in "normal Delta use" the comments end up in the Delta log?

Correct.

@wrb2

wrb2 commented Oct 25, 2022

I tested it and I can confirm it works with the Delta connector. It does generate a lot of transactions for each table (1 for the table, 1 for the table description, 1 for each column description), but hopefully that is scalable. It doesn't work for views, though, since the Delta connector doesn't support those.
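As a rough illustration of where those transactions come from, the sketch below renders the COMMENT statements that would follow a model build and counts them. It assumes each statement becomes one Delta commit, as observed above; the helper names and the example table are hypothetical.

```python
def sql_literal(text: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + text.replace("'", "''") + "'"


def render_comment_statements(table, table_comment, column_comments):
    """Return one COMMENT statement per documented object.

    column_comments: dict mapping column name to its description.
    Hypothetical helper for illustration only.
    """
    statements = []
    if table_comment is not None:
        statements.append(
            f"COMMENT ON TABLE {table} IS {sql_literal(table_comment)}"
        )
    for column, comment in column_comments.items():
        statements.append(
            f"COMMENT ON COLUMN {table}.{column} IS {sql_literal(comment)}"
        )
    return statements


# Example table and descriptions are made up for the sketch.
statements = render_comment_statements(
    "delta.analytics.orders",
    "One row per order",
    {"order_id": "Primary key", "amount": "Order total in USD"},
)

# Assumption: 1 commit for creating the table plus 1 per COMMENT statement.
commits = 1 + len(statements)
for statement in statements:
    print(statement)
print("estimated commits:", commits)
```

For a table with two documented columns this comes to four commits, so the count grows linearly with the number of documented columns.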

@wrb2

wrb2 commented Oct 25, 2022

But it seems that even if you set the comments, you'll never get to them, as per trinodb/trino#13705.

Seems Trino itself will never support that, even for Delta or Iceberg. That is pretty unfortunate.

@hovaesco
Member

You can access them by querying system.metadata.table_comments in Trino. It's included in the get_catalog macro in dbt-trino: https://github.com/starburstdata/dbt-trino/blob/master/dbt/include/trino/macros/catalog.sql#L42-L54
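For anyone who wants to read the comments outside of dbt, a minimal sketch of that lookup follows. It builds a query against system.metadata.table_comments and runs it through any DB-API style cursor. The function names are hypothetical, and the cursor is assumed to come from a Trino client connection (for example the `trino` Python package); this is not the code the get_catalog macro uses.

```python
def sql_literal(text: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + text.replace("'", "''") + "'"


def table_comments_query(catalog: str, schema: str) -> str:
    """Build a query listing table comments for one catalog and schema."""
    return (
        "SELECT catalog_name, schema_name, table_name, comment\n"
        "FROM system.metadata.table_comments\n"
        f"WHERE catalog_name = {sql_literal(catalog)}\n"
        f"  AND schema_name = {sql_literal(schema)}"
    )


def fetch_table_comments(cursor, catalog: str, schema: str) -> dict:
    """Run the query on a DB-API cursor and map table name to comment.

    The cursor is assumed to expose execute() and fetchall().
    """
    cursor.execute(table_comments_query(catalog, schema))
    return {row[2]: row[3] for row in cursor.fetchall()}


# Catalog and schema names are made up for the sketch.
print(table_comments_query("delta", "analytics"))
```

Filtering on catalog_name and schema_name keeps the lookup scoped to one schema rather than scanning every catalog.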

4 participants