Support column level statistics #514
Comments
Did anyone find a use case for the column statistics feature? I tried to apply it to the unique ID field of a big table, and after several minutes it computed a totally wrong number of unique values. It also did not speed up simple SQL queries at all. I agree that automating these statistics with dbt looks appealing, but would it be useful in real life, given that it can slow down project builds significantly?
@roslovets I believe the main reason is indeed the potential performance gain, according to this new Cost-Based Optimizer for Athena. I haven't seen hands-on test results yet, though.
Here is another article in which AWS shows improved performance.
Thank you for the links, folks. According to their fancy examples, we should be able to save real time on downstream models and tests, even if it takes up to several minutes to compute statistics for one table. But I still cannot get why Maybe you could run tests on your big tables as well?
https://aws.amazon.com/about-aws/whats-new/2023/11/aws-glue-data-catalog-generating-column-level-statistics/
Add additional configurations that allow the user to add column level statistics to the table.
Minimal config to make it work:
Open questions
Are all table types supported?
It seems to be supported only for Hive tables, not Iceberg.
Notes
Currently not available in all regions
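For context, the sketch below shows roughly what a dbt-side configuration would have to trigger under the hood: a call to the Glue `StartColumnStatisticsTaskRun` API through boto3. This is an illustration only, not the adapter's implementation. The helper names `build_stats_request` and `start_stats_run`, and the database, table, and role values in the usage note, are hypothetical. Running it for real assumes boto3 is installed and that the IAM role is allowed to run Glue statistics tasks.

```python
from __future__ import annotations

from typing import Any, Optional


def build_stats_request(
    database: str,
    table: str,
    role_arn: str,
    columns: Optional[list[str]] = None,
    sample_size: Optional[float] = None,
) -> dict[str, Any]:
    """Build keyword arguments for Glue's StartColumnStatisticsTaskRun.

    Hypothetical helper: only the dictionary keys are Glue API parameter names.
    """
    request: dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
    }
    if columns:
        # Restrict the run to selected columns; omitting this covers all columns.
        request["ColumnNameList"] = list(columns)
    if sample_size is not None:
        # SampleSize is a percentage of rows; omitting it scans the whole table.
        if not 0 < sample_size <= 100:
            raise ValueError("sample_size must be in the range (0, 100]")
        request["SampleSize"] = float(sample_size)
    return request


def start_stats_run(request: dict[str, Any], client: Any = None) -> str:
    """Start the statistics task and return its run ID."""
    if client is None:
        # Imported lazily so the request builder works without boto3 installed.
        import boto3

        client = boto3.client("glue")
    response = client.start_column_statistics_task_run(**request)
    return response["ColumnStatisticsTaskRunId"]
```

A call might look like `start_stats_run(build_stats_request("analytics", "events", "arn:aws:iam::123456789012:role/glue-stats", columns=["id"], sample_size=10))`. Sampling a fraction of the rows is one way to limit the build-time cost raised in the comments, though sampled distinct-value counts are estimates.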