Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for including detailed information in BigQuery.listTables #3228

Open
ebyhr opened this issue Apr 4, 2024 · 7 comments
Open

Add support for including detailed information in BigQuery.listTables #3228

ebyhr opened this issue Apr 4, 2024 · 7 comments
Labels
api: bigquery Issues related to the googleapis/java-bigquery API.

Comments

@ebyhr
Copy link

ebyhr commented Apr 4, 2024

Is your feature request related to a problem? Please describe.

Metadata handling is too slow when getting table definitions in bulk.

Describe the solution you'd like

Add a new TableListOption that allows specifying fields so that BigQuery.listTables returns detailed information.

Describe alternatives you've considered

I already tried getting table definitions in parallel on client side, but it's still slow.
Another solution is introducing cache, but we want to avoid as much as possible.

Additional context

trinodb/trino#4580 & trinodb/trino#10741

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/java-bigquery API. label Apr 4, 2024
@ebyhr
Copy link
Author

ebyhr commented Apr 16, 2024

@anoopj What do you think about it?

@anoopj
Copy link
Contributor

anoopj commented Apr 17, 2024

This will require API changes in BigQuery v2 API. If there are many tables, could we run an INFORMATION_SCHEMA query instead?

@ebyhr
Copy link
Author

ebyhr commented Apr 17, 2024

We've observed intermittent SLA failure when using com.google.cloud.bigquery.BigQuery#query on Trino CI. Switching to INFORMATION_SCHEMA doesn't increase such failure possiblity?

@anoopj
Copy link
Contributor

anoopj commented Apr 17, 2024

This is a BigQuery SQL query - so it's capable of processing very large amounts of data. The SLO failures you may be seeing may be related to capacity allocated to your reservation (if you're using reservations). On-demand does not guarantee specific latencies since it uses a shared resource pool.

The alternative proposed here (ie sending detailed info on tables.list API) may be risky. BigQuery has no limits on the number of tables in a dataset, so even with pagination, it will be a lot of data returned.

@ebyhr
Copy link
Author

ebyhr commented Apr 23, 2024

I'm trying to use INFORMATION_SCHEMA now. Are there helper methods converting string type names to BigQuery type in this SDK? I can convert easily in case of simple types, but it looks little hard in case of complex types having sub fields.

@anoopj
Copy link
Contributor

anoopj commented Apr 24, 2024

I'm not aware of any helper methods. @PhongChuong do you know?

@ebyhr
Copy link
Author

ebyhr commented Apr 29, 2024

I have another question. I wanted to use TABLE_OPTIONS for table comments, but the
value is quoted with unknown rule. https://cloud.google.com/bigquery/docs/information-schema-table-options. Can you share the quotation rule and add helper method to this SDK if possible?

+--------------+
| option_value |
+--------------+
| "test data"  |
| "test data"  |
+--------------+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/java-bigquery API.
Projects
None yet
Development

No branches or pull requests

2 participants