Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed when listing columns in BigQuery #21920

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

nineinchnick
Copy link
Member

Description

A continuation of #21830. Test coverage of the type parser is at 99% of lines and 95% of branches. Original description below.

Using mini parser because BigQuery Java SDK doesn't support translating string to BigQuery type as far as I asked Google engineers.

This PR improves listing columns. Test with 169 tables is improved from 22s to 1.5s.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# BigQuery connector
* Improve speed of fetching columns metadata. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 10, 2024
@github-actions github-actions bot added the bigquery BigQuery connector label May 10, 2024
@nineinchnick nineinchnick force-pushed the bigquery-bulk-columns branch 3 times, most recently from d3fba84 to f43ccd6 Compare May 12, 2024 08:24
@nineinchnick nineinchnick requested a review from wendigo May 12, 2024 09:28
@nineinchnick nineinchnick force-pushed the bigquery-bulk-columns branch 2 times, most recently from a2bbea8 to 3654c2e Compare May 16, 2024 10:16
@nineinchnick nineinchnick requested a review from ebyhr May 16, 2024 10:18
@nineinchnick
Copy link
Member Author

@ebyhr PTAL

FROM %s.INFORMATION_SCHEMA.COLUMNS
GROUP BY table_catalog, table_schema, table_name
""".formatted(quote(remoteSchemaName)));
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName));
Copy link
Member

@ebyhr ebyhr May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to build DatasetId and call getDataset method (when is it different from remoteSchemaName)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it for consistency. This connector has a local to remote names mapping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19860

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery BigQuery connector cla-signed
Development

Successfully merging this pull request may close these issues.

None yet

3 participants