Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed when listing columns in BigQuery #21920

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

nineinchnick
Copy link
Member

Description

A continuation of #21830. Test coverage of the type parser is at 99% of lines and 95% of branches. Original description below.

Using mini parser because BigQuery Java SDK doesn't support translating string to BigQuery type as far as I asked Google engineers.

This PR improves listing columns. Test with 169 tables is improved from 22s to 1.5s.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# BigQuery connector
* Improve speed of fetching columns metadata. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 10, 2024
@github-actions github-actions bot added the bigquery BigQuery connector label May 10, 2024
protected TypeInfo() {}

@Override
public abstract String toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these methods are not needed, remove toString, equals and hashCode. toString is only used in test but that's redundant - we only need a single roundtrip: String -> TypeInfo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toString is used in a few exceptions. I'll remove others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the exceptions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the IllegalArgumentException will be suppressed in convertToTrinoType, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I can either remove it, or we could add some logging, to make it easier to debug why some columns are not available in Trino. LMK what's better.

@nineinchnick nineinchnick force-pushed the bigquery-bulk-columns branch 3 times, most recently from d3fba84 to f43ccd6 Compare May 12, 2024 08:24
@nineinchnick nineinchnick requested a review from wendigo May 12, 2024 09:28
@nineinchnick nineinchnick force-pushed the bigquery-bulk-columns branch 2 times, most recently from a2bbea8 to 3654c2e Compare May 16, 2024 10:16
@nineinchnick nineinchnick requested a review from ebyhr May 16, 2024 10:18
@nineinchnick
Copy link
Member Author

@ebyhr PTAL

FROM %s.INFORMATION_SCHEMA.COLUMNS
GROUP BY table_catalog, table_schema, table_name
""".formatted(quote(remoteSchemaName)));
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName));
Copy link
Member

@ebyhr ebyhr May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to build DatasetId and call getDataset method (when is it different from remoteSchemaName)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it for consistency. This connector has a local to remote names mapping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19860

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it's for extension. Such change should go to the internal repository.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'd have to revert #19860. It's out of the scope of this PR.

@nineinchnick nineinchnick force-pushed the bigquery-bulk-columns branch 2 times, most recently from cc6a799 to bf0cf78 Compare May 21, 2024 10:25
@findinpath findinpath requested a review from pajaks May 21, 2024 21:06
@ebyhr
Copy link
Member

ebyhr commented May 24, 2024

/test-with-secrets sha=bf0cf785188ed0cca97f6b18f4f750df035ebe75

Copy link

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/9218809344

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good to me.

FROM %s.INFORMATION_SCHEMA.COLUMNS
GROUP BY table_catalog, table_schema, table_name
""".formatted(quote(remoteSchemaName)));
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it's for extension. Such change should go to the internal repository.

protected TypeInfo() {}

@Override
public abstract String toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the IllegalArgumentException will be suppressed in convertToTrinoType, right?

* Unsupported type exception occurs only for data types
* known to be not supported, like STRUCT, not all unknown tokens.
*/
public class UnsupportedTypeException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name looks little confusing because STRUCT type is supported except for parsing. How about renaming to ParsingException or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not supported in the type package. ParsingException seems to be too broad, or I'd have to refactor the exception to use it instead of IllegalArgumentException and make the typeName parameter optional. I don't think it's worth doing.

Comment on lines +421 to +423
if (!e.getTypeName().equals(STRUCT) || table.isEmpty() || table.get().getDefinition().getSchema() == null) {
// ignore unsupported types
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to continue iteration even when table is empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need the table to map STRUCT types. We should continue iteration to convert other types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery BigQuery connector cla-signed
Development

Successfully merging this pull request may close these issues.

None yet

3 participants