feat: ensure Arrow type metadata is consistent between REST and BQ Storage APIs #894

Closed
tswast opened this issue Aug 23, 2021 · 4 comments · Fixed by #946
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Contributor

tswast commented Aug 23, 2021

The BQ Storage API annotates certain Arrow types with additional metadata so that they can be disambiguated, for example to tell GEOGRAPHY columns apart from STRING columns.

We don't currently annotate the types we create when data is initially downloaded from the REST API.

If we make these consistent, users could register Arrow extension types, which can hook into the default pyarrow -> pandas conversions. See the discussion in #848 (comment).

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Aug 23, 2021
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Aug 23, 2021
@jimfulton jimfulton self-assigned this Sep 1, 2021
@jimfulton
Contributor

I have questions.

  • Confirming, this needs to be done on the client, right?

  • Do we know what extension metadata are added by the BigQuery storage API?

    Is this documented somewhere? Or do we need to determine it empirically?

@jimfulton
Contributor

It would be cool if you could attach application-specific metadata to BigQuery columns. :) You could, for example, mark a struct as defining a timedelta. :)

@jimfulton
Contributor

I know folks are busy, so I'm going to assume:

* Confirming, this needs to be done on the client, right?

yes.

* Do we know what extension metadata are added by the BigQuery storage API?
  Is this documented somewhere? Or do we need to determine it empirically?

empirically.

@tswast
Contributor Author

tswast commented Sep 3, 2021

There are some docs here: https://cloud.google.com/bigquery/docs/reference/storage#arrow_schema_details

But I think checking empirically is the best way to confirm the specific values.

I consider the metadata from the BQ Storage API to be the canonical version. We should update the "arrow from REST API" logic to match.
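One way the "arrow from REST API" logic could be made to match is a single lookup table from BigQuery type to Arrow field metadata, checked against what the Storage API actually returns. This is a stdlib-only sketch: the metadata values are assumptions modeled on the Storage API docs and must be confirmed empirically, and `rest_field_metadata`, `metadata_mismatches`, and the `(name, type)` schema shape are hypothetical.

```python
# Assumed canonical annotations (treat BQ Storage output as the source of truth).
BQ_TYPE_TO_ARROW_FIELD_METADATA = {
    "GEOGRAPHY": {
        b"ARROW:extension:name": b"google:sqlType:geography",
        b"ARROW:extension:metadata": b'{"encoding": "WKT"}',
    },
    "DATETIME": {b"ARROW:extension:name": b"google:sqlType:datetime"},
}


def rest_field_metadata(field_type):
    """Return the Arrow field metadata the REST path should attach, or None."""
    return BQ_TYPE_TO_ARROW_FIELD_METADATA.get((field_type or "").upper())


def metadata_mismatches(rest_schema, storage_metadata):
    """List columns whose REST-derived metadata differs from BQ Storage's.

    rest_schema: list of (column_name, bigquery_type) pairs.
    storage_metadata: dict of column_name -> metadata dict (or None), as
    observed empirically from a BQ Storage API read.
    """
    return [
        name
        for name, bq_type in rest_schema
        if rest_field_metadata(bq_type) != storage_metadata.get(name)
    ]
```

Keeping the mapping in one place means a newly annotated type only needs one entry, and the comparison helper gives an empirical check that the two download paths agree.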

2 participants