
fix: handle consuming streams with no data #29

Merged (1 commit, Jun 4, 2020)

Conversation

@plamut (Contributor) commented May 28, 2020

Fixes #27.

This PR fixes consuming streams that contain no data. If an empty stream is encountered, the to_dataframe() / to_arrow() method returns an empty pandas DataFrame / Arrow Table.

The schema of the empty result is preserved (on a best-effort basis) and is consistent regardless of the chosen session data format.
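
To illustrate the intended behavior, here is a minimal sketch (not the library's code) using plain pyarrow and pandas, with a made-up two-column schema: an Arrow table with zero rows still carries its schema, and converting it to pandas yields an empty DataFrame with the expected columns and dtypes.

import pyarrow as pa

# A made-up schema standing in for whatever the read session reports.
schema = pa.schema([("name", pa.string()), ("age", pa.int64())])

# An Arrow table with zero rows still preserves the schema.
empty_table = pa.Table.from_arrays(
    [pa.array([], type=pa.string()), pa.array([], type=pa.int64())],
    schema=schema,
)
df = empty_table.to_pandas()

print(empty_table.num_rows)  # 0
print(list(df.columns))      # ['name', 'age']
print(df.dtypes["age"])      # int64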

How to reproduce

Run a query and fetch its results in an AVRO/ARROW session with multiple requested streams. The query result should be large enough that the backend indeed decides to create multiple streams.

Additionally, the session should have a very tight row_restriction filter applied so that only a few rows actually get streamed to the client. With a bit of "luck", at least one of the streams will contain no data and reading from it will result in an error.
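
A rough sketch of such a reproduction, assuming the v1 BigQueryReadClient interface; the project/table names, the row_restriction value, and the stream count are placeholders, and the exact location of the DataFormat enum may differ between library versions:

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession()
requested_session.table = "projects/PROJECT_ID/datasets/DATASET/tables/TABLE"
requested_session.data_format = types.DataFormat.ARROW  # or AVRO
# A very selective filter, so that some streams end up with zero rows.
requested_session.read_options.row_restriction = "some_column = 'very_rare_value'"

session = client.create_read_session(
    parent="projects/PROJECT_ID",
    read_session=requested_session,
    max_stream_count=10,
)

# Before this fix, reading a stream that received no data raised an error here.
for stream in session.streams:
    reader = client.read_rows(stream.name)
    frame = reader.to_dataframe(session)
    print(stream.name, len(frame.index))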

Things to discuss

  • Do we need to backport this fix to the v1beta1 client? I presume not?

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut plamut requested a review from shollyman May 28, 2020 13:26
@googlebot added the cla: yes label May 28, 2020
"""
result = collections.OrderedDict()

type_map = {"long": "int64", "double": "float64", "boolean": "bool"}
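
For context, a hypothetical sketch (not the PR's actual helper) of how a mapping like type_map is typically applied: it turns Avro field types into pandas dtypes so that even an empty result can be materialized as a correctly typed DataFrame. The function name and the "object" fallback below are assumptions.

import collections
import pandas

type_map = {"long": "int64", "double": "float64", "boolean": "bool"}

def dtypes_from_avro_fields(avro_fields):
    # Hypothetical helper: unlisted Avro types fall back to "object".
    result = collections.OrderedDict()
    for field in avro_fields:
        result[field["name"]] = type_map.get(field["type"], "object")
    return result

fields = [{"name": "age", "type": "long"}, {"name": "name", "type": "string"}]
dtypes = dtypes_from_avro_fields(fields)
empty_df = pandas.DataFrame(columns=list(dtypes)).astype(dtypes)
print(empty_df.dtypes)  # age -> int64, name -> object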
Reviewer comment (Contributor), on the type_map excerpt above:

I think this is fine for this change, but I wonder if we should consider consolidating all our various type/schema conversion code in the storage library and bigquery. Is there demand outside of our own usages (e.g. in other storage APIs) that we should consider moving this into a more central dependency?

@plamut (author) replied:

Maybe in some libs working closely with the BigQuery API? cc: @tswast

@plamut (author) replied:

I'll try to have a look at it in the near future. It might be worth waiting for all the pending fixes and dependency version updates, though, to see how much of that "lipstick" logic for types is still needed.

Labels: cla: yes
Development

Successfully merging this pull request may close these issues.

ValueError: No objects to concatenate in to_dataframe method
4 participants