fix: handle consuming streams with no data #29
Conversation
""" | ||
result = collections.OrderedDict() | ||
|
||
type_map = {"long": "int64", "double": "float64", "boolean": "bool"} |
I think this is fine for this change, but I wonder if we should consider consolidating all our various type/schema conversion code in the storage library and bigquery. Is there demand outside of our own usages (e.g. in other storage APIs) that we should consider moving this into a more central dependency?
Maybe in some libs working closely with the BigQuery API? cc: @tswast
We have similar mappings in pandas-gbq and maybe ibis.
I'll try to have a look at it in the near future. It might be worth waiting for all the pending fixes and dependency version updates, though, to see how much of that "lipstick" logic for types is still needed.
🤖 I have created a release \*beep\* \*boop\*

---

## [1.0.0](https://www.github.com/googleapis/python-bigquery-storage/compare/v0.8.0...v1.0.0) (2020-06-04)

### Bug Fixes

* handle consuming streams with no data ([#29](https://www.github.com/googleapis/python-bigquery-storage/issues/29)) ([56d1b1f](https://www.github.com/googleapis/python-bigquery-storage/commit/56d1b1fd75965669f5a4d10e5b00671c276eda88))
* update pyarrow references that are warning ([#31](https://www.github.com/googleapis/python-bigquery-storage/issues/31)) ([5302481](https://www.github.com/googleapis/python-bigquery-storage/commit/5302481d9f0ee07630ae62ed7268e510bcaa5d84))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please).
Fixes #27.
This PR fixes the issue with consuming streams with no data. If an empty stream is encountered, the `to_dataframe()` / `to_arrow()` method returns an empty DataFrame / Arrow Table. The schema of the empty result is preserved (on a best-effort basis) and is consistent regardless of the chosen session data format.
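To illustrate the "empty result with schema preserved" behavior described above, here is a minimal sketch (not the library's actual implementation) of building an empty pandas DataFrame that still carries the session's column names and dtypes; the column names and dtypes are placeholder assumptions:

```python
import pandas as pd

# Hypothetical schema recovered from the read session.
dtypes = {"id": "int64", "score": "float64", "active": "bool"}

# An empty Series per column keeps the dtype even with zero rows, so
# downstream code sees the same schema it would get from a non-empty stream.
empty = pd.DataFrame({col: pd.Series([], dtype=dt) for col, dt in dtypes.items()})

print(len(empty))
print(dict(empty.dtypes.astype(str)))
```

The same idea applies on the Arrow side: an empty `pyarrow.Table` can be constructed from a schema alone, so concatenating results from empty and non-empty streams stays consistent.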
How to reproduce
Run a query and fetch its results in an AVRO/ARROW session with multiple requested streams. The query results should be large enough so that the backend indeed decides to create multiple streams.
Additionally, the session should have a very tight `row_restriction` filter applied so that only a few rows actually get streamed to the client. If "lucky", at least one of the streams will contain no data and will result in an error when reading from it.

Things to discuss
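The reproduction steps above could look roughly like the following sketch (it requires a GCP project and credentials to actually run; the project, dataset, table, and filter values are placeholders, and the exact reader API may differ between client versions):

```python
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.ARROW,  # or AVRO
    read_options=types.ReadSession.TableReadOptions(
        # A very tight filter so that only a handful of rows survive,
        # leaving some of the requested streams with no data at all.
        row_restriction="id = 12345",
    ),
)

session = client.create_read_session(
    parent="projects/my-project",
    read_session=requested_session,
    max_stream_count=10,  # ask for several streams
)

# Before this fix, reading one of the empty streams raised an error;
# with the fix it yields an empty result with the schema preserved.
for stream in session.streams:
    reader = client.read_rows(stream.name)
    df = reader.to_dataframe(session)
    print(stream.name, len(df))
```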
Should the same fix also be applied to the `v1beta1` client? I presume not?

PR checklist