
fix: handle consuming streams with no data #29

Merged (1 commit, Jun 4, 2020)

Conversation

@plamut (Contributor) commented May 28, 2020

Fixes #27.

This PR fixes consuming streams that contain no data. If an empty stream is encountered, the to_dataframe() / to_arrow() method returns an empty pandas DataFrame / Arrow Table.

The schema of the empty result is preserved (on a best-effort basis) and is consistent regardless of the chosen session data format.
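
To illustrate the intended behavior, here is a minimal sketch (not the library's code) using plain pyarrow and pandas, with a made-up two-column schema: an Arrow table with zero rows still carries its schema, and converting it to pandas yields an empty DataFrame with the expected columns and dtypes.

import pyarrow as pa

# A made-up schema standing in for whatever the read session reports.
schema = pa.schema([("name", pa.string()), ("age", pa.int64())])

# An Arrow table with zero rows still preserves the schema.
empty_table = pa.Table.from_arrays(
    [pa.array([], type=pa.string()), pa.array([], type=pa.int64())],
    schema=schema,
)
df = empty_table.to_pandas()

print(empty_table.num_rows)  # 0
print(list(df.columns))      # ['name', 'age']
print(df.dtypes["age"])      # int64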

How to reproduce

Run a query and fetch its results in an AVRO/ARROW session with multiple requested streams. The query result should be large enough that the backend indeed decides to create multiple streams.

Additionally, the session should have a very tight row_restriction filter applied so that only a few rows actually get streamed to the client. With a bit of "luck", at least one of the streams will contain no data and reading from it will result in an error.
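
A rough sketch of such a reproduction, assuming the v1 BigQueryReadClient interface; the project/table names, the row_restriction value, and the stream count are placeholders, and the exact location of the DataFormat enum may differ between library versions:

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession()
requested_session.table = "projects/PROJECT_ID/datasets/DATASET/tables/TABLE"
requested_session.data_format = types.DataFormat.ARROW  # or AVRO
# A very selective filter, so that some streams end up with zero rows.
requested_session.read_options.row_restriction = "some_column = 'very_rare_value'"

session = client.create_read_session(
    parent="projects/PROJECT_ID",
    read_session=requested_session,
    max_stream_count=10,
)

# Before this fix, reading a stream that received no data raised an error here.
for stream in session.streams:
    reader = client.read_rows(stream.name)
    frame = reader.to_dataframe(session)
    print(stream.name, len(frame.index))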

Things to discuss

  • Do we need to backport this fix to the v1beta1 client? I presume not?

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut plamut requested a review from shollyman May 28, 2020 13:26
@googlebot added the cla: yes label May 28, 2020
"""
result = collections.OrderedDict()

type_map = {"long": "int64", "double": "float64", "boolean": "bool"}
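
For context, a hypothetical sketch (not the PR's actual helper) of how a mapping like type_map is typically applied: it turns Avro field types into pandas dtypes so that even an empty result can be materialized as a correctly typed DataFrame. The function name and the "object" fallback below are assumptions.

import collections
import pandas

type_map = {"long": "int64", "double": "float64", "boolean": "bool"}

def dtypes_from_avro_fields(avro_fields):
    # Hypothetical helper: unlisted Avro types fall back to "object".
    result = collections.OrderedDict()
    for field in avro_fields:
        result[field["name"]] = type_map.get(field["type"], "object")
    return result

fields = [{"name": "age", "type": "long"}, {"name": "name", "type": "string"}]
dtypes = dtypes_from_avro_fields(fields)
empty_df = pandas.DataFrame(columns=list(dtypes)).astype(dtypes)
print(empty_df.dtypes)  # age -> int64, name -> object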
Reviewer comment (Contributor), on the type_map excerpt above:

I think this is fine for this change, but I wonder if we should consider consolidating all our various type/schema conversion code in the storage library and bigquery. Is there demand outside of our own usages (e.g. in other storage APIs) that we should consider moving this into a more central dependency?

@plamut (author) replied:

Maybe in some libs working closely with the BigQuery API? cc: @tswast

@plamut (author) replied:

I'll try to have a look at it in the near future. It might be worth waiting for all the pending fixes and dependency version updates, though, to see how much of that "lipstick" logic for types is still needed.

Labels: cla: yes
Development

Successfully merging this pull request may close these issues.

ValueError: No objects to concatenate in to_dataframe method
4 participants