
feat: use BigQuery Storage client by default #55

Merged: 16 commits, Jun 10, 2020

Conversation

plamut (Contributor) commented Mar 9, 2020

Closes #86.
Closes #84.

(Originally opened as a POC preview, without a tracking issue.)

The PR is now ready to be reviewed. It uses the BQ Storage API by default to fetch query results, and removes the "beta" label from that feature.

It's probably easier to review this PR commit by commit.

Key points:

  • If the BQ Storage client cannot be used to fetch query results (e.g. because its optional dependencies are missing), the library falls back to the REST API.
  • The stable BQ Storage API v1 client is now used, replacing the v1beta1 client (there is, however, a feature request to support both client versions simultaneously).
  • The Python DB API implementation now follows the spec (PEP 249) more closely: closing a Connection makes it unusable from that point on and also closes every Cursor instance it created.
  • Closing a DB API Connection now also closes any BigQuery / BQ Storage clients the connection itself created, fixing the socket leak that was discovered.
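The last two points can be sketched with a simplified model (illustrative only, not the library's actual implementation; `Cursor` and `Connection` here are bare stand-ins for the DB API classes):

```python
# Simplified model of the DB API close semantics described above: closing a
# Connection makes it unusable, closes every Cursor it created, and closes
# the clients it owns (stand-ins here) so their sockets are released.
class Cursor:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Connection:
    def __init__(self, client, bqstorage_client=None):
        self._client = client                      # BigQuery client stand-in
        self._bqstorage_client = bqstorage_client  # optional BQ Storage client
        self._cursors = []
        self.closed = False

    def cursor(self):
        if self.closed:
            raise RuntimeError("Operating on a closed connection.")
        cur = Cursor()
        self._cursors.append(cur)
        return cur

    def close(self):
        for cur in self._cursors:
            cur.close()
        # Closing the owned clients is what fixes the socket leak.
        for client in (self._client, self._bqstorage_client):
            if client is not None and hasattr(client, "close"):
                client.close()
        self.closed = True
```

(In the real library, only clients the connection itself created are closed; user-supplied clients remain the caller's responsibility.)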

@plamut plamut added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Mar 9, 2020
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 9, 2020
plamut (Contributor, author) commented Apr 7, 2020

I ran some timings to compare fetching data with and without the BQ Storage client; results below. Spoiler: with the BQ Storage API, fetching is roughly twice as fast in these tests.

Query

```sql
SELECT * FROM `project.dataset.table`
LIMIT 200000
```

Source tables

  • A synthetic table with two FLOAT columns containing 10M rows of random numbers
  • `bigquery-public-data.cms_medicare.inpatient_charges_2011` - multiple numeric and string columns, ~163k rows

Timings

|                         | REST API (`tabledata.list`) | BQ Storage API |
| ----------------------- | --------------------------- | -------------- |
| Synthetic table         | 10 - 11.5 s                 | 4.5 - 5 s      |
| Inpatient charges table | 23 - 25 s                   | 10.5 - 12.5 s  |
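Taking the midpoints of the ranges above, the BQ Storage API path comes out roughly 2.1-2.3x faster in both cases:

```python
# Rough speedups implied by the timings above (midpoints of each range).
rest = {"synthetic": (10 + 11.5) / 2, "inpatient": (23 + 25) / 2}
storage = {"synthetic": (4.5 + 5) / 2, "inpatient": (10.5 + 12.5) / 2}
speedup = {name: rest[name] / storage[name] for name in rest}
print(speedup)  # synthetic ≈ 2.26x, inpatient ≈ 2.09x
```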

@plamut plamut force-pushed the optimize-to-dataframe branch 4 times, most recently from f080813 to 1aa0c62 Compare April 28, 2020 07:59
@plamut plamut removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 28, 2020
@plamut plamut requested a review from shollyman April 28, 2020 08:18
@plamut plamut marked this pull request as ready for review April 28, 2020 08:29
@shollyman (Contributor) left a comment:

Semantic satiation from seeing all the v1/v1beta references. Sorry it took me so long to get to this. A couple of minor nits, plus a few more interesting bits (Avro vs. Arrow, small result sets, versioning for storage).

Resolved review threads (outdated):
  • google/cloud/bigquery/dbapi/cursor.py (×2)
  • google/cloud/bigquery/job.py
  • setup.py
  • tests/system.py (×2)
@shollyman (Contributor) left a comment:

Thanks for slogging through this.

plamut (Contributor, author) commented May 15, 2020

Just a thought: since this is a pretty significant change, how about releasing all the other smaller changes and fixes that have accumulated since the last release first, and shipping this one on its own? That would make a rollback easier, should it be necessary.

shollyman (Contributor) commented:
Isolating this change to its own release seems prudent.

plamut (Contributor, author) commented May 19, 2020

Putting this on hold until the next release is out; we will ship this change separately.

@plamut plamut added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 19, 2020
@plamut plamut removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jun 10, 2020
@plamut plamut requested a review from shollyman June 10, 2020 15:02
plamut (Contributor, author) commented Jun 10, 2020

With the new release (1.25.0) out, we can now merge this and release it in the near future (possibly along with miscellaneous related cleanup fixes).

Successfully merging this pull request may close these issues.

  • Use BQ Storage API by default
  • BigQuery Storage API integrations should accept either v1beta1 or v1 client
4 participants