
feat: add max_queue_size argument to RowIterator.to_dataframe_iterable #575

Merged
merged 5 commits into googleapis:master on Apr 14, 2021

Conversation

plamut
Contributor

@plamut plamut commented Mar 24, 2021

Closes #561.

This PR limits the size of the internal queue that stores result pages when streaming data over the BQ Storage API. It also makes the limit configurable.

Still need to add a few additional unit tests, but that should be it.

Note:
The new parameter is not exposed to the bigquery Jupyter cell magic - I presume that's fine? I don't think cell magic needs such fine-grained control, since it's not really meant to fetch huge query results into a Jupyter notebook session where any performance difference could actually matter.
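The idea behind the change can be sketched with a toy producer/consumer, using only the standard library. This is not the library's actual implementation, just an illustration of why a bounded `queue.Queue` conserves memory: the producer (the download thread) blocks once the queue is full, so at most `maxsize` undelivered result pages sit in memory at a time.

```python
import queue
import threading

def produce_pages(q, num_pages):
    """Producer stand-in for the download thread: q.put() blocks when the
    queue is full, which throttles how far downloads can run ahead."""
    for i in range(num_pages):
        q.put(f"page-{i}")
    q.put(None)  # sentinel signalling end of stream

def consume_pages(q, out):
    """Consumer stand-in for the code iterating over result pages."""
    while True:
        page = q.get()
        if page is None:
            break
        out.append(page)

# A small maxsize bounds memory: at most 2 undelivered pages at any moment.
page_queue = queue.Queue(maxsize=2)
results = []
producer = threading.Thread(target=produce_pages, args=(page_queue, 5))
consumer = threading.Thread(target=consume_pages, args=(page_queue, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)
```

With an unbounded queue, a fast producer could enqueue every page of a large result set before the consumer processes any of them, which is exactly the memory blow-up this PR limits.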

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Mar 24, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 24, 2021
@plamut plamut force-pushed the iss-561 branch 2 times, most recently from 30ffe85 to 680f952 on March 24, 2021 16:36
@plamut plamut requested a review from tswast March 24, 2021 16:36
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 24, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 24, 2021
@plamut plamut changed the title feat: add configurable max size for the queue holding the result pages streamed over the BQ Stroage API feat: add configurable max size for the queue holding the result pages streamed over the BQ Storage API Mar 29, 2021
The new parameter allows configuring the maximum size of the internal
queue used to hold result pages when query data is streamed over the
BigQuery Storage API.
@plamut plamut marked this pull request as ready for review March 29, 2021 13:52
@plamut plamut requested a review from a team March 29, 2021 13:52
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 30, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 30, 2021
@tswast
Contributor

tswast commented Mar 30, 2021

Looks like some tests are timing out now. I suspect that 1 is not the right default.

@tswast
Contributor

tswast commented Mar 30, 2021

How about we only add the argument to to_dataframe_iterable, as that is where it is most relevant? I think None, or perhaps a value equal to the number of workers, is probably the right default.

In the other methods we are expected to download the whole table/query results at once anyway, so conserving memory isn't as important.
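The point above, in miniature: an iterable of chunks lets the caller hold only one chunk in memory at a time, whereas a single full download must materialize everything at once. A toy sketch, with plain lists standing in for the DataFrame chunks that `to_dataframe_iterable` yields:

```python
def chunk_iterable(rows, chunk_size):
    """Yield fixed-size chunks lazily, the way to_dataframe_iterable yields
    one DataFrame per result page (toy stand-in using plain lists)."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

total = 0
for chunk in chunk_iterable(list(range(10)), 4):
    # Process each chunk, then let it be garbage-collected before the next.
    total += sum(chunk)
print(total)
```

For `to_dataframe` and friends, which return one combined result anyway, the whole dataset ends up in memory regardless, so a bounded queue buys little there.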

@plamut
Contributor Author

plamut commented Mar 30, 2021

I'm fine with that, I'll remove the parameter from other methods where it's expected that query results are downloaded in full. Will also check the timeouts and what a better default could be.

Comment on lines +1654 to +1656
By default, the max queue size is set to the number of BQ Storage streams
created by the server. If ``max_queue_size`` is :data:`None`, the queue
size is infinite.
Contributor Author

Just in case somebody really wants the old behavior, I added it as an option.
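The documented semantics can be sketched with a small helper. This is a hypothetical illustration, not the library's internals: `_DEFAULT` is an illustrative sentinel, and the helper only mirrors the docstring above (default bounds the queue by the number of BQ Storage streams; `None` means an infinite queue).

```python
import queue

# Illustrative sentinel distinguishing "argument omitted" from None.
_DEFAULT = object()

def make_page_queue(max_queue_size=_DEFAULT, num_streams=1):
    """Build the page queue per the documented semantics (sketch only):
    None -> infinite queue; default -> bounded by the stream count;
    any other int -> bounded by that value."""
    if max_queue_size is None:
        return queue.Queue(maxsize=0)  # queue.Queue: 0 means no size limit
    if max_queue_size is _DEFAULT:
        return queue.Queue(maxsize=num_streams)
    return queue.Queue(maxsize=max_queue_size)

print(make_page_queue(None).maxsize)               # unbounded (old behavior)
print(make_page_queue(num_streams=4).maxsize)      # default: stream count
print(make_page_queue(10, num_streams=4).maxsize)  # explicit cap wins
```

Keeping `None` as an escape hatch preserves the pre-change unbounded behavior for anyone who depends on it.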

@plamut
Contributor Author

plamut commented Apr 14, 2021

@tswast ping :)

Contributor

@tswast tswast left a comment

Wonderful!

@tswast tswast merged commit f95f415 into googleapis:master Apr 14, 2021
@tswast tswast changed the title feat: add configurable max size for the queue holding the result pages streamed over the BQ Storage API feat: add max_queue_size argument to RowIterator.to_dataframe_iterable Apr 14, 2021
@plamut plamut deleted the iss-561 branch April 14, 2021 21:13
gcf-merge-on-green bot pushed a commit that referenced this pull request Apr 26, 2021
🤖 I have created a release *beep* *boop*
---
## [2.14.0](https://www.github.com/googleapis/python-bigquery/compare/v2.13.1...v2.14.0) (2021-04-26)


### Features

* accept DatasetListItem where DatasetReference is accepted ([#597](https://www.github.com/googleapis/python-bigquery/issues/597)) ([c8b5581](https://www.github.com/googleapis/python-bigquery/commit/c8b5581ea3c94005d69755c4a3b5a0d8900f3fe2))
* accept job object as argument to `get_job` and `cancel_job` ([#617](https://www.github.com/googleapis/python-bigquery/issues/617)) ([f75dcdf](https://www.github.com/googleapis/python-bigquery/commit/f75dcdf3943b87daba60011c9a3b42e34ff81910))
* add `Client.delete_job_metadata` method to remove job metadata ([#610](https://www.github.com/googleapis/python-bigquery/issues/610)) ([0abb566](https://www.github.com/googleapis/python-bigquery/commit/0abb56669c097c59fbffce007c702e7a55f2d9c1))
* add `max_queue_size` argument to `RowIterator.to_dataframe_iterable` ([#575](https://www.github.com/googleapis/python-bigquery/issues/575)) ([f95f415](https://www.github.com/googleapis/python-bigquery/commit/f95f415d3441b3928f6cc705cb8a75603d790fd6))
* add type hints for public methods ([#613](https://www.github.com/googleapis/python-bigquery/issues/613)) ([f8d4aaa](https://www.github.com/googleapis/python-bigquery/commit/f8d4aaa335a0eef915e73596fc9b43b11d11be9f))
* DB API cursors are now iterable ([#618](https://www.github.com/googleapis/python-bigquery/issues/618)) ([e0b373d](https://www.github.com/googleapis/python-bigquery/commit/e0b373d0e721a70656ed8faceb7f5c70f642d144))
* retry google.auth TransportError by default ([#624](https://www.github.com/googleapis/python-bigquery/issues/624)) ([34ecc3f](https://www.github.com/googleapis/python-bigquery/commit/34ecc3f1ca0ff073330c0c605673d89b43af7ed9))
* use pyarrow stream compression, if available ([#593](https://www.github.com/googleapis/python-bigquery/issues/593)) ([dde9dc5](https://www.github.com/googleapis/python-bigquery/commit/dde9dc5114c2311fb76fafc5b222fff561e8abf1))


### Bug Fixes

* consistent percents handling in DB API query ([#619](https://www.github.com/googleapis/python-bigquery/issues/619)) ([6502a60](https://www.github.com/googleapis/python-bigquery/commit/6502a602337ae562652a20b20270949f2c9d5073))
* missing license headers in new test files ([#604](https://www.github.com/googleapis/python-bigquery/issues/604)) ([df48cc5](https://www.github.com/googleapis/python-bigquery/commit/df48cc5a0be99ad39d5835652d1b7422209afc5d))
* unsetting clustering fields on Table is now possible ([#622](https://www.github.com/googleapis/python-bigquery/issues/622)) ([33a871f](https://www.github.com/googleapis/python-bigquery/commit/33a871f06329f9bf5a6a92fab9ead65bf2bee75d))


### Documentation

* add sample to run DML query ([#591](https://www.github.com/googleapis/python-bigquery/issues/591)) ([ff2ec3a](https://www.github.com/googleapis/python-bigquery/commit/ff2ec3abe418a443cd07751c08e654f94e8b3155))
* update the description of the return value of `_QueryResults.rows()` ([#594](https://www.github.com/googleapis/python-bigquery/issues/594)) ([8f4c0b8](https://www.github.com/googleapis/python-bigquery/commit/8f4c0b84dac3840532d7865247b8ad94b625b897))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).