TQDM progress bar causes ReadTimeout error #438

Closed
willbowditch opened this issue Dec 15, 2020 · 4 comments · Fixed by #444
Labels: api: bigquery · priority: p2 · type: bug

Comments

@willbowditch

With google-cloud-bigquery 2.6.1, enabling the tqdm progress bar causes some queries to fail with a timeout error when the bigquery-storage option is not used:

urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Read timed out. (read timeout=0.5)

Maybe related to #403?

Disabling the tqdm progress bar and running the same query returns a data frame without any problem.

For a reproducible example, see the previous bug report, #403.

Environment details

  • OS type and version: Debian Buster
  • Python version: 3.8
  • pip version: 20.3.1
  • google-cloud-bigquery version: 2.6.1

Trace:

    df = query_job.to_dataframe(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py", line 1329, in to_dataframe
    return query_result.to_dataframe(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 1769, in to_dataframe
    record_batch = self.to_arrow(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 1601, in to_arrow
    for record_batch in self._to_arrow_iterable(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 1500, in _to_page_iterable
    for item in tabledata_list_download():
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 512, in download_arrow_row_iterator
    for page in pages:
  File "/usr/local/lib/python3.8/site-packages/google/api_core/page_iterator.py", line 243, in _page_iter
    page = self._next_page()
  File "/usr/local/lib/python3.8/site-packages/google/api_core/page_iterator.py", line 369, in _next_page
    response = self._get_next_page_response()
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 1474, in _get_next_page_response
    return self.api_request(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 637, in _call_api
    return call()
  File "/usr/local/lib/python3.8/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/usr/local/lib/python3.8/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/usr/local/lib/python3.8/site-packages/google/cloud/_http.py", line 427, in api_request
    response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/_http.py", line 291, in _make_request
    return self._do_request(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/_http.py", line 329, in _do_request
    return self.http.request(
  File "/usr/local/lib/python3.8/site-packages/google/auth/transport/requests.py", line 464, in request
    response = super(AuthorizedSession, self).request(
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='bigquery.googleapis.com', port=443): Read timed out. (read timeout=0.5)

@product-auto-label (bot) added the "api: bigquery" label on Dec 15, 2020

@willbowditch
Author

Downgrading to version 2.3.1 also fixes the problem, even with the progress bar still enabled.

@meredithslota added the "priority: p2" and "type: bug" labels on Dec 15, 2020

@tswast
Contributor

tswast commented Dec 17, 2020

I'm not actually seeing any timeouts in the relevant class https://github.com/googleapis/python-bigquery/blob/master/google/cloud/bigquery/table.py#L1346 so it could actually be a problem in the core library.

In my benchmarks getQueryResults calls can take 10+ seconds for wide rows, so if there is a default timeout being set, it should be quite high.
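
To illustrate why a short client-side timeout is a problem for slow calls, here is a minimal, library-independent sketch. The names `slow_call` and `call_with_timeout` are hypothetical, a thread pool stands in for the HTTP request, and the durations are scaled down so the example runs quickly. None of this is code from google-cloud-bigquery.

```python
import concurrent.futures
import time


def slow_call(duration):
    """Stand-in for a request that takes `duration` seconds to respond."""
    time.sleep(duration)
    return "rows"


def call_with_timeout(duration, timeout):
    """Return the result, or None if the caller gives up after `timeout` seconds."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_call, duration)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # The call itself would have succeeded; only the wait was too short.
            return None


# A call that needs 0.2 s fails when the caller only waits 0.05 s ...
timed_out = call_with_timeout(duration=0.2, timeout=0.05)
# ... and succeeds once the timeout is comfortably above the response time.
succeeded = call_with_timeout(duration=0.01, timeout=1.0)
```

The same reasoning applies at real scale: if responses can take 10+ seconds, a 0.5 second read timeout will reject requests that would otherwise complete.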

@willbowditch
Author

willbowditch commented Dec 18, 2020

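@tswast I think this is the cause. The progress bar update interval appears to end up as the request timeout (note the read timeout=0.5 in the error):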

_PROGRESS_BAR_UPDATE_INTERVAL = 0.5

Patching it as follows fixes the problem in 2.6.1:

from google.cloud import bigquery

bigquery._tqdm_helpers._PROGRESS_BAR_UPDATE_INTERVAL = 10
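
The workaround above assigns to a private module-level constant for the whole process, so a scoped variant may be preferable. The sketch below uses `unittest.mock.patch.object` on a stand-in namespace so that it runs without the library installed. Pointing it at `bigquery._tqdm_helpers` instead is an assumption based on the attribute path in the snippet above, and private names can change between releases. `fake_tqdm_helpers`, `read_interval`, and `run_with_interval` are hypothetical names.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the private module that holds the constant.
fake_tqdm_helpers = SimpleNamespace(_PROGRESS_BAR_UPDATE_INTERVAL=0.5)


def read_interval():
    """Return the interval currently visible to callers."""
    return fake_tqdm_helpers._PROGRESS_BAR_UPDATE_INTERVAL


def run_with_interval(interval, func):
    """Run `func` with the interval temporarily overridden, then restore it."""
    with mock.patch.object(
        fake_tqdm_helpers, "_PROGRESS_BAR_UPDATE_INTERVAL", interval
    ):
        return func()


inside = run_with_interval(10, read_interval)  # value seen while patched
after = read_interval()  # original value is restored on exit
```

Scoping the override this way keeps the change limited to the calls that need it and restores the original value even if the wrapped function raises.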

@tswast
Contributor

tswast commented Dec 21, 2020

Rather than increase the time between progress bar updates, I've sent #444, which will increase the minimum connection timeout to one that should accommodate 99.9%+ of response times.
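
The idea of enforcing a minimum timeout can be sketched as a simple clamp. This is not the code from #444. The constant name, its value of 120 seconds, and the `effective_timeout` helper are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical floor, in seconds; the value actually chosen in #444 may differ.
MIN_REQUEST_TIMEOUT = 120.0


def effective_timeout(
    requested: Optional[float], minimum: float = MIN_REQUEST_TIMEOUT
) -> Optional[float]:
    """Clamp a caller-supplied timeout so it never drops below `minimum`.

    None is passed through unchanged, meaning "no timeout".
    """
    if requested is None:
        return None
    return max(requested, minimum)


short = effective_timeout(0.5)     # progress bar interval gets raised to the floor
long = effective_timeout(300.0)    # larger caller-supplied values are kept
unset = effective_timeout(None)    # no timeout stays no timeout
```

With a clamp like this, the progress bar can keep refreshing every 0.5 seconds while the underlying request is still given enough time to complete.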
