
Reason: 429 Received message larger than max (6341826 vs. 4194304) #78

Closed
dabasmoti opened this issue Oct 18, 2020 · 9 comments · Fixed by #79
Assignees
Labels
api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. priority: p0 Highest priority. Critical issue. P0 implies highest priority. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@dabasmoti

Recently, when querying a BigQuery table (1M rows, 768 columns) with pandas_gbq and use_bqstorage_api=True, the following error is raised.

Environment details

  • OS type and version: Centos 7
  • Python version: 3.7
  • pip version: both 10.0.1 and 20.2.3
  • google-cloud-bigquery version: 2.1.0
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
google-api-core==1.22.4
google-auth==1.22.1
google-auth-oauthlib==0.4.1
google-cloud-bigquery==2.1.0
google-cloud-bigquery-storage==2.0.0
google-cloud-core==1.4.3
google-crc32c==1.0.0
google-resumable-media==1.1.0
googleapis-common-protos==1.52.0
grpcio==1.32.0
idna==2.10
libcst==0.3.13
mypy-extensions==0.4.3
numpy==1.19.2
oauthlib==3.1.0
pandas==1.1.3
pandas-gbq==0.14.0
proto-plus==1.10.2
protobuf==3.13.0
pyarrow==1.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pydata-google-auth==1.1.0
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
six==1.15.0
typing-extensions==3.7.4.3
typing-inspect==0.6.0
urllib3==1.25.10

Steps to reproduce

import pandas as pd
import pandas_gbq
 
query = "select * from table"
df = pandas_gbq.read_gbq(query, use_bqstorage_api=True)

Stack trace


Traceback (most recent call last):
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 150, in error_remapped_callable
    return _StreamingResponseIterator(result, prefetch_first_result=prefetch_first)
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 73, in __init__
    self._stored_first_result = six.next(self._wrapped)
  File "/root/test3.7/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/root/test3.7/lib/python3.7/site-packages/grpc/_channel.py", line 706, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (6349834 vs. 4194304)"
        debug_error_string = "{"created":"@1603030001.808209349","description":"Received message larger than max (6349834 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":207,"grpc_status":8}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/test3.7/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 590, in _download_results
    **to_dataframe_kwargs
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1690, in to_dataframe
    create_bqstorage_client=create_bqstorage_client,
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1508, in to_arrow
    bqstorage_client=bqstorage_client
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1397, in _to_page_iterable
    for item in bqstorage_download():
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 676, in _download_table_bqstorage
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 568, in _download_table_bqstorage_stream
    rowstream = bqstorage_client.read_rows(stream.name).rows(session)
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1/client.py", line 129, in read_rows
    metadata=metadata,
  File "/root/test3.7/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/client.py", line 498, in read_rows
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/timeout.py", line 102, in func_with_timeout
    return func(*args, **kwargs)
  File "/root/test3.7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 152, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ResourceExhausted: 429 Received message larger than max (6349834 vs. 4194304)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bq_test.py", line 11, in <module>
    df= pandas_gbq.read_gbq(query ,use_bqstorage_api=True,credentials=credentials)
  File "/root/test3.7/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 975, in read_gbq
    dtypes=dtypes,
  File "/root/test3.7/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 536, in run_query
    user_dtypes=dtypes,
  File "/root/test3.7/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 593, in _download_results
    self.process_http_error(ex)
  File "/root/test3.7/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 435, in process_http_error
    raise GenericGBQException("Reason: {0}".format(ex))
pandas_gbq.gbq.GenericGBQException: Reason: 429 Received message larger than max (6349834 vs. 4194304)

@sm-hawkfish

We recently came across this issue as well when running a query with shape (1 million rows, 1700 columns), and were able to work around it by downgrading from:

google-cloud-bigquery==2.1.0
google-cloud-bigquery-storage==2.0.0

to

google-cloud-bigquery==1.28.0
google-cloud-bigquery-storage==1.1.0

Happy to provide other details on the error and/or our runtime environment, if they would be useful.
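For reference, the downgrade described above can be expressed as a requirements-file fragment (a sketch using only the two versions named in this comment; pins for other transitive dependencies may differ in your environment):

```
google-cloud-bigquery==1.28.0
google-cloud-bigquery-storage==1.1.0
```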

@dabasmoti

@sm-hawkfish
Thank you for your reply.
I would be glad if you could share more details about the other dependencies.
Thanks

@sm-hawkfish

Hi @dabasmoti -- I intended that offer for the maintainers of this repo, who might want more information to aid their debugging. Did downgrading the two packages I listed not solve the issue for you as well?

@dabasmoti

@sm-hawkfish
Hey,
I wasn't sure about the google-api-core version, so I asked about the other dependencies.
Your advice solved the error. Thanks!

@sm-hawkfish

Hi @dabasmoti -- I think you should re-open this issue. You were right to report it as downgrading the versions of the library is just a short-term workaround.

@dabasmoti dabasmoti reopened this Oct 19, 2020
@tswast tswast transferred this issue from googleapis/python-bigquery Oct 19, 2020
@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. label Oct 19, 2020
@tswast tswast added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p0 Highest priority. Critical issue. P0 implies highest priority. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Oct 19, 2020
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Oct 19, 2020
@tswast tswast self-assigned this Oct 19, 2020
@tswast

tswast commented Oct 19, 2020

In version 1.x, the gRPC `max_receive_message_length` channel option was set to -1 (unlimited).

https://github.com/googleapis/python-bigquery-storage/blob/v1.1.0/google/cloud/bigquery_storage_v1/gapic/transports/big_query_read_grpc_transport.py#L68-L74

Currently investigating if a similar option is being set in the version 2.0 client.
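As a rough sketch of what that option does (the option name and the default 4 MiB cap come from gRPC; the constants below are taken from the stack trace in this issue, and the exact channel-construction code in the 2.0 transport is what is still being investigated):

```python
# gRPC caps incoming messages at 4 MiB by default; the failing read
# returned a ~6.3 MB message, which trips that client-side check.
DEFAULT_MAX_RECEIVE_BYTES = 4 * 1024 * 1024  # 4194304, as in the error text
FAILING_MESSAGE_BYTES = 6349834  # size reported in the stack trace

assert FAILING_MESSAGE_BYTES > DEFAULT_MAX_RECEIVE_BYTES

# The 1.x transport avoided this by passing -1 (unlimited) as a channel
# option; tuples of this form are what grpc.insecure_channel() and
# grpc.secure_channel() accept in their `options` argument.
UNLIMITED_MESSAGE_OPTIONS = [
    ("grpc.max_receive_message_length", -1),
    ("grpc.max_send_message_length", -1),
]
```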

@tswast

tswast commented Oct 19, 2020

Filed googleapis/gapic-generator-python#669 to fix this in the code generator

@tswast

tswast commented Oct 19, 2020

Sent #79 with a (temporary?) fix to remove this client-side validation in this client.

Also adds a system test that reliably reproduces this issue with a public dataset without this fix.

read_session = types.ReadSession()
read_session.table = "projects/{}/datasets/{}/tables/{}".format(
    "bigquery-public-data", "geo_census_tracts", "us_census_tracts_national"
)
read_session.data_format = types.DataFormat.ARROW

session = client.create_read_session(
    request={
        "parent": "projects/{}".format(project_id),
        "read_session": read_session,
        "max_stream_count": 1,
    }
)

stream = session.streams[0].name

read_rows_stream = client.read_rows(stream)

# fetch the first page of rows
pages_iter = iter(read_rows_stream.rows(session).pages)
some_rows = next(pages_iter)

assert all(len(row["tract_geom"].as_py()) > 0 for row in some_rows)

@tswast

tswast commented Oct 21, 2020

This fix is released in google-cloud-bigquery-storage==2.0.1 on both PyPI and Conda.
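With the fix released, the earlier downgrade workaround is no longer needed; a requirements pin picking up the fixed version might look like (a sketch based on the release named above):

```
google-cloud-bigquery-storage>=2.0.1
```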
