RowIterator to_dataframe requires pyarrow >= 1.0.0 to work #249

Closed
BradLewis opened this issue Aug 27, 2020 · 3 comments · Fixed by #250
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@BradLewis

Currently the google-cloud-bigquery library requires pyarrow > 0.16.0. However, the method RowIterator.to_dataframe passes the kwarg "timestamp_as_object" to to_pandas, which is only supported in pyarrow >= 1.0.0. If we install pyarrow >= 1.0.0, everything works as expected, but we are using other libraries which require pyarrow < 1.0.0.

So either the requirements should be updated to require pyarrow >= 1.0.0, or the method should be changed to support versions below 1.0.0.
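As a rough illustration of the second option, the keyword could be passed only when the installed pyarrow is new enough. This is a minimal sketch, not the library's actual code: `parse_release` and `build_to_pandas_kwargs` are hypothetical helpers, and the 1.0.0 threshold is taken from this report.

```python
import re

# Assumption taken from this issue: pyarrow accepts `timestamp_as_object`
# in to_pandas() only from version 1.0.0 onward.
MIN_TIMESTAMP_AS_OBJECT = (1, 0, 0)


def parse_release(version):
    """Return the leading numeric part of a version string as a 3-tuple of ints.

    Simplification: suffixes such as "rc1" or ".dev0" are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part or 0) for part in match.groups())


def build_to_pandas_kwargs(pyarrow_version, date_as_object, timestamp_as_object):
    """Hypothetical helper: include only the kwargs this pyarrow understands."""
    kwargs = {"date_as_object": date_as_object}
    if parse_release(pyarrow_version) >= MIN_TIMESTAMP_AS_OBJECT:
        kwargs["timestamp_as_object"] = timestamp_as_object
    return kwargs
```

Dropping the keyword changes how timestamps are converted on older pyarrow, so this would not be a complete backport on its own.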

Environment details

  • OS type and version: Any
  • Python version: 3.6.9
  • pip version: 20.2.2
  • google-cloud-bigquery version: 1.27.2

Steps to reproduce

  1. Use pyarrow < 1.0.0
  2. Run RowIterator to_dataframe
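
To check whether an environment matches step 1 before running the query, a small version test may help. This is only a sketch: `is_affected` is a hypothetical helper, and the 1.0.0 cutoff comes from this report rather than from pyarrow's documentation.

```python
import re


def is_affected(pyarrow_version):
    """Return True if the given pyarrow version string is older than 1.0.0.

    Only the major and minor components are inspected; pre-release
    suffixes are ignored (a simplification).
    """
    match = re.match(r"(\d+)\.(\d+)", pyarrow_version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % pyarrow_version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 0)
```

With pyarrow installed, this could be called as `is_affected(pyarrow.__version__)`.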

Stack trace

#     result = future.result()
  File "<path>/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "<path>/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "<path>/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<path>", line 133, in run_query
    bqstorage_client=client_storage
  File "<path>/python3.6/site-packages/google/cloud/bigquery/table.py", line 1757, in to_dataframe
    df = record_batch.to_pandas(date_as_object=date_as_object, **extra_kwargs)
  File "pyarrow/array.pxi", line 503, in pyarrow.lib._PandasConvertible.to_pandas
TypeError: to_pandas() got an unexpected keyword argument 'timestamp_as_object'
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Aug 27, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Aug 28, 2020
@HemangChothani HemangChothani added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Aug 28, 2020
@HemangChothani
Contributor

@BradLewis The minimum required version is specified in extras_require in the setup.py file:

python-bigquery/setup.py

Lines 50 to 53 in a587de4

"pyarrow": [
    "pyarrow >= 1.0.0, < 2.0dev; python_version >= '3.5'",
    # Pyarrow >= 0.17.0 is not compatible with Python 2 anymore.
    "pyarrow < 0.17.0; python_version < '3.0' and platform_system != 'Windows'",

@BradLewis
Author

BradLewis commented Aug 28, 2020

@HemangChothani That is only the case when using the pyarrow extra.
For more context, we are using the google-cloud-bigquery library with the google-cloud-bigquery-storage library with pandas and fastavro, similar to what is documented here: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas.
In the extra section for bqstorage the required pyarrow is >= 0.16.0.
For example, with the below code and running pyarrow < 1.0.0,

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)

this will give the same error

TypeError: to_pandas() got an unexpected keyword argument 'timestamp_as_object'

This was introduced in PR #209.
As that PR mentions, it requires pyarrow >= 1.0.0. However, if you don't install the pyarrow extra (because you don't intend to use it) and have another dependency that requires pyarrow < 1.0.0, the code still runs through this path and gives you the error.
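
The gap can be sketched as a mismatch between what each extra allows and what the code path needs. This is a simplified model, assuming only the lower bounds quoted in this thread; upper bounds and environment markers are left out, and the names below are hypothetical.

```python
# Lower bounds on pyarrow per extra, as quoted in this thread.
# Simplified: upper bounds and environment markers are omitted.
PYARROW_LOWER_BOUND_BY_EXTRA = {
    "pyarrow": (1, 0, 0),
    "bqstorage": (0, 16, 0),
}

# Version from which to_pandas() accepts timestamp_as_object, per this report.
CODE_PATH_NEEDS = (1, 0, 0)


def satisfies_extra(extra, installed):
    """Return True if the installed version meets the extra's lower bound."""
    return installed >= PYARROW_LOWER_BOUND_BY_EXTRA[extra]


def installs_but_fails_at_runtime(extra, installed):
    """Return True if the version passes the extra's pin but is too old for the code path."""
    return satisfies_extra(extra, installed) and installed < CODE_PATH_NEEDS
```

In this model, `installs_but_fails_at_runtime("bqstorage", (0, 17, 1))` is true, while no version can pass the pyarrow extra's bound and still be too old.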

@HemangChothani
Contributor

@BradLewis Thanks for the explanation, I got the point. I have raised a PR which bumps the minimum required pyarrow version for bigquery-storage as well.

@HemangChothani HemangChothani self-assigned this Aug 28, 2020
@HemangChothani HemangChothani added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: question Request for information or clarification. Not an issue. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Aug 28, 2020