
Pyarrow library not detected, so QueryJob.to_dataframe() doesn't work #376

Closed
robertlacok opened this issue Nov 9, 2020 · 6 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: question Request for information or clarification. Not an issue.

Comments

@robertlacok

robertlacok commented Nov 9, 2020


Environment details

  • OS type and version: replicated on Debian Buster and macOS Catalina
  • Python version: tried on 3.7.9 and 3.8.1
  • pip version: 20.2.3
  • google-cloud-bigquery version: 2.3.1
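The details above can also be collected from inside the notebook kernel itself, which matters here because the kernel's interpreter is the one that raises the error. A minimal sketch, assuming Python 3.8+ (`importlib.metadata` is not in the standard library on 3.7); `environment_report` is a hypothetical helper, not part of google-cloud-bigquery:

```python
import platform
from importlib import metadata  # standard library on Python 3.8+


def environment_report(packages=("pip", "google-cloud-bigquery", "pyarrow")):
    """Collect the details the issue template asks for, without shelling out."""
    report = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        try:
            # Version of the distribution installed for *this* interpreter.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the environment this kernel is running from.
            report[name] = None
    return report


print(environment_report())
```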

Steps to reproduce

  1. Install google-cloud-bigquery==2.3.1.
  2. Call to_dataframe() on a QueryJob; it raises an error saying the pyarrow library is not installed.
  3. Installing pyarrow with pip install pyarrow does not resolve the problem.

Some extra info:

  • Installing google-cloud-bigquery==2.1.0 on a fresh system works fine.
  • However, downgrading with pip install --upgrade google-cloud-bigquery==2.1.0 after 2.3.1 has been installed doesn't resolve the problem.

Code example

I documented the steps here:
https://deepnote.com/publish/0c5c5788-aae0-407f-9aec-3716e3a62a1b

from google.cloud import bigquery
client = bigquery.Client()
query = 'SELECT "Connection successful"'
query_job = client.query(query)
df = query_job.to_dataframe()
df

Stack trace

----------------------------------------------------------------------
ValueError                           Traceback (most recent call last)
<ipython-input-4-a1571309d5f3> in <module>
      4 query = 'SELECT "Connection successful"'
      5 query_job = client.query(query)
----> 6 df = query_job.to_dataframe()
      7 df

~/.pyenv/versions/3.8.1/lib/python3.8/site-packages/google/cloud/bigquery/job/query.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object)
   1306             ValueError: If the `pandas` library cannot be imported.
   1307         """
-> 1308         return self.result().to_dataframe(
   1309             bqstorage_client=bqstorage_client,
   1310             dtypes=dtypes,

~/.pyenv/versions/3.8.1/lib/python3.8/site-packages/google/cloud/bigquery/table.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object)
   1690             bqstorage_client = None
   1691 
-> 1692         record_batch = self.to_arrow(
   1693             progress_bar_type=progress_bar_type,
   1694             bqstorage_client=bqstorage_client,

~/.pyenv/versions/3.8.1/lib/python3.8/site-packages/google/cloud/bigquery/table.py in to_arrow(self, progress_bar_type, bqstorage_client, create_bqstorage_client)
   1493         """
   1494         if pyarrow is None:
-> 1495             raise ValueError(_NO_PYARROW_ERROR)
   1496 
   1497         if (

ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.


product-auto-label bot added the api: bigquery label Nov 9, 2020
@HemangChothani
Contributor

@robertlacok This error is caused by the missing pyarrow library. Could you please share the output of pip3 freeze?

It works fine for me; here are my environment details:

google-cloud-bigquery==2.3.1
google-cloud-bigquery-storage==2.0.1
google-api-core==1.23.0
google-cloud-core==1.4.2
pyarrow==2.0.0

HemangChothani added the type: question label Nov 9, 2020
@robertlacok
Author

Hey, @HemangChothani thanks for taking a look.

I understand that's what the error says, but installing pyarrow doesn't resolve this.

I can see pyarrow==2.0.0 in the output of pip freeze, yet still get the error - see this notebook: https://deepnote.com/publish/0c5c5788-aae0-407f-9aec-3716e3a62a1b
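pip freeze lists what is installed in the environment that pip belongs to; it does not show whether the running notebook kernel uses that environment, or whether it already looked for pyarrow earlier in the session. A small stdlib-only check, meant to be run inside the notebook itself (`diagnose` is a hypothetical helper, not a google-cloud-bigquery API):

```python
import importlib
import importlib.util
import sys


def diagnose(module_name="pyarrow"):
    """Report what the running interpreter sees, as opposed to what pip freeze lists."""
    # Drop cached directory listings so packages installed after startup are noticed.
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(module_name)  # locates the module without importing it
    return {
        "interpreter": sys.executable,                   # the Python binary this kernel runs
        "importable_now": spec is not None,              # findable on sys.path right now?
        "already_imported": module_name in sys.modules,  # loaded earlier in this session?
    }


print(diagnose())
```

If importable_now is True but to_dataframe() still raises, pyarrow itself is probably visible to the kernel, and the stale state is more likely inside the google.cloud.bigquery modules that were imported before pyarrow was installed.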

@HemangChothani
Contributor

@robertlacok Could you please try restarting the kernel and importing again?
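A plausible explanation for why a restart helps, inferred from the traceback above (`if pyarrow is None:` in table.py): the library appears to check for pyarrow once, when its module is first imported, and keep that result for the rest of the session. The sketch below imitates this optional-import pattern with a hypothetical placeholder module name (`hypothetical_optional_dep`, not a real package) and simulates "installing" it later by registering a module object:

```python
import importlib
import sys
import types

# Hypothetical stand-in for an optional dependency such as pyarrow.
OPTIONAL_NAME = "hypothetical_optional_dep"


def load_optional(name):
    """Mimic the `try: import X` / `except ImportError: X = None` pattern."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# At "import time" the dependency is missing, so the cached value is None.
cached = load_optional(OPTIONAL_NAME)

# Later the package is "installed" (simulated by registering a module object).
sys.modules[OPTIONAL_NAME] = types.ModuleType(OPTIONAL_NAME)

# The value captured earlier is still None; only a fresh lookup,
# such as the one that happens after restarting the kernel, sees the package.
still_missing = cached is None
found_now = load_optional(OPTIONAL_NAME) is not None
print(still_missing, found_now)  # True True
```

This also fits the earlier observation that downgrading to 2.1.0 in the same session didn't help: the already-imported module kept its original answer either way.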

@robertlacok
Author

I see, so that helped, thanks.

Curious though:

  • In previous versions this wasn't an issue, and to_dataframe() also worked without pyarrow.
  • It seems commit 801e4c0 removed that support.

But if pyarrow is necessary for to_dataframe() to work, shouldn't it be a dependency that gets installed with pip install google-cloud-bigquery?

@HemangChothani
Contributor

HemangChothani commented Nov 9, 2020

It's already included in the extras, as you can see here:

"pyarrow >= 1.0.0, < 2.0dev",

EDIT: You can find the reason here why we dropped fastparquet and decided to go with pyarrow.

@robertlacok
Author

I see, thanks. I'll change the tooltip for our bigquery integration to suggest installing

pip install google-cloud-bigquery[pandas]
