docs: pandas DataFrame samples are more standalone #224
Here is the summary of changes. You are about to add 4 region tags.
This comment is generated by snippet-bot.
# Optionally, explicitly request to use the BigQuery Storage API. As of
# google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
# API is used by default.
create_bqstorage_client=True,
This seems likely to send people smashing into the dependency-management guardrail even more?
If they're using a version of the BQ client library that doesn't have this on by default, I suspect that getting dependencies updated is a non-trivial matter. And the explicit bqstorage examples can help them probe with less intermediate magic.
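The version boundary in the sample comment (the BigQuery Storage API is used by default as of google-cloud-bigquery 1.26.0) can be checked at runtime before deciding how much to rely on the default. A minimal sketch using only the standard library; `parse_release` and `storage_api_is_default` are hypothetical helper names, and the simplified parser keeps only the leading numeric release components (so it is not a full PEP 440 implementation).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Per the sample comment, the Storage API is on by default from this version.
DEFAULT_ON_SINCE = (1, 26, 0)


def parse_release(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: '1.26.0rc1' -> (1, 26, 0), '2.0' -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("not a release version: {!r}".format(version))
    parts = [int(piece) for piece in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions to three components
    return tuple(parts)


def storage_api_is_default(installed: Optional[str] = None) -> bool:
    """Hypothetical helper: True if the given (or installed) version of
    google-cloud-bigquery is at or above DEFAULT_ON_SINCE."""
    if installed is None:
        try:
            installed = metadata.version("google-cloud-bigquery")
        except metadata.PackageNotFoundError:
            return False  # library not installed at all
    return parse_release(installed) >= DEFAULT_ON_SINCE
```

For anything beyond a quick probe, `packaging.version.Version` is a more complete alternative to the hand-rolled parser.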
Sigh. Even now with BQ Storage as an optional "extra", the package manager doesn't give these users much help. At least now newer versions of the BQ library provide info in the error message about what package versions they need to install.
I'm tempted more and more just to make BQ Storage a required dependency. We have enough gRPC-based libraries now that I'm not as worried about pulling in grpcio as a dependency (in fact, I think we already are, anyway).
I'm a fan, since we're still not great at even documenting the optional dependencies properly. I'd go so far as to consider making arrow mandatory as well: anecdotally, we see people tripping on dependencies more often than we get feedback about the dependency graph being too large.
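While the BigQuery Storage pieces remain an optional extra, a sample can at least fail early with an install hint, similar in spirit to the error messages newer BQ library versions show. A minimal sketch; `module_available` and `require_bqstorage` are hypothetical helpers, and the install command assumes the `bqstorage` and `pandas` extras of google-cloud-bigquery.

```python
import importlib.util

INSTALL_HINT = "pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'"


def module_available(name: str) -> bool:
    """Return True if the (possibly dotted) module `name` can be imported."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent
        # (for example, no 'google' package at all) raises instead of
        # returning None.
        return False


def require_bqstorage() -> None:
    """Hypothetical helper: raise ImportError with an install hint if the
    optional BigQuery Storage, Arrow, or pandas dependencies are missing."""
    missing = [
        name
        for name in ("google.cloud.bigquery_storage", "pyarrow", "pandas")
        if not module_available(name)
    ]
    if missing:
        raise ImportError(
            "Missing optional dependencies: {}. Try: {}".format(
                ", ".join(missing), INSTALL_HINT
            )
        )
```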
stream = read_session.streams[0]
reader = bqstorageclient.read_rows(stream.name)

# Parse all Arrow blocks and create a dataframe. This call requires a
Now that you get the schema in the first ReadRows response, passing the session around should be unnecessary. Worth addressing this in the client before finishing out this sample?
True. #168
Yeah, I think that simplifying this sample is good motivation for working on that feature.
🤖 I have created a release \*beep\* \*boop\*

---

### [2.6.1](https://www.github.com/googleapis/python-bigquery-storage/compare/v2.6.0...v2.6.1) (2021-07-20)

### Bug Fixes

* **deps:** pin 'google-{api,cloud}-core', 'google-auth' to allow 2.x versions ([#240](https://www.github.com/googleapis/python-bigquery-storage/issues/240)) ([8f848e1](https://www.github.com/googleapis/python-bigquery-storage/commit/8f848e18379085160492cdd2d12dc8de50a46c8e))

### Documentation

* pandas DataFrame samples are more standalone ([#224](https://www.github.com/googleapis/python-bigquery-storage/issues/224)) ([4026997](https://www.github.com/googleapis/python-bigquery-storage/commit/4026997d7a286b63ed2b969c0bd49de59635326d))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
In response to customer issue 179797311
This updates the code samples on https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#objectives to include relevant imports.
Also:

* `project_id`
* `create_bqstorage_client` instead of manually creating one. Comment that this is the default.

Note: the samples in `main_test.py` are still there. We'll need to remove those once the docs have been updated.
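A standalone version of the query-to-DataFrame sample, in the spirit of this PR (imports included, `create_bqstorage_client` instead of a manually created Storage client), might look like the sketch below. Assumptions: google-cloud-bigquery with the `bqstorage` and `pandas` extras is installed and Application Default Credentials are configured; `query_to_dataframe` is a hypothetical name. The import sits inside the function only so the sketch can be loaded without the library; a real sample would import at the top of the file.

```python
def query_to_dataframe(query_string):
    """Run a query and download the results as a pandas DataFrame."""
    # Deferred import for this sketch only (see the note above).
    from google.cloud import bigquery

    # With no arguments, the client uses Application Default Credentials
    # and the default project from the environment.
    bqclient = bigquery.Client()
    return (
        bqclient.query(query_string)
        .result()
        # Optional on google-cloud-bigquery >= 1.26.0, where this is the
        # default; kept explicit so older versions also use the Storage API.
        .to_dataframe(create_bqstorage_client=True)
    )


# Example (needs credentials and network access):
# df = query_to_dataframe("SELECT 1 AS x")
```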