Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: pandas DataFrame samples are more standalone #224

Merged
merged 7 commits into from Jul 13, 2021

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Jun 30, 2021

In response to customer issue 179797311

This updates the code samples on https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#objectives to include relevant imports.

Also:

  • Add shared fixture for project_id
  • Use create_bqstorage_client instead of manually creating one. Comment that this is the default.

Note: the samples in main_test.py are still there. We'll need to remove those once the docs have been updated.

@tswast tswast requested a review from a team as a code owner June 30, 2021 21:17
@tswast tswast requested review from engelke and removed request for a team June 30, 2021 21:17
@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. label Jun 30, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 30, 2021
@snippet-bot
Copy link

snippet-bot bot commented Jun 30, 2021

Here is the summary of changes.

You are about to add 4 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Jun 30, 2021
@tswast tswast requested review from a team and shollyman and removed request for a team July 1, 2021 15:10
# Optionally, explicitly request to use the BigQuery Storage API. As of
# google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
# API is used by default.
create_bqstorage_client=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's prone to get people smashing into the guardrail of dependency management even more?

If they're using a version of the BQ client library that doesn't have this on by default, I suspect that getting dependencies updated is a non-trivial matter. And the explicit bqstorage examples can help them probe with less intermediate magic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh. Even now with BQ Storage as an optional "extra", the package manager doesn't give these users much help. At least now newer versions of the BQ library provide info in the error message about what package versions they need to install.

I'm tempted more and more just to make BQ Storage a required dependency. We have enough gRPC-based libraries now that I'm not as worried about pulling in grpcio as a dependency (in fact, I think we already are, anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan, since we're still not great about even documenting the optional dependencies properly. I'd go so far as to even consider arrow as mandatory as well, as anecdotally we see people tripping on dependencies more than feedback about dependency graph being too large etc.

stream = read_session.streams[0]
reader = bqstorageclient.read_rows(stream.name)

# Parse all Arrow blocks and create a dataframe. This call requires a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you get the schema on the first readrows response, passing the session around should be unnecessary. Worth addressing this in the client before finishing out this sample?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. #168

Yeah, I think that simplifying this sample is good motivation for working on that feature.

@tswast tswast merged commit 4026997 into googleapis:master Jul 13, 2021
@tswast tswast deleted the b179797311-pandas-samples branch July 13, 2021 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. cla: yes This human has signed the Contributor License Agreement. samples Issues that are directly related to samples.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants