feat: Pandas and GeoPandas samples #235

Closed
wants to merge 13 commits into from

jimfulton
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #231 🦕

@product-auto-label product-auto-label bot added api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. samples Issues that are directly related to samples. labels Jul 30, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 30, 2021
@jimfulton jimfulton added kokoro:force-run Add this label to force Kokoro to re-run the tests. kokoro:run Add this label to force Kokoro to re-run the tests. labels Jul 30, 2021
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Jul 30, 2021
@jimfulton jimfulton added kokoro:force-run Add this label to force Kokoro to re-run the tests. kokoro:run Add this label to force Kokoro to re-run the tests. labels Aug 2, 2021
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Aug 2, 2021
@jimfulton jimfulton marked this pull request as ready for review August 2, 2021 15:27
@jimfulton jimfulton requested review from a team as code owners August 2, 2021 15:27
@jimfulton jimfulton requested a review from engelke August 2, 2021 15:27
@snippet-bot

snippet-bot bot commented Aug 3, 2021

Here is the summary of changes.

You are about to add 2 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@jimfulton jimfulton added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Aug 3, 2021
@jimfulton
Contributor Author

I added a "do not merge" tag, because I'm not sure we want people to do this.

However, I'd kind of like to know how to do samples, and would like some feedback. :)

@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Aug 5, 2021
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Aug 5, 2021
Collaborator

@tswast tswast left a comment

These look good! I think we just need to adjust the tags so that the snippet bot is happy.



def read_geographic_data_into_pandas_using_read_sql() -> None:
    # [START sqlalchemy_bigquery_read_postgis]
Collaborator

Because of the way we track samples, the tag needs to start with the API name.

How about bigquery_geopandas_sqlalchemy ?
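
As a rough illustration of the naming rule above, the sketch below scans a sample file for region tags and reports any that do not begin with the API name. The function name `misnamed_region_tags`, the `api_prefix` parameter, and the regular expression are assumptions made for this example; this is not how snippet-bot itself is implemented.

```python
import re
from typing import List

# Matches region-tag markers such as "# [START bigquery_pandas_sqlalchemy]".
# The exact pattern is an assumption for illustration only.
REGION_TAG = re.compile(r"#\s*\[(START|END)\s+([A-Za-z0-9_]+)\]")


def misnamed_region_tags(source: str, api_prefix: str = "bigquery") -> List[str]:
    """Return the region tags in `source` that do not start with `api_prefix`.

    Hypothetical helper: each offending tag is reported once, in order of
    first appearance.
    """
    bad: List[str] = []
    for _kind, tag in REGION_TAG.findall(source):
        if not tag.startswith(api_prefix + "_") and tag not in bad:
            bad.append(tag)
    return bad


sample = """
def read_geographic_data_into_pandas_using_read_sql() -> None:
    # [START sqlalchemy_bigquery_read_postgis]
    ...
    # [END sqlalchemy_bigquery_read_postgis]
"""

print(misnamed_region_tags(sample))
```

With the original tag this reports `sqlalchemy_bigquery_read_postgis`; after renaming to `bigquery_geopandas_sqlalchemy` the list comes back empty.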

Contributor Author

fixed

"EPSG:4326",
)

# Don't wrap pr elide:
Collaborator

What does this comment mean? I'm not familiar with pr elide in this context.

Contributor

I'm betting it is really or elide:.

Contributor Author

Yup, fixed



def read_data_into_pandas_using_read_sql() -> None:
    # [START sqlalchemy_bigquery_read_sql]
Collaborator

Needs to start with bigquery, how about bigquery_pandas_sqlalchemy?

Contributor Author

fixed

@tswast
Collaborator

tswast commented Aug 6, 2021

Kokoro failure looks like a flake by the backend. I learned that the backend can take as long as 4 minutes to fail on some API requests, so hopefully googleapis/python-bigquery#859 helps with this.

google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(functools.partial(<bound method JSONConnection.api_request of <google.cloud.bigquery._http.Connection object at 0x7fe2a9e0e970>>, method='GET', path='/projects/precise-truck-742/queries/0aba5d5e-abdb-415a-9ffd-ad2aa450ad59', query_params={'maxResults': 0, 'location': 'US'}, timeout=None)), last exception: 500 GET https://bigquery.googleapis.com/bigquery/v2/projects/precise-truck-742/queries/0aba5d5e-abdb-415a-9ffd-ad2aa450ad59?maxResults=0&location=US&prettyPrint=false: An internal error occurred and the request could not be completed.

In the meantime, I'll manually trigger the Kokoro job to rebuild.
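
To illustrate why a slow backend failure surfaces as a deadline error, here is a minimal stdlib-only sketch of a retry loop with an overall deadline. It is not the google.api_core implementation; `call_with_deadline`, `RetryDeadlineExceeded`, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryDeadlineExceeded(Exception):
    """Raised when retrying stops because the overall deadline has passed.

    Hypothetical exception for this sketch, not a google.api_core class.
    """


def call_with_deadline(
    func: Callable[[], T],
    deadline: float,
    delay: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` repeatedly until it succeeds or `deadline` seconds elapse.

    The deadline is checked only after an attempt fails, so one attempt
    that takes longer than the deadline ends the loop with no retry.
    """
    start = clock()
    while True:
        try:
            return func()
        except Exception as exc:  # a real client would retry only transient errors
            if clock() - start >= deadline:
                raise RetryDeadlineExceeded(
                    f"Deadline of {deadline}s exceeded, last exception: {exc}"
                ) from exc
            sleep(delay)
```

Under these assumptions, if a single failing request takes about 4 minutes and the deadline is 120 s, the loop gives up after the first attempt, because the deadline has already passed by the time the error comes back.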

@jimfulton
Contributor Author

> Kokoro failure looks like a flake by the backend.

Yup, I figured.

> ...
>
> In the meantime, I'll manually trigger the Kokoro job to rebuild.

Cool, but remember the goal of this PR, at least for me, was to get a grip on samples. :)

It's been super helpful, but I don't intend to merge it.

IMO, these samples are counterproductive because, as you've noted, the BQ Python library's to_dataframe (and soon, to_geodataframe) is more performant and scalable. Of course, if you disagree, I'm happy to defer.

My thought is that when to_geodataframe lands (googleapis/python-bigquery#848 is ready for review, BTW :) ), I'd make a GeoPandas PR to add from_bigquery and to_bigquery using to_geodataframe and load_table_from_dataframe.

@jimfulton
Contributor Author

I've merged this into the geography branch and deleted these two examples (and added a geography sample).

If you really want these samples, let me know and I'll reopen this.

@jimfulton jimfulton closed this Aug 7, 2021
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. cla: yes This human has signed the Contributor License Agreement. do not merge Indicates a pull request not ready for merge, due to either quality or timing. samples Issues that are directly related to samples.
Development

Successfully merging this pull request may close these issues.

Create pandas samples
4 participants