fix: avoid possible job already exists error #751

Merged: 2 commits merged into googleapis:master on Jul 14, 2021


@plamut (Contributor) commented Jul 12, 2021

Fixes #738.

If the job creation request fails, the query job might still have started successfully. This PR handles that edge case and returns the query job if one can be found.

Based on a similar fix in the Java client.
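The recovery idea described above can be sketched in a self-contained way. This is a minimal illustration, not the library's code: `Conflict`, `NotFound`, `FakeBackend`, `FlakyBackend`, `create_with_retry`, and `start_query` are all hypothetical stand-ins for the real client and API errors. The sketch assumes the "already exists" error comes from a retried create request whose first attempt actually succeeded. It also only attempts recovery when it generated the job ID itself; treat that guard as an assumption of the sketch.

```python
import uuid
from typing import Dict, Optional


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 "Already Exists" API error."""


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 API error."""


class FakeBackend:
    """Hypothetical in-memory job store standing in for the jobs API."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Dict[str, str]] = {}

    def create_job(self, job_id: str, query: str) -> Dict[str, str]:
        if job_id in self.jobs:
            raise Conflict(f"Already Exists: Job {job_id}")
        self.jobs[job_id] = {"id": job_id, "query": query}
        return self.jobs[job_id]

    def get_job(self, job_id: str) -> Dict[str, str]:
        if job_id not in self.jobs:
            raise NotFound(job_id)
        return self.jobs[job_id]


class FlakyBackend(FakeBackend):
    """Creates the job on the first call but "loses" the response, so the
    client retries and the retry hits a conflict (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self._dropped_response = False

    def create_job(self, job_id: str, query: str) -> Dict[str, str]:
        job = super().create_job(job_id, query)
        if not self._dropped_response:
            self._dropped_response = True
            raise ConnectionError("response lost after the job was created")
        return job


def create_with_retry(
    backend: FakeBackend, job_id: str, query: str, attempts: int = 3
) -> Dict[str, str]:
    """Retry transient transport errors, reusing the same job ID each time."""
    for attempt in range(1, attempts + 1):
        try:
            return backend.create_job(job_id, query)
        except ConnectionError:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be >= 1")


def start_query(
    backend: FakeBackend, query: str, job_id: Optional[str] = None
) -> Dict[str, str]:
    job_id_given = job_id is not None
    if job_id is None:
        job_id = f"job_{uuid.uuid4().hex}"
    try:
        return create_with_retry(backend, job_id, query)
    except Conflict as create_exc:
        # A caller-supplied ID may collide with an unrelated job, so only
        # try to recover when the ID was generated here.
        if job_id_given:
            raise
        try:
            # The "conflicting" job is most likely our own earlier attempt.
            return backend.get_job(job_id)
        except NotFound:
            raise create_exc
```

The design hinges on the retry reusing the same job ID: a conflict on retry is then strong evidence that the earlier attempt succeeded, and fetching the job by that ID turns the spurious error into the job the caller asked for.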

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut requested review from @tswast and a team on Jul 12, 2021 09:32.
@plamut requested a review from a team as a code owner on Jul 12, 2021 09:32.
product-auto-label bot added the "api: bigquery" label (issues related to the googleapis/python-bigquery API) on Jul 12, 2021.
google-cla bot added the "cla: yes" label (this human has signed the Contributor License Agreement) on Jul 12, 2021.
Review thread on google/cloud/bigquery/client.py (resolved)
Review thread on google/cloud/bigquery/client.py (outdated, resolved)
@plamut (Contributor, Author) commented Jul 12, 2021

@tseaver I don't know the exact mechanics on the backend; this fix is mostly based on a similar fix in the Java client.

@tswast Can you chime in?

@plamut (Contributor, Author) commented Jul 14, 2021

The docs check failure does not seem to be related:

gpg: keyserver receive failed: No name

Update: Confirmed unrelated, but a fix is on its way.

@plamut added the "kokoro:force-run" label (forces Kokoro to re-run the tests) on Jul 14, 2021.
yoshi-kokoro removed the "kokoro:force-run" label on Jul 14, 2021.
@tseaver (Contributor) commented Jul 14, 2021

googleapis/synthtool#1155 landed here in #762. I'm not sure why the config isn't making you merge with master to pick that fix up, however.

@tswast merged commit 45b9308 into googleapis:master on Jul 14, 2021.
@plamut deleted the iss-738 branch on Jul 14, 2021 19:21.
gcf-merge-on-green bot pushed a commit that referenced this pull request Jul 19, 2021
🤖 I have created a release \*beep\* \*boop\*
---
## [2.22.0](https://www.github.com/googleapis/python-bigquery/compare/v2.21.0...v2.22.0) (2021-07-19)


### Features

* add `LoadJobConfig.projection_fields` to select DATASTORE_BACKUP fields ([#736](https://www.github.com/googleapis/python-bigquery/issues/736)) ([c45a738](https://www.github.com/googleapis/python-bigquery/commit/c45a7380871af3dfbd3c45524cb606c60e1a01d1))
* add standard sql table type, update scalar type enums ([#777](https://www.github.com/googleapis/python-bigquery/issues/777)) ([b8b5433](https://www.github.com/googleapis/python-bigquery/commit/b8b5433898ec881f8da1303614780a660d94733a))
* add support for more detailed DML stats ([#758](https://www.github.com/googleapis/python-bigquery/issues/758)) ([36fe86f](https://www.github.com/googleapis/python-bigquery/commit/36fe86f41c1a8f46167284f752a6d6bbf886a04b))
* add support for user defined Table View Functions ([#724](https://www.github.com/googleapis/python-bigquery/issues/724)) ([8c7b839](https://www.github.com/googleapis/python-bigquery/commit/8c7b839a6ac1491c1c3b6b0e8755f4b70ed72ee3))


### Bug Fixes

* avoid possible job already exists error ([#751](https://www.github.com/googleapis/python-bigquery/issues/751)) ([45b9308](https://www.github.com/googleapis/python-bigquery/commit/45b93089f5398740413104285cc8acfd5ebc9c08))


### Dependencies

* allow 2.x versions of `google-api-core`, `google-cloud-core`, `google-resumable-media` ([#770](https://www.github.com/googleapis/python-bigquery/issues/770)) ([87a09fa](https://www.github.com/googleapis/python-bigquery/commit/87a09fa3f2a9ab35728a1ac925f9d5f2e6616c65))


### Documentation

* add loading data from Firestore backup sample ([#737](https://www.github.com/googleapis/python-bigquery/issues/737)) ([22fd848](https://www.github.com/googleapis/python-bigquery/commit/22fd848cae4af1148040e1faa31dd15a4d674687))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
    raise create_exc

try:
    query_job = self.get_job(


Hi, there is a slight problem with this change: self.get_job() has a different return type from this function. It can return a LoadJob and other job types as well as the QueryJob we're expecting, so the actual return type doesn't match the one declared for this function.

I don't understand the situations that could result in this code being called, but presumably in practice this would always be a QueryJob? Unfortunately, this is causing me problems when running pylint over some code that calls this function, because pylint thinks the function can return a LoadJob, which has a different set of members from QueryJob.
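The mismatch can be reduced to a few lines. The classes and functions below are simplified, hypothetical stand-ins, not the real google-cloud-bigquery signatures: get_job() is annotated as returning a union of job types, while query() promises a QueryJob and forwards get_job()'s result directly. With plain classes like these, a static checker such as mypy should report an incompatible return type on the marked line.

```python
from typing import Dict, Union


class LoadJob:
    """Simplified, hypothetical stand-in for a load job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id


class QueryJob:
    """Simplified, hypothetical stand-in for a query job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.num_dml_affected_rows = 0


_JOBS: Dict[str, Union[LoadJob, QueryJob]] = {
    "q1": QueryJob("q1"),
    "l1": LoadJob("l1"),
}


def get_job(job_id: str) -> Union[LoadJob, QueryJob]:
    # Any job type can come back, so the declared return type is a union.
    return _JOBS[job_id]


def query(job_id: str) -> QueryJob:
    # Declared to return QueryJob, but forwards get_job()'s union;
    # a strict checker flags this line as an incompatible return type.
    return get_job(job_id)
```

Python does not enforce annotations at runtime, so `query("l1")` happily returns a LoadJob here; only a static checker can catch the gap between the declared and actual type.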

Many thanks,
Andrew Wilkinson

@plamut (Contributor, Author) replied:

Indeed, in this context self.get_job() returns a QueryJob, because job_id is the same ID that was used a few lines above when constructing a new query job (and then starting it).
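Why the runtime value is a QueryJob even though the static type is a union can be illustrated with a simplified dispatch sketch. The assumption here is that the client picks the job class from the configuration key of the job resource returned by the API (for example `"query"` vs `"load"`); `build_job` and the stand-in classes are hypothetical, and the resource is trimmed to the two fields the sketch needs. Because the job was created as a query under the same job_id, fetching it back yields a `"query"` configuration, and therefore a QueryJob.

```python
from typing import Any, Dict, Union


class LoadJob:
    """Simplified, hypothetical stand-in for a load job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id


class QueryJob:
    """Simplified, hypothetical stand-in for a query job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id


def build_job(resource: Dict[str, Any]) -> Union[LoadJob, QueryJob]:
    """Pick the job class from the resource's configuration key (simplified).

    The declared return type must cover every branch, which is why a
    checker only sees the union, even when the caller knows which
    configuration the job was created with.
    """
    job_id = resource["jobReference"]["jobId"]
    config = resource["configuration"]
    if "query" in config:
        return QueryJob(job_id)
    if "load" in config:
        return LoadJob(job_id)
    raise ValueError(f"unsupported job configuration: {sorted(config)}")
```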

This project uses pytype for static type checks and it did not complain, but apparently pylint could not deduce the same and reported a false positive.

Could you tell pylint to ignore the return type on the specific line where query() is called? IMHO that's justifiable, because pylint is wrong there.


Having looked into this a bit further, I agree that pylint is wrong. It's a bit of a pain to have to disable this check every time we call query(), but I think this is a sign that pylint is aging and not keeping up with modern Python's type syntax.

Sorry for the noise.

Cheers,
Andrew

@plamut (Contributor, Author) replied:

No worries, it was a perfectly valid comment.

Ideally, pylint would allow ignoring particular warnings for lines matching a regex, but I'm not sure whether that's currently supported. It would make disabling those false positives much cleaner than spamming # pylint: disable=... comments all around the code.

@andrewjw commented Jul 27, 2021

Sadly, the error isn't raised on the call to query(), but when you try to use the return value. In my case this is accessing num_dml_affected_rows, which exists only on QueryJob, not on LoadJob. Even if pylint did support disabling errors using a regex, I'm not sure it would be practical to write one.

It's been bugging me why this wouldn't be picked up by the type checker. I think I've tracked it down to the fact that LoadJob, QueryJob, etc. all derive from _AsyncJob, which in turn derives from google.api_core.future.polling.PollingFuture. The problem is that google.api_core.future.polling.PollingFuture has no type information, so it gets turned into an Any type, which makes all the job types equivalent and therefore doesn't generate an error. When testing with mypy, you have to add # type: ignore to the PollingFuture import line explicitly, but I guess pytype is more forgiving.

I've created the attached file demonstrating the problem (annoyingly, GitHub won't let me attach the file as a .py). As currently written, it'll generate an error in both mypy and pytype, but swap the comments on lines 5 and 6 and the error goes away.

Anyway, I have a reasonable workaround, so if you want to leave this, that's absolutely fine. If the python-api-core library adds type annotations in the future, though, I expect this to break. Adding an assert isinstance(query_job, job.QueryJob) will resolve the issue.
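The suggested workaround narrows the union before returning. The sketch applies it to simplified, hypothetical stand-ins for the job classes and get_job(); `recover_query_job` is likewise a hypothetical name. It assumes, as discussed above, that the job fetched by the same job_id is always a query job, so the assert both documents that invariant and fails loudly if it is ever violated. Static checkers such as mypy treat `assert isinstance(...)` as narrowing, so the declared `-> QueryJob` holds whether or not the base class is typed.

```python
from typing import Union


class LoadJob:
    """Simplified, hypothetical stand-in for a load job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id


class QueryJob:
    """Simplified, hypothetical stand-in for a query job."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.num_dml_affected_rows = 0


def get_job(job_id: str, kind: str = "query") -> Union[LoadJob, QueryJob]:
    return QueryJob(job_id) if kind == "query" else LoadJob(job_id)


def recover_query_job(job_id: str, kind: str = "query") -> QueryJob:
    query_job = get_job(job_id, kind)
    # Narrows Union[LoadJob, QueryJob] to QueryJob for static checkers;
    # at runtime it raises AssertionError if the invariant is broken.
    assert isinstance(query_job, QueryJob)
    return query_job
```

Note that `assert` statements are stripped under `python -O`; an explicit `if not isinstance(query_job, QueryJob): raise TypeError(...)` narrows the same way and survives optimization.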

Cheers,
Andrew
invalid_return_union.txt

Labels: api: bigquery, cla: yes
Linked issues that merging this pull request may close: Job already exists
5 participants