Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: distinguish server timeouts from transport timeouts #43

Merged
merged 6 commits into from Mar 9, 2020

Conversation

plamut
Copy link
Contributor

@plamut plamut commented Feb 25, 2020

Fixes #40.

This PR fixes the problem with timeouts sometimes occurring too early by making a distinction between the server timeout (the timeoutMs API parameter) and the timeout for the transport layer. These two are now independent from each other.

How to reproduce

Seems like the ticket description provides a reasonable way to do it (repeating a non-trivial query a lot of times). I had quite some trouble reproducing it consistently on my network, but manually disabling one's internet connection can also be used.

See the rest of the notes for remarks/discussion.

Methods might block longer than timeout

In the 1.24.0 release, a timeout parameter was added to public methods to prevent HTTP requests from hanging indefinitely. However, if the timeout is not provided (e.g. when polling for job completion), trying to estimate it from the server-side "is job done" timeout can lead to random timeout errors due to random network delays, etc.

Since timeout is now directly passed as the timeout to the underlying requests lib, it means that the actual duration of a function call can be considerably longer than a wall clock timeout would be.

As this negates the wall clock timeout approximation, the logic dealing with that has also been removed in the methods that might send multiple requests. The transport timeout now applies to each individual request.

Timeout errors are not retried

If a transport timeout error is raised, it is currently not retried by the default retry object. We might actually not want to change that, as timeout errors are generally used by the core futures to signal that retrying a request took too long.

In any case, the default behavior is now again the same as in pre-1.24.0 versions, meaning that the user code that worked with, say, version 1.19.0 should behave the same.

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

A transport layer timeout is made independent of the query timeout,
i.e. the maximum time to wait for the query to complete.

The query timeout is used by the blocking poll so that the backend
does not block for too long when polling for job completion, but
the transport can have different timeout requirements, and we do
not want it to be raising sometimes unnecessary timeout errors.
As job methods do not split the timeout anymore between all requests a
method might make, the Client methods are adjusted in the same way.
@plamut plamut added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 25, 2020
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 25, 2020
@plamut plamut requested a review from tswast March 2, 2020 08:17
@plamut
Copy link
Contributor Author

plamut commented Mar 9, 2020

@tswast Ping.

(BTW, should I still be requesting BigQuery reviews from you by default? Or should I pick, say, @shollyman instead?)

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one question.

Yes, please direct future PRs to @shollyman in the future. He can pull me in when he needs more context.

tests/unit/test_client.py Show resolved Hide resolved
@plamut plamut merged commit a17be5f into googleapis:master Mar 9, 2020
@plamut plamut deleted the iss-40 branch March 9, 2020 18:49
@sonots
Copy link

sonots commented May 14, 2020

@plamut Hi, I like a new version will be released.

@plamut
Copy link
Contributor Author

plamut commented May 14, 2020

@sonots I will have to check, but since quite a few fixes and additions have been made since the last release, I think there is a decent chance a new version will be released "soon" (say, by the end of the month).

It's just my personal opinion, however, not an official answer, but I will try to make it happen sooner rather than later.

@plamut
Copy link
Contributor Author

plamut commented Jun 10, 2020

@sonots FYI, a new version of the library has been released, you can now try it out.

@linnabrown
Copy link

Thanks. Could you provide an example when I want to insert a CSV file into bigquery allowing longer timeout limitation? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random timeouts in api calls to bigquery.googleapis.com
5 participants