
fix: Adding ConnectionError to retry mechanism #822

Merged
merged 2 commits into googleapis:master on Mar 23, 2020

Conversation

damgad
Contributor

damgad commented Feb 7, 2020

As described in #558, the current retry mechanism does not catch ConnectionError (or ConnectionResetError, which inherits from it). This pull request fixes the issue by adding the Python 3-specific ConnectionError to the mechanism.
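For context, a quick check of the inheritance relationships this relies on (none of this is part of the PR diff; it is just an illustration you can run in a Python 3 shell):

import socket

# ConnectionResetError subclasses ConnectionError, so catching ConnectionError covers it.
assert issubclass(ConnectionResetError, ConnectionError)
# ConnectionError subclasses OSError, and socket.error is an alias of OSError in
# Python 3, so the new except clause has to come before the socket.error handler.
assert issubclass(ConnectionError, OSError)
assert socket.error is OSError
print("ConnectionError must be handled before socket.error")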

googlebot added the "cla: yes" label (This human has signed the Contributor License Agreement.) on Feb 7, 2020
damgad changed the title from "Adding ConnectionError to retry mechanism" to "fix: Adding ConnectionError to retry mechanism" on Feb 7, 2020
busunkim96 self-assigned this on Feb 10, 2020
@dylancaponi

We've noticed ConnectionResetError not being retried for the BigQuery, Storage, and Redis / Memorystore clients. If this PR fixes any of those, we'll be very grateful! It happens about once per day on all of our deployed Cloud Functions.

@dylancaponi

Also noticed that gzip, base64, sys, and http_client from six.moves are imported but unused.

@dylancaponi

@damgad Any idea how these modifications could be made "in place" until this PR is merged?

@damgad
Contributor Author

damgad commented Mar 11, 2020

@dylancaponi Sure, you can always monkeypatch the _retry_request function, which handles retrying. There are multiple ways of doing that (one dirtier than the next ;) ). Before importing googleapiclient, write this (note the extra socket and httplib2 imports, which the patched function needs):

import socket

import httplib2

import googleapiclient.http
from googleapiclient.http import LOGGER, _ssl_SSLError, _should_retry_response

def _patched_retry_request(
    http, num_retries, req_type, sleep, rand, uri, method, *args, **kwargs
):

    resp = None
    content = None
    exception = None
    for retry_num in range(num_retries + 1):
        if retry_num > 0:
            # Sleep before retrying.
            sleep_time = rand() * 2 ** retry_num
            LOGGER.warning(
                "Sleeping %.2f seconds before retry %d of %d for %s: %s %s, after %s",
                sleep_time,
                retry_num,
                num_retries,
                req_type,
                method,
                uri,
                resp.status if resp else exception,
            )
            sleep(sleep_time)

        try:
            exception = None
            resp, content = http.request(uri, method, *args, **kwargs)
        # Retry on SSL errors and socket timeout errors.
        except _ssl_SSLError as ssl_error:
            exception = ssl_error
        except socket.timeout as socket_timeout:
            # It's important that this be before socket.error as it's a subclass
            # socket.timeout has no errorcode
            exception = socket_timeout
        except ConnectionError as connection_error:
            # Needs to be before socket.error as it's a subclass of
            # OSError (socket.error)
            exception = connection_error
        except socket.error as socket_error:
            # errno's contents differ by platform, so we have to match by name.
            if socket.errno.errorcode.get(socket_error.errno) not in {
                "WSAETIMEDOUT",
                "ETIMEDOUT",
                "EPIPE",
                "ECONNABORTED",
            }:
                raise
            exception = socket_error
        except httplib2.ServerNotFoundError as server_not_found_error:
            exception = server_not_found_error

        if exception:
            if retry_num == num_retries:
                raise exception
            else:
                continue

        if not _should_retry_response(resp.status, content):
            break

    return resp, content


googleapiclient.http._retry_request = _patched_retry_request

That will simply overwrite the function from the library. You can also do it in a shorter way:

import googleapiclient.http
import inspect

# Re-execute the source of _retry_request inside the module's own namespace,
# with socket.timeout widened so that ConnectionError is retried as well.
exec(
    inspect.getsource(
        googleapiclient.http._retry_request
    ).replace("socket.timeout", "(socket.timeout, ConnectionError)"),
    googleapiclient.http.__dict__
)

I would also suggest making sure that you've pinned the version of google-api-python-client in your requirements, as the consequences of an automatic update may be unpredictable.
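For example, a pinned entry in requirements.txt might look like this (the version shown is only illustrative; pin whatever you have actually tested against):

# requirements.txt
# Exact pin so an automatic upgrade cannot change retry behaviour underneath you.
google-api-python-client==1.7.11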

@mik-laj
Contributor

mik-laj commented Mar 17, 2020

@busunkim96 Any progress? This causes problems for Apache Airflow, which is used by the Cloud Composer service.

busunkim96
Contributor

busunkim96 left a comment

Thanks for the patch @damgad!

busunkim96 added the "kokoro:force-run" label (Add this label to force Kokoro to re-run the tests.) on Mar 20, 2020
yoshi-kokoro removed the "kokoro:force-run" label on Mar 20, 2020
busunkim96 merged commit c7516a2 into googleapis:master on Mar 23, 2020
@mik-laj
Contributor

mik-laj commented Mar 24, 2020

@busunkim96 Hello. Thanks for accepting this change. Do you know when it will appear in a new release?

@dylancaponi

@damgad We've had the shorter monkeypatch in place for a few weeks. It seems to retry the call in question but not to reestablish the connection, so we now just get ConnectionRetryError 3x instead of once per call. Wondering if I implemented something wrong or misunderstood the purpose of this PR.

@damgad
Contributor Author

damgad commented Apr 13, 2020

@dylancaponi It seems that you've applied the patch correctly and it works as it should. However, the aim of the whole retry mechanism is to overcome transient errors: temporary issues that may disappear on the next try.

In your case, you should either investigate why ConnectionRetryError really happens and fix the root cause, or increase the num_retries parameter to some larger value. That may help if the cause is temporary, since the retry mechanism uses exponential backoff on each consecutive try. But if the error persists, increasing num_retries just means you may see ConnectionRetryError even more times.
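For reference, num_retries is passed when executing a request via HttpRequest.execute(); the service and call below are only placeholders, and the snippet assumes application default credentials are available:

from googleapiclient.discovery import build

# Any built service works the same way; Drive v3 is just an example.
service = build("drive", "v3")

# Each retry sleeps rand() * 2 ** retry_num seconds, i.e. exponential backoff.
response = service.files().list(pageSize=10).execute(num_retries=5)
print(response)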

@mik-laj
Contributor

mik-laj commented Apr 20, 2020

A new release that includes this fix has been published.
#868
