
fix: replace default retry for upload operations #480

Merged
merged 11 commits on Jun 30, 2021
109 changes: 83 additions & 26 deletions google/cloud/storage/blob.py
@@ -82,7 +82,6 @@
from google.cloud.storage.retry import DEFAULT_RETRY
from google.cloud.storage.retry import DEFAULT_RETRY_IF_ETAG_IN_JSON
from google.cloud.storage.retry import DEFAULT_RETRY_IF_GENERATION_SPECIFIED
from google.cloud.storage.retry import DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED
from google.cloud.storage.fileio import BlobReader
from google.cloud.storage.fileio import BlobWriter

@@ -1703,10 +1702,10 @@ def _do_multipart_upload(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type predefined_acl: str
@@ -1750,7 +1749,7 @@ def _do_multipart_upload(

This private method does not accept ConditionalRetryPolicy values
because the information necessary to evaluate the policy is instead
evaluated in client.download_blob_to_file().
evaluated in blob._do_upload().
Contributor

Oh, it looks like we're evaluating it BOTH in blob._do_upload() and in client.download_blob_to_file(). This is harmless, which is why the tests didn't catch it, but we only need one of those.

I don't have a strong opinion on whether we do this in _do_upload or download_blob_to_file. Let's keep this docstring change and additionally we can remove the conditional retry policy code in client.download_blob_to_file, either in another PR, or in this one if you prefer.
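For readers following the discussion, the mechanism at issue can be sketched as follows. This is a simplified, illustrative model of the `ConditionalRetryPolicy` pattern from `google/cloud/storage/retry.py`, not the library's actual implementation; names and signatures are abbreviated:

```python
# Simplified sketch of the ConditionalRetryPolicy pattern: the wrapped retry
# is activated only when a precondition (here, ifGenerationMatch) makes the
# upload idempotent. Names are illustrative, not the real library surface.

def is_generation_specified(query_params):
    """True when the request carries an ifGenerationMatch precondition."""
    return query_params.get("ifGenerationMatch") is not None

class ConditionalRetryPolicySketch:
    def __init__(self, retry_policy, predicate):
        self.retry_policy = retry_policy
        self.predicate = predicate

    def get_retry_policy_if_conditions_met(self, query_params):
        # Return the wrapped policy when the call is safe to retry,
        # otherwise None (no retries).
        return self.retry_policy if self.predicate(query_params) else None

policy = ConditionalRetryPolicySketch("DEFAULT_RETRY", is_generation_specified)
assert policy.get_retry_policy_if_conditions_met({"ifGenerationMatch": 0}) == "DEFAULT_RETRY"
assert policy.get_retry_policy_if_conditions_met({}) is None
```

Evaluating this policy twice, as the comment notes, is harmless: both evaluations see the same query parameters and resolve to the same concrete retry (or None).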

Contributor Author


IIUC, I believe we'll need to retain both ConditionalRetryPolicy evaluations, as one covers upload operations and the other covers downloads. I revised the docstrings here to clarify that, for upload operations, the evaluation happens in blob._do_upload().


See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -1877,10 +1876,10 @@ def _initiate_resumable_upload(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type extra_headers: dict
@@ -1936,7 +1935,7 @@ def _initiate_resumable_upload(

This private method does not accept ConditionalRetryPolicy values
because the information necessary to evaluate the policy is instead
evaluated in client.download_blob_to_file().
evaluated in blob._do_upload().

See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -2070,10 +2069,10 @@ def _do_resumable_upload(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type predefined_acl: str
@@ -2119,7 +2118,7 @@ def _do_resumable_upload(

This private method does not accept ConditionalRetryPolicy values
because the information necessary to evaluate the policy is instead
evaluated in client.download_blob_to_file().
evaluated in blob._do_upload().

See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -2204,10 +2203,10 @@ def _do_upload(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type predefined_acl: str
@@ -2258,7 +2257,7 @@ def _do_upload(
This class exists to provide safe defaults for RPC calls that are
not technically safe to retry normally (due to potential data
duplication or other side-effects) but become safe to retry if a
condition such as if_metageneration_match is set.
condition such as if_generation_match is set.

See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -2337,7 +2336,7 @@ def upload_from_file(
if_metageneration_not_match=None,
timeout=_DEFAULT_TIMEOUT,
checksum=None,
retry=DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED,
retry=DEFAULT_RETRY_IF_GENERATION_SPECIFIED,
):
"""Upload the contents of this blob from a file-like object.

@@ -2397,10 +2396,10 @@ def upload_from_file(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type client: :class:`~google.cloud.storage.client.Client`
@@ -2456,7 +2455,7 @@ def upload_from_file(
This class exists to provide safe defaults for RPC calls that are
not technically safe to retry normally (due to potential data
duplication or other side-effects) but become safe to retry if a
condition such as if_metageneration_match is set.
condition such as if_generation_match is set.

See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -2479,7 +2478,7 @@ def upload_from_file(
# num_retries and retry are mutually exclusive. If num_retries is
# set and retry is exactly the default, then nullify retry for
# backwards compatibility.
if retry is DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED:
if retry is DEFAULT_RETRY_IF_GENERATION_SPECIFIED:
retry = None
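The identity check above relies on the module-level default acting as a sentinel: only when the caller leaves `retry` at exactly that default does a deprecated `num_retries` silently replace it. A standalone sketch of the pattern, with hypothetical names (not the library's code):

```python
# Sketch of the default-sentinel pattern: num_retries (deprecated) and retry
# are mutually exclusive, so an explicit num_retries disables retry only when
# retry was left at its module-level default.
DEFAULT_POLICY = object()  # stands in for DEFAULT_RETRY_IF_GENERATION_SPECIFIED

def resolve_retry(num_retries=None, retry=DEFAULT_POLICY):
    if num_retries is not None and retry is DEFAULT_POLICY:
        retry = None  # deprecated arg wins over the implicit default
    elif retry is DEFAULT_POLICY:
        retry = "default-conditional-retry"
    return retry

assert resolve_retry() == "default-conditional-retry"
assert resolve_retry(num_retries=3) is None          # backwards compatibility
assert resolve_retry(num_retries=3, retry="custom") == "custom"
```

An explicitly passed `retry` always survives, even alongside `num_retries`, because it is no longer identical to the sentinel object.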

_maybe_rewind(file_obj, rewind=rewind)
@@ -2518,7 +2517,7 @@ def upload_from_filename(
if_metageneration_not_match=None,
timeout=_DEFAULT_TIMEOUT,
checksum=None,
retry=DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED,
retry=DEFAULT_RETRY_IF_GENERATION_SPECIFIED,
):
"""Upload this blob's contents from the content of a named file.

@@ -2558,10 +2557,10 @@ def upload_from_filename(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type predefined_acl: str
@@ -2612,7 +2611,7 @@ def upload_from_filename(
This class exists to provide safe defaults for RPC calls that are
not technically safe to retry normally (due to potential data
duplication or other side-effects) but become safe to retry if a
condition such as if_metageneration_match is set.
condition such as if_generation_match is set.

See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -2656,7 +2655,7 @@ def upload_from_string(
if_metageneration_not_match=None,
timeout=_DEFAULT_TIMEOUT,
checksum=None,
retry=DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED,
retry=DEFAULT_RETRY_IF_GENERATION_SPECIFIED,
):
"""Upload contents of this blob from the provided string.

@@ -2687,10 +2686,10 @@ def upload_from_string(
:type num_retries: int
:param num_retries:
Number of upload retries. By default, only uploads with
if_metageneration_match set will be retried, as uploads without the
if_generation_match set will be retried, as uploads without the
argument are not guaranteed to be idempotent. Setting num_retries
will override this default behavior and guarantee retries even when
if_metageneration_match is not set. (Deprecated: This argument
if_generation_match is not set. (Deprecated: This argument
will be removed in a future release.)

:type client: :class:`~google.cloud.storage.client.Client`
@@ -2746,7 +2745,7 @@ def upload_from_string(
This class exists to provide safe defaults for RPC calls that are
not technically safe to retry normally (due to potential data
duplication or other side-effects) but become safe to retry if a
condition such as if_metageneration_match is set.
condition such as if_generation_match is set.

See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
@@ -2783,6 +2782,11 @@ def create_resumable_upload_session(
client=None,
timeout=_DEFAULT_TIMEOUT,
checksum=None,
if_generation_match=None,
if_generation_not_match=None,
if_metageneration_match=None,
if_metageneration_not_match=None,
retry=DEFAULT_RETRY_IF_GENERATION_SPECIFIED,
):
"""Create a resumable upload session.

@@ -2858,6 +2862,41 @@
delete the uploaded object automatically. Supported values
are "md5", "crc32c" and None. The default is None.

:type if_generation_match: long
:param if_generation_match:
(Optional) See :ref:`using-if-generation-match`

:type if_generation_not_match: long
:param if_generation_not_match:
(Optional) See :ref:`using-if-generation-not-match`

:type if_metageneration_match: long
:param if_metageneration_match:
(Optional) See :ref:`using-if-metageneration-match`

:type if_metageneration_not_match: long
:param if_metageneration_not_match:
(Optional) See :ref:`using-if-metageneration-not-match`

:type retry: google.api_core.retry.Retry or google.cloud.storage.retry.ConditionalRetryPolicy
:param retry: (Optional) How to retry the RPC. A None value will disable
retries. A google.api_core.retry.Retry value will enable retries,
and the object will define retriable response codes and errors and
configure backoff and timeout options.
A google.cloud.storage.retry.ConditionalRetryPolicy value wraps a
Retry object and activates it only if certain conditions are met.
This class exists to provide safe defaults for RPC calls that are
not technically safe to retry normally (due to potential data
duplication or other side-effects) but become safe to retry if a
condition such as if_metageneration_match is set.
See the retry.py source code and docstrings in this package
(google.cloud.storage.retry) for information on retry types and how
to configure them.
Media operations (downloads and uploads) do not support non-default
predicates in a Retry object. The default will always be used. Other
configuration changes for Retry objects such as delays and deadlines
are respected.

:rtype: str
:returns: The resumable upload session URL. The upload can be
completed by making an HTTP PUT request with the
@@ -2866,6 +2905,19 @@
:raises: :class:`google.cloud.exceptions.GoogleCloudError`
if the session creation response returns an error status.
"""

# Handle ConditionalRetryPolicy.
if isinstance(retry, ConditionalRetryPolicy):
# Conditional retries are designed for non-media calls, which change
# arguments into query_params dictionaries. Media operations work
# differently, so here we make a "fake" query_params to feed to the
# ConditionalRetryPolicy.
query_params = {
"ifGenerationMatch": if_generation_match,
"ifMetagenerationMatch": if_metageneration_match,
}
retry = retry.get_retry_policy_if_conditions_met(query_params=query_params)

extra_headers = {}
if origin is not None:
# This header is specifically for client-side uploads, it
@@ -2884,10 +2936,15 @@ def create_resumable_upload_session(
size,
None,
predefined_acl=None,
if_generation_match=if_generation_match,
if_generation_not_match=if_generation_not_match,
if_metageneration_match=if_metageneration_match,
if_metageneration_not_match=if_metageneration_not_match,
extra_headers=extra_headers,
chunk_size=self._CHUNK_SIZE_MULTIPLE,
timeout=timeout,
checksum=checksum,
retry=retry,
)

return upload.resumable_url
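With the preconditions threaded through, a caller could request a session that only succeeds when the object does not yet exist (`if_generation_match=0`). The stub below is purely illustrative — it mimics the parameter flow with a fake URL and is not the real google-cloud-storage surface, which requires a live client and credentials:

```python
# Illustrative stub showing how the new precondition arguments flow into the
# session-initiation request. Not the real google-cloud-storage Blob class.
class StubBlob:
    def create_resumable_upload_session(
        self,
        content_type=None,
        if_generation_match=None,
        if_metageneration_match=None,
    ):
        # A real implementation translates these into ifGenerationMatch /
        # ifMetagenerationMatch query parameters on the initiation request.
        params = {
            "ifGenerationMatch": if_generation_match,
            "ifMetagenerationMatch": if_metageneration_match,
        }
        query = "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
        return "https://upload.example.com/session?" + query

blob = StubBlob()
# if_generation_match=0 means "only if the object does not already exist".
url = blob.create_resumable_upload_session(if_generation_match=0)
assert "ifGenerationMatch=0" in url
```

Because the precondition is present, the conditional default retry would also activate for this session, per the `query_params` evaluation added earlier in this method.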
6 changes: 3 additions & 3 deletions google/cloud/storage/fileio.py
@@ -18,7 +18,7 @@
from google.api_core.exceptions import RequestRangeNotSatisfiable
from google.cloud.storage._helpers import _NUM_RETRIES_MESSAGE
from google.cloud.storage.retry import DEFAULT_RETRY
from google.cloud.storage.retry import DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED
from google.cloud.storage.retry import DEFAULT_RETRY_IF_GENERATION_SPECIFIED
from google.cloud.storage.retry import ConditionalRetryPolicy


@@ -278,7 +278,7 @@ def __init__(
blob,
chunk_size=None,
text_mode=False,
retry=DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED,
retry=DEFAULT_RETRY_IF_GENERATION_SPECIFIED,
**upload_kwargs
):
for kwarg in upload_kwargs:
@@ -346,7 +346,7 @@ def _initiate_upload(self):
# num_retries and retry are mutually exclusive. If num_retries is
# set and retry is exactly the default, then nullify retry for
# backwards compatibility.
if retry is DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED:
if retry is DEFAULT_RETRY_IF_GENERATION_SPECIFIED:
retry = None

# Handle ConditionalRetryPolicy.
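The same conditional default now governs BlobWriter. The sketch below models, with hypothetical names rather than the real fileio API, why a writer created without a generation precondition ends up with retries disabled while one created with `if_generation_match=0` keeps them:

```python
# Illustrative model of BlobWriter's retry resolution: the conditional default
# activates only when upload_kwargs carry a generation precondition.
_CONDITIONAL_DEFAULT = object()  # stands in for DEFAULT_RETRY_IF_GENERATION_SPECIFIED

class SketchWriter:
    def __init__(self, retry=_CONDITIONAL_DEFAULT, **upload_kwargs):
        self._retry = retry
        self._upload_kwargs = upload_kwargs

    def resolved_retry(self):
        if self._retry is _CONDITIONAL_DEFAULT:
            if self._upload_kwargs.get("if_generation_match") is not None:
                return "default-retry"   # precondition set: safe to retry
            return None                  # no precondition: retries disabled
        return self._retry               # caller's explicit choice wins

assert SketchWriter(if_generation_match=0).resolved_retry() == "default-retry"
assert SketchWriter().resolved_retry() is None
assert SketchWriter(retry="custom").resolved_retry() == "custom"
```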