
Storage: Timeout when uploading a file using google.cloud.storage.Blob.upload_from_filename() #74

Closed
vfa-minhtv opened this issue Feb 26, 2020 · 15 comments

@vfa-minhtv

Environment details

OS: macOS 10.15.1
Python: Python 3.7.4
Google-cloud version:

google-api-core==1.16.0
google-api-python-client==1.7.11
google-auth==1.11.2
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.0
google-cloud-core==1.3.0
google-cloud-error-reporting==0.32.1
google-cloud-firestore==1.5.0
google-cloud-kms==1.0.0
google-cloud-logging==1.14.0
google-cloud-storage==1.26.0
google-cloud-translate==1.7.0
google-resumable-media==0.5.0
google-translate==0.1
googleapis-common-protos==1.6.0

Steps to reproduce

  1. Prepare a file with size >300MB
  2. Run blob.upload_from_filename("path/on/storage", "path/of/big/file/on/local")

Stack trace

Traceback (most recent call last):
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1244, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1290, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1239, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 987, in send
    self.sock.sendall(data)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/ssl.py", line 1034, in sendall
    v = self.send(byte_view[count:])
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)
socket.timeout: The write operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1244, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1290, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1239, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/http/client.py", line 987, in send
    self.sock.sendall(data)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/ssl.py", line 1034, in sendall
    v = self.send(byte_view[count:])
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', timeout('The write operation timed out'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-cdc889c11775>", line 4, in <module>
    "data/tmp/averaging.joblib", None)
  File "/Users/dualeoo/PycharmProjects/mlweb-ml/mlweb_ml/firestore/google_storage.py", line 30, in upload
    blob.upload_from_filename(file_path_on_local, content_type, predefined_acl=predefined_acl)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1342, in upload_from_filename
    predefined_acl=predefined_acl,
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1287, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1197, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1144, in _do_resumable_upload
    response = upload.transmit_next_chunk(transport)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 425, in transmit_next_chunk
    retry_strategy=self._retry_strategy,
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/resumable_media/requests/_helpers.py", line 136, in http_request
    return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/resumable_media/_helpers.py", line 150, in wait_and_retry
    response = func()
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/google/auth/transport/requests.py", line 317, in request
    **kwargs
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/Users/dualeoo/miniconda3/envs/mlweb-ml/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out'))

Expected result

No timeout error

Actual result

The upload times out after 1 minute.

@busunkim96 busunkim96 transferred this issue from googleapis/google-cloud-python Feb 27, 2020
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Feb 27, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Feb 27, 2020
@crwilcox
Contributor

crwilcox commented Mar 5, 2020

You state you ran blob.upload_from_filename("path/on/storage", "path/of/big/file/on/local"), which doesn't match the signature:
upload_from_filename(filename, content_type=None, client=None, predefined_acl=None)

I created a repro, but unfortunately it works fine for me.

import datetime

# Using version 1.26.0 of google-cloud-storage
from google.cloud import storage

BUCKET_NAME = "your-bucket"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

#filename = "4Gb.txt"

# 300Mb.txt generated with 
# `❯ dd if=/dev/urandom of=300Mb.txt bs=1048576 count=300`
filename = "300Mb.txt"

# Write file if necessary.
blob = bucket.blob(filename)
if not blob.exists():
    print(f"Writing {filename}")
    start = datetime.datetime.now()
    blob.upload_from_filename(filename)
    end = datetime.datetime.now()
    print(f"Wrote {filename} {end-start}")

# Read file
print(f"Reading {filename}")
start = datetime.datetime.now()
blob.download_to_filename(f"downloaded-{filename}")
end = datetime.datetime.now()
print(f"Read {filename} {end-start}")

For good measure, I tried the same thing with a 4Gb file as well:

❯ python write_and_read.py
Writing 300Mb.txt
Wrote 300Mb.txt 0:00:46.983565
Reading 300Mb.txt
Read 300Mb.txt 0:00:39.064956
❯ python write_and_read.py
Writing 4Gb.txt
Wrote 4Gb.txt 0:09:43.198637
Reading 4Gb.txt
Read 4Gb.txt 0:14:05.202351

This was done in a fresh virtual environment with Python 3.8:

❯ pip freeze
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
google-api-core==1.16.0
google-auth==1.11.2
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
idna==2.9
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pytz==2019.3
requests==2.23.0
rsa==4.0
six==1.14.0
urllib3==1.25.8

@crwilcox crwilcox added needs more info This issue needs more information from the customer to proceed. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Mar 5, 2020
@ElectricSwan

I'm also getting the same error on both Ubuntu 18.04 with Python 3.6.9 and Windows 10 with Python 3.8.0, both using google-cloud-storage 1.26.0.
The timeout happens after 60 seconds.
I'm limited to around 800 kbps upload speed, so for me that means a timeout for any file larger than about 6 MB (800 kbps × 60 s ≈ 48 Mbit ≈ 6 MB).
Any uploads that complete within 60 seconds are successful.

@crwilcox
Contributor

@ElectricSwan, what is the code you are running? The sample code I provided runs well beyond 60 seconds. Simplified, it is:

from google.cloud import storage

BUCKET_NAME = "your-bucket"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# `❯ dd if=/dev/urandom of=4Gb.txt bs=1048576 count=4096`
filename = "4Gb.txt"

# Write file if necessary.
blob = bucket.blob(filename)
blob.upload_from_filename(filename)

@ElectricSwan

@crwilcox, I just ran your simplified code in a virtual env with Python 3.6.9 on Ubuntu 18.04, and I get the timeout at 60 seconds.

I get the exact same timeout error with your simplified code on my Windows 10 PC running Python 3.8.0.

Here is the pip freeze from the Ubuntu PC, which appears identical to yours:

cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
google-api-core==1.16.0
google-auth==1.11.2
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
idna==2.9
pkg-resources==0.0.0
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pytz==2019.3
requests==2.23.0
rsa==4.0
six==1.14.0
urllib3==1.25.8

Here's my pip freeze from the Windows PC:

astroid==2.3.3
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
colorama==0.4.3
google-api-core==1.16.0
google-auth==1.11.2
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
idna==2.8
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pylint==2.4.4
pytz==2019.3
requests==2.22.0
rsa==4.0
six==1.14.0
urllib3==1.25.8
wrapt==1.11.2

The only potentially relevant differences I can see between my Windows and Ubuntu boxes are that my Windows box has slightly older versions of:

idna==2.8
requests==2.22.0

but the result is the same on both platforms: a timeout after 60 seconds.

Here is my stack trace on the Ubuntu PC, which is almost identical to the stack trace submitted by @vfa-minhtv:

Traceback (most recent call last):
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
socket.timeout: The write operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', timeout('The write operation timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "simple_test.py", line 18, in <module>
    blob.upload_from_filename(filename)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1342, in upload_from_filename
    predefined_acl=predefined_acl,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1287, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1193, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 987, in _do_multipart_upload
    response = upload.transmit(transport, data, object_metadata, content_type)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 106, in transmit
    retry_strategy=self._retry_strategy,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/resumable_media/requests/_helpers.py", line 136, in http_request
    return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/resumable_media/_helpers.py", line 150, in wait_and_retry
    response = func()
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/auth/transport/requests.py", line 317, in request
    **kwargs
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out',))

I also just tried installing google-cloud-storage without a virtual env under Ubuntu (just in case there was something peculiar under a virtual env), but I still get the same timeout after 60 seconds.

@vfa-minhtv
Author

You state you ran blob.upload_from_filename("path/on/storage", "path/of/big/file/on/local"), which doesn't match the signature:
upload_from_filename(filename, content_type=None, client=None, predefined_acl=None)

Hi @crwilcox!

So sorry. I gave a wrong example. My actual code is:

image

@vfa-minhtv
Author

@ElectricSwan

The timeout happens after 60 seconds.
I'm limited to around 800 kbps upload speed, so for me that gives a timeout for any files larger than 6 MB.
Any uploads that complete within 60 seconds are successful.

The exact same thing happens for me. My internet connection is also limited, so the 60-second timeout is not enough to finish uploading.

@aborzin

aborzin commented Mar 23, 2020

@vfa-minhtv, I have been experiencing similar timeout issues on my macOS and Windows platforms with google-cloud-storage==1.26.0. However, the timeout issues are inconsistent and apparently depend on the network speed. As already mentioned in this thread, it typically fails with very slow upload speeds.

I checked the code and found that any data stream of 8 MB or larger goes through _do_resumable_upload(...), which sends the data stream in chunks (which absolutely makes sense for supporting slow network connectivity):

        if size is not None and size <= _MAX_MULTIPART_SIZE:
            response = self._do_multipart_upload(
                client, stream, content_type, size, num_retries, predefined_acl
            )
        else:
            response = self._do_resumable_upload(
                client, stream, content_type, size, num_retries, predefined_acl
            )

However, the chunk size is not set in the initialization call, so it falls back to a predefined default value:

        if chunk_size is None:
            chunk_size = self.chunk_size
            if chunk_size is None:
                chunk_size = _DEFAULT_CHUNKSIZE

This default value is set to 100 MB:

_DEFAULT_CHUNKSIZE = 104857600  # 1024 * 1024 B * 100 = 100 MB

So you need roughly 14 Mbps (about 1.7 MB/s) of upload speed to complete each chunk within 1 minute, which is apparently the default timeout (see http://www.meridianoutpost.com/resources/etools/calculators/calculator-file-download-time.php for quick upload-time calculations).
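
As a quick sanity check, here is the arithmetic behind that figure (a back-of-the-envelope sketch, not code from the library):

chunk_bytes = 104857600        # _DEFAULT_CHUNKSIZE: 100 MB
timeout_seconds = 60           # the apparent default per-request timeout

# Minimum sustained upload speed needed to send one full chunk in time:
min_mbps = chunk_bytes * 8 / timeout_seconds / 1e6
print(f"~{min_mbps:.1f} Mbit/s")  # ~14.0 Mbit/s, i.e. about 1.7 MB/s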

As a test, I reduced the _DEFAULT_CHUNKSIZE value to 10 MB, which solved my issue.
I hope this helps, and that we will eventually be able to control the chunk size based on our environment parameters.

@ElectricSwan

Thank you @aborzin for sharing your excellent investigation.

I've edited lines 108 and 109 of
env\Lib\site-packages\google\cloud\storage\blob.py
and set both values to 5 MB:

_DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
_MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB

With my 800 kbps upload speed, the maximum size was 6 MB, so I chose 5 MB to provide some margin.
I can now successfully upload large files on my 800 kbps upload link.

@aborzin

aborzin commented Mar 24, 2020

@ElectricSwan, my problem with the workaround I proposed earlier is that it is not portable (it changes the code in the local copy of the google-cloud-storage lib). So I decided to override the chunk size of the blob object after it is created:

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

# WARNING: this is a workaround for a google-cloud-storage issue as reported on:
# https://github.com/googleapis/python-storage/issues/74
blob._chunk_size = 8388608  # 1024 * 1024 B * 8 = 8 MB

blob.upload_from_filename(source_file_name)

Even though it is bad practice to access a "private" variable of a class, it seems to be a reasonable solution for now.

@ElectricSwan

@aborzin, I agree wholeheartedly that it is not a good idea to change code in a library, and I do prefer your second solution, but unfortunately it doesn't work for me, because of the test at line 1191:
if size is not None and size <= _MAX_MULTIPART_SIZE:

In my case (800 kbps upload), I am unable to upload 8 MB within 60 seconds. That is why I have to change the value of _MAX_MULTIPART_SIZE as well. Without that change, files between 6 MB and 8 MB still fail.

Because _MAX_MULTIPART_SIZE is a module-level variable, I can't see any way of changing that value from within my code, so for now I'm stuck with modifying the lib. Please correct me if I'm wrong.

@aborzin

aborzin commented Mar 24, 2020

@ElectricSwan, I agree that my second solution works only if you set the chunk size to 8 MB or larger, because of the _MAX_MULTIPART_SIZE threshold. However, I think you can override it from your code as well:

# WARNING: this is a workaround for a google-cloud-storage issue as reported in:
# https://github.com/googleapis/python-storage/issues/74
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024
blob._chunk_size = 5 * 1024 * 1024

I debugged this option, and the threshold was correctly set to 5 MB. Of course, you can do this once, after the google.cloud.storage package is loaded (rather than for each and every upload call).

@ElectricSwan

@aborzin, I was working on the same solution, and was just about to post it when I got your notification.

I've done:

from google.cloud import storage
# WARNING: WORKAROUND to prevent timeout for files > 6 MB on an 800 kbps upload link.
storage.blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB

@ElectricSwan

@aborzin, I found that there is a setter for the chunk size on the blob object, so I've replaced the module-level
storage.blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
with
blob.chunk_size = 5 * 1024 * 1024  # Set a 5 MB chunk size
when I create the blob, which means one less access to a protected member.

This also means that anyone with an upload speed of at least 1.1 Mbps [1] needs no change to the library; only the public setter is required.

For anyone whose upload speed is less than 1.1 Mbps, the module-level
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB
is still required (in addition to setting blob.chunk_size).

[1] 1.1 Mbps is the minimum required to upload 8 MB within the 60 second timeout
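
Putting the pieces from this thread together, a minimal sketch of the full workaround (bucket and object names are placeholders) would be:

from google.cloud import storage

# Only needed on links slower than ~1.1 Mbps, where even an 8 MB
# multipart upload cannot finish within the 60-second timeout.
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB

client = storage.Client()
bucket = client.bucket("your-bucket")
blob = bucket.blob("big-file.bin")

# Public setter; the value must be a multiple of 256 KB.
blob.chunk_size = 5 * 1024 * 1024  # 5 MB
blob.upload_from_filename("big-file.bin")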

@HemangChothani
Contributor

PR #185 added an explicit timeout argument to the blob methods. Users can now pass a longer timeout to resolve this issue. Feel free to reopen if this issue appears again.
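
For example, with a release that includes that change, a slow-link upload might look like this (the bucket/object names and the 600-second value are illustrative):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket")  # placeholder bucket name
blob = bucket.blob("big-file.bin")     # placeholder object name

# Allow up to 10 minutes per request instead of the 60-second default.
blob.upload_from_filename("big-file.bin", timeout=600)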

@sjtarik

sjtarik commented Nov 1, 2020

I am constantly having the exact same issue while uploading a 15 MB file on a 2.4 Mbps upload connection.
Even when I set the timeout, I still get a 503 response. The strange thing is that the upload is actually successful: I am able to browse the bucket and verify the integrity of the uploaded file.

DEBUG:pydub.converter:subprocess output: b'video:0kB audio:19466kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001139%'
DEBUG:google.auth._default:Checking /Users/xxx/Desktop/videotranscripts/google_voicebookmarks_service.json for explicit credentials as part of auth process...
DEBUG:google.auth._default:Checking /Users/xxx/Desktop/videotranscripts/google_voicebookmarks_service.json for explicit credentials as part of auth process...
DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): storage.googleapis.com:443
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "GET /storage/v1/b/yyy?projection=noAcl&prettyPrint=false HTTP/1.1" 200 587
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "POST /upload/storage/v1/b/yyy/o?uploadType=resumable HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "PUT /upload/storage/v1/b/yyy/o?uploadType=resumable&upload_id=ABg5-UxLKi2XLTyw1shMgubVCY3aYVHfPGjLe5gfEHEMyMlVI-HNQabmCu437hCljHU_n3QVQtc8dpCHZbrXMZq7pGw HTTP/1.1" 200 755
DEBUG:google.auth._default:Checking /Users/xxx/Desktop/videotranscripts/XXX.json for explicit credentials as part of auth process...
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:google.api_core.retry:Retrying due to , sleeping 0.8s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 2.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 2.7s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 7.8s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 1.9s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 28.7s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 21.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...

mackinleysmith added a commit to mackinleysmith/mlflow that referenced this issue Jul 8, 2021
GCS has a default chunk size of 100Mb and a default timeout per chunk of 60 seconds, so an upload speed of 13.3Mbps is required to be able to log an artifact over 100Mb with the default settings. This is discussed at length in this issue on the python-storage module of googleapis: googleapis/python-storage#74. I propose we increase the default timeout from 1 minute to 10, thereby allowing a minimum upload speed of 1.3Mbps to complete a 100Mb upload in the allotted time. At the very least I think Mlflow should accept a user override for this parameter. Thanks for reading!

Signed-off-by: MacKinley Smith <smit1625@msu.edu>