
'ForkAwareLocal' object has no attribute 'connection' for multithreaded cp #1100

Open
djc opened this issue Sep 9, 2020 · 9 comments

@djc

djc commented Sep 9, 2020

When running gsutil -m cp -r gs://example/ ./ on a fairly large folder on macOS with the system Python 3.7 (to avoid the issues from #961), I see many instances of the following error:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 788, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2348, in run
    cls = copy.copy(class_map[caller_id])
  File "<string>", line 2, in __getitem__
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 792, in _callmethod
    self._connect()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 779, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

As a result, the command seems to hang after starting a first batch of downloads.

@peinan

peinan commented Sep 14, 2020

I have exactly the same problem on OSX 10.15.6, gcloud {308.0.0, 297.0.1}, gsutil {4.53, 4.51}, python 3.7.8.

@waltaskew

Same on OSX 10.15.7, gcloud 303.0.0, gsutil 4.52, python 3.6.5

@mobuchowski

Same in gcloud 317.0.0, python 3.7.7

@rrauber
Contributor

rrauber commented Nov 9, 2020

Thanks for reporting this! I was able to reproduce this bug as well and found that disabling multiprocessing helped. You can do this by setting parallel_process_count=1 in the GSUtil section of your boto config file, or by adding the following flag to your command: -o "GSUtil:parallel_process_count=1". Though this disables multiprocessing, multithreading should still be enabled, so you'll still be able to parallelize your transfers.

I'm guessing this issue is related to more general issues with multiprocessing on macOS that PR #1107 left us vulnerable to (#1107 (comment)). If you're still having this issue after disabling multiprocessing, please let us know!
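A minimal Python sketch of applying the workaround above from a script, assuming gsutil is on PATH. The helper names build_gsutil_cp_command and set_parallel_process_count are hypothetical; the only gsutil options used are the -m, -o "GSUtil:parallel_process_count=1", cp and -r flags quoted in this thread. Because -m and -o are top-level gsutil options, the override goes before the cp subcommand. The second helper writes the same setting into the [GSUtil] section of a boto config file whose path you supply; note that configparser rewrites the whole file, so any comments in it are lost.

```python
import configparser
import shlex
from pathlib import Path
from typing import List


def build_gsutil_cp_command(src: str, dst: str, process_count: int = 1) -> List[str]:
    """Hypothetical helper: argv for a recursive copy with multiprocessing capped.

    -m and -o are top-level gsutil options, so they precede the "cp" subcommand.
    """
    return [
        "gsutil",
        "-m",  # still request parallel transfers (threads keep running)
        "-o",
        f"GSUtil:parallel_process_count={process_count}",  # cap worker processes
        "cp",
        "-r",
        src,
        dst,
    ]


def set_parallel_process_count(boto_path: Path, process_count: int = 1) -> None:
    """Hypothetical helper: persist the setting in a boto config's [GSUtil] section."""
    # interpolation=None leaves any '%' in existing values untouched;
    # strict=False tolerates duplicate keys instead of raising.
    config = configparser.ConfigParser(interpolation=None, strict=False)
    config.read(boto_path)  # a missing file is silently skipped
    if not config.has_section("GSUtil"):
        config.add_section("GSUtil")
    config.set("GSUtil", "parallel_process_count", str(process_count))
    with boto_path.open("w") as fh:
        config.write(fh)  # rewrites the file; comments are not preserved


if __name__ == "__main__":
    argv = build_gsutil_cp_command("gs://example/", "./")
    print(" ".join(shlex.quote(arg) for arg in argv))
    # To actually run the copy: subprocess.run(argv, check=True)
```

With process_count=1, gsutil should still parallelize the transfer using threads, as described in the comment above.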

@alamothe

alamothe commented Dec 1, 2020

Thank you for providing a workaround, it works for me.

This tool is a joke though. Not a single version since 297 works out of the box without some kind of patching.

@dweekly

dweekly commented Feb 10, 2021

The -o "GSUtil:parallel_process_count=1" workaround works for me on Big Sur (macOS 11.2) but it's frustrating to me that this continues to persist as an issue on the Mac platform.

@danielyaa5

danielyaa5 commented Apr 5, 2021

Doesn't work for me. I get "CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command." Please fix your joke product.

@gcarr1020

I ran into the same "CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command." issue on OSX 12.0 Beta (M1 Mac). I fixed it by removing the -m flag, so the command became gsutil cp -r gs://<path/to/bucket_or_sub_bucket> . (with a trailing "." for the current directory). Unfortunately, this removes parallelism and significantly affects performance.

gerhard added a commit to gerhard/io that referenced this issue Nov 7, 2022
GoogleCloudPlatform/gsutil#1100

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@nrempel

nrempel commented Feb 16, 2023

If you're here, I recommend the new gcloud storage cp command: https://cloud.google.com/sdk/gcloud/reference/storage/cp


10 participants