S3.Object.copy() fails with multipart and ChecksumAlgorithm #241

Open
rapkyt opened this issue Jun 7, 2022 · 3 comments

rapkyt commented Jun 7, 2022

Describe the bug

Copying an object as a multipart copy fails when you also request checksums for the destination object (via ChecksumAlgorithm).

Note: copy does not fail for small objects. It fails only for objects whose size is above multipart_threshold.
Note 2: calling copy on the s3 client has the same effect.

Expected Behavior

The copy method should work for multipart objects with checksums.

Current Behavior

Running copy as a multipart copy with ChecksumAlgorithm set to SHA256 raises the following error:

botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CompleteMultipartUpload operation: The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.

Reproduction Steps

Run the following code, replacing the bucket and key accordingly:

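import boto3

s3 = boto3.resource("s3")

# Replace with your own bucket and key. The source object must be larger
# than multipart_threshold so that the multipart copy path is used.
dest_bucket = "bucket"
dest_key = "key"
copy_source = {"Bucket": "bucket", "Key": "key"}

s3.Object(dest_bucket, dest_key).copy(
    CopySource=copy_source,
    ExtraArgs={"ChecksumAlgorithm": "SHA256"},
)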

This produces the following error. Full traceback:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File ".venv/lib/python3.7/site-packages/boto3/s3/inject.py", line 572, in object_copy
     Config=Config,
   File ".venv/lib/python3.7/site-packages/boto3/s3/inject.py", line 444, in copy
     return future.result()
   File ".venv/lib/python3.7/site-packages/s3transfer/futures.py", line 103, in result
     return self._coordinator.result()
   File ".venv/lib/python3.7/site-packages/s3transfer/futures.py", line 266, in result
     raise self._exception
   File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 139, in __call__
     return self._execute_main(kwargs)
   File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 162, in _execute_main
     return_value = self._main(**kwargs)
   File ".venv/lib/python3.7/site-packages/s3transfer/tasks.py", line 387, in _main
     **extra_args,
   File ".venv/lib/python3.7/site-packages/botocore/client.py", line 508, in _api_call
     return self._make_api_call(operation_name, kwargs)
   File ".venv/lib/python3.7/site-packages/botocore/client.py", line 911, in _make_api_call
     raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CompleteMultipartUpload operation: The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.

Possible Solution

The problem resides inside CopyPartTask (s3transfer/copies.py), which doesn't return the checksum even when it's present in the response. Grabbing the checksum from the request and adding it to the return statement fixes this issue.
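To make the idea concrete, here is a minimal sketch of what the per-part return value could look like. This is not the actual s3transfer code: build_part_metadata is a hypothetical helper, and it assumes the UploadPartCopy response exposes the checksum under CopyPartResult with a member name such as ChecksumSHA256.

```python
from typing import Any, Dict, Optional


def build_part_metadata(
    response: Dict[str, Any],
    part_number: int,
    checksum_algorithm: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one entry for the Parts list of CompleteMultipartUpload.

    Hypothetical helper for illustration only; not part of s3transfer.
    """
    copy_result = response["CopyPartResult"]
    part = {"ETag": copy_result["ETag"], "PartNumber": part_number}
    if checksum_algorithm:
        # Assumed naming: "SHA256" -> "ChecksumSHA256".
        member = "Checksum" + checksum_algorithm.upper()
        if member in copy_result:
            part[member] = copy_result[member]
    return part


# Fake response shaped like an UploadPartCopy result (values are made up).
fake_response = {
    "CopyPartResult": {"ETag": '"abc123"', "ChecksumSHA256": "ZmFrZQ=="}
}
print(build_part_metadata(fake_response, 1, "SHA256"))
```

With the checksum carried along next to the ETag, the CompleteMultipartUpload call would have what it needs for each part.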

Additional Information/Context

Looking at the CompleteMultipartUpload request, it appears to send the ETag for each part, but not the checksum.
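For anyone who wants to check this on their side, a small sketch that scans a Parts list for entries lacking the checksum. parts_missing_checksum is a hypothetical name, and the Parts shape (ETag, PartNumber, ChecksumSHA256) is my assumption about what the request should carry; the values are made up.

```python
from typing import Any, Dict, List


def parts_missing_checksum(
    parts: List[Dict[str, Any]], checksum_algorithm: str
) -> List[int]:
    """Return the part numbers whose entry lacks the checksum member."""
    # Assumed naming: "SHA256" -> "ChecksumSHA256".
    member = "Checksum" + checksum_algorithm.upper()
    return [p["PartNumber"] for p in parts if member not in p]


# Part 1 mimics what is sent today (ETag only); part 2 includes a checksum.
parts = [
    {"ETag": '"etag-1"', "PartNumber": 1},
    {"ETag": '"etag-2"', "PartNumber": 2, "ChecksumSHA256": "ZmFrZQ=="},
]
print(parts_missing_checksum(parts, "SHA256"))  # -> [1]
```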

Lib version:

  • botocore==1.27.1
  • boto3==1.24.1
  • s3transfer==0.6.0

SDK version used

1.24.1

Environment details (OS name and version, etc.)

macOS 12.4, Python 3.7

@boonjiashen

Hi @rapkyt, I'm a developer who's also using this copy method. If we do not specify ChecksumAlgorithm, does copy not perform checksum validation for multipart uploads? Or is there a default algorithm when there's none specified?

s3.Object(dest_bucket, dest_key).copy(
    CopySource=copy_source,
    # ExtraArgs={"ChecksumAlgorithm": "SHA256"}  # What's the behavior for multipart uploads without this line?
)

rapkyt commented Jun 24, 2022

@boonjiashen, if you don't specify ChecksumAlgorithm, then your s3 object will not have additional checksums. AFAIK boto already does some sort of checksum validation with the ETags.

One thing that caught my attention is that, when uploading or copying a multipart object, the checksum of the s3 object is not the checksum of the whole s3 file, but rather the checksum of the concatenated checksums of each part.
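A rough sketch of that difference, using only hashlib. The function names are hypothetical, and the exact output format (base64 of the combined digest followed by "-" and the number of parts) is my understanding of how S3 reports it, so treat that part as an assumption.

```python
import base64
import hashlib
from typing import List


def composite_sha256(parts: List[bytes]) -> str:
    """Checksum of checksums: hash the concatenated raw digests of the parts."""
    part_digests = [hashlib.sha256(p).digest() for p in parts]
    combined = hashlib.sha256(b"".join(part_digests)).digest()
    # Assumed format: base64 digest plus "-<number of parts>".
    return base64.b64encode(combined).decode("ascii") + "-" + str(len(parts))


def whole_object_sha256(data: bytes) -> str:
    """Plain SHA-256 over the full object, base64 encoded."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Two tiny stand-in "parts"; real multipart parts are much larger.
parts = [b"a" * 10, b"b" * 10]
print(composite_sha256(parts))
print(whole_object_sha256(b"".join(parts)))
```

The two printed values differ, so a checksum computed locally over the whole file can't be compared directly with the one stored on a multipart object.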

sat-ch commented Oct 20, 2022

Note: A PR has been raised to solve this issue.
#242
