ClientError without 'Error' key #4046

Open
zalex5 opened this issue Mar 12, 2024 · 5 comments
Assignees
Labels
bug (This issue is a confirmed bug.), p2 (This is a standard priority issue), response-requested (Waiting on additional information or feedback.), s3

Comments


zalex5 commented Mar 12, 2024

Describe the bug

Hi,
we have an issue with boto3 S3 client API calls.

Sometimes our code raises a KeyError while parsing ClientError exceptions; the missing key is 'Error'.

The code is:

    try:
        s3_client.copy(
            CopySource={"Bucket": object_bucket, "Key": object_key},
            Bucket=DEFAULT_CACHE_BUCKET,
            Key=object_key,
            ExtraArgs={
                "MetadataDirective": "COPY",
            },
        )
    except ClientError as err:
        if err.response['Error']['Code'] in ['ObjectNotInActiveTierError', 'InvalidObjectState']:
            err_msg = f"{err.response['Error']['Code']}: the object could have been already copied"
            logger.warning(err_msg)

In CloudWatch we see this failing with the following message:

[ERROR] KeyError: 'Error'
Traceback (most recent call last):
  File "/var/task/manage_storage_class.py", line 95, in lambda_handler
    create_temporary_copy(object_bucket, object_key, s3_client=s3_client)
  File "/var/task/manage_storage_class.py", line 32, in create_temporary_copy
    if err.response['Error']['Code'] in ['ObjectNotInActiveTierError', 'InvalidObjectState']:

Just to let you know, this issue is not easy to reproduce: it happened 20 times in the last four weeks (out of thousands/millions of calls).

We know we can work around the error by testing for the presence of the 'Error' key in the 'response' dictionary, but we couldn't find any documentation mentioning that a ClientError could be missing that key.
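
A minimal sketch of that workaround, reusing the names from the snippet above:

    try:
        s3_client.copy(
            CopySource={"Bucket": object_bucket, "Key": object_key},
            Bucket=DEFAULT_CACHE_BUCKET,
            Key=object_key,
            ExtraArgs={
                "MetadataDirective": "COPY",
            },
        )
    except ClientError as err:
        # 'Error' is not guaranteed to be present, so fall back to an empty dict
        error_code = err.response.get('Error', {}).get('Code')
        if error_code in ['ObjectNotInActiveTierError', 'InvalidObjectState']:
            err_msg = f"{error_code}: the object could have been already copied"
            logger.warning(err_msg)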

Any information about this situation would be much appreciated,
Thank you

Expected Behavior

We expect to be able to access the 'Error' key every time a ClientError is raised.

Current Behavior

Sometimes the ClientError object has no 'Error' key inside the 'response' dictionary.

Reproduction Steps

The code is very basic:

import logging
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError
from storage_management import DEFAULT_CACHE_BUCKET

S3_CONF = Config(retries={"max_attempts": 30, "mode": "adaptive"})
REGION = "eu-west-1"
RESTORE_DAYS = 15
RESTORE_TIER = "Bulk"

fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
hnd = logging.StreamHandler()
hnd.setFormatter(fmt)

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(hnd)

def create_temporary_copy(object_bucket, object_key, s3_client):
    try:
        s3_client.copy(
            CopySource={"Bucket": object_bucket, "Key": object_key},
            Bucket=DEFAULT_CACHE_BUCKET,
            Key=object_key,
            ExtraArgs={
                "MetadataDirective": "COPY",
            },
        )
    except ClientError as err:
        if err.response['Error']['Code'] in ['ObjectNotInActiveTierError', 'InvalidObjectState']:
            err_msg = f"{err.response['Error']['Code']}: the object could have been already copied"
            logger.warning(err_msg)

and also this (very similar):

import logging
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError
from storage_management import DEFAULT_CACHE_BUCKET

S3_CONF = Config(retries={"max_attempts": 30, "mode": "adaptive"})
REGION = "eu-west-1"
RESTORE_DAYS = 15
RESTORE_TIER = "Bulk"

fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
hnd = logging.StreamHandler()
hnd.setFormatter(fmt)

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(hnd)

def change_object_storage_class(
    object_bucket, object_key, dest_storage_class, s3_client
):
    try:
        s3_client.copy(
            CopySource={"Bucket": object_bucket, "Key": object_key},
            Bucket=object_bucket,
            Key=object_key,
            ExtraArgs={
                "StorageClass": dest_storage_class,
                "MetadataDirective": "COPY",
                "TaggingDirective": "COPY",
            },
        )
    except ClientError as err:
        if err.response['Error']['Code'] in ['ObjectNotInActiveTierError', 'InvalidObjectState']:
            err_msg = f"{err.response['Error']['Code']}: the storage class has already been changed"
            logger.warning(err_msg)

Possible Solution

No response

Additional Information/Context

Both functions are executed in a Lambda function triggered by EventBridge (configured to react to a storage class transition).

SDK version used

The AWS Lambda default for the Python 3.9 runtime

Environment details (OS name and version, etc.)

Lambda runtime Python 3.9

@zalex5 zalex5 added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Mar 12, 2024
zalex5 (Author) commented Mar 12, 2024

Just adding the CloudWatch logs:

log-events-viewer-result.csv

@tim-finnigan tim-finnigan self-assigned this May 16, 2024
@tim-finnigan tim-finnigan added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label May 16, 2024
tim-finnigan (Contributor)

Thanks for reaching out and for your patience in hearing back here. Can you please provide the full Boto3 logs (with any sensitive info redacted) by adding boto3.set_stream_logger('') to your script? That will allow us to see the full XML response returned from the underlying S3 API call.

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. s3 p2 This is a standard priority issue and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels May 16, 2024
zalex5 (Author) commented May 17, 2024

Hi Tim,
Unfortunately, the issue occurs randomly among millions of requests, and increasing the log detail for all calls would add significant cost.
Anyway, I can provide an example of the error; for this occurrence we printed the whole response dictionary of the ClientError:

2024-05-05 21:16:03,180 - root - ERROR - Error while creating temporary copy:
{'ResponseMetadata': {'RequestId': '5Q276DN80E52RFJ5', 'HostId': 'gLapdNPbeyx2d+LvNQDqcR110zuiuDz3W1BMPaPMtuMakv3p9Zkf+MsY09pTorz57IlE7WxNsIJ6mHh4iwVRJg==', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'gLapdNPbeyx2d+LvNQDqcR110zuiuDz3W1BMPaPMtuMakv3p9Zkf+MsY09pTorz57IlE7WxNsIJ6mHh4iwVRJg==', 'x-amz-request-id': '5Q276DN80E52RFJ5', 'date': 'Sun, 05 May 2024 21:15:47 GMT', 'x-amz-server-side-encryption': 'AES256', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryQuotaReached': True, 'RetryAttempts': 1}, 'ServerSideEncryption': 'AES256', 'CopyPartResult': {}}
Traceback (most recent call last):
  File "/var/task/manage_storage_class.py", line 23, in create_temporary_copy
    s3_client.copy(
  File "/var/task/boto3/s3/inject.py", line 450, in copy
    return future.result()
  File "/var/task/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/var/task/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/var/task/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File "/var/task/s3transfer/tasks.py", line 162, in _execute_main
    return_value = self._main(**kwargs)
  File "/var/task/s3transfer/copies.py", line 370, in _main
    response = client.upload_part_copy(
  File "/var/task/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (Unknown) when calling the UploadPartCopy operation: Unknown
LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html
[ERROR] KeyError: 'Error'
Traceback (most recent call last):
  File "/var/task/manage_storage_class.py", line 97, in lambda_handler
    create_temporary_copy(object_bucket, object_key, s3_client=s3_client)
  File "/var/task/manage_storage_class.py", line 33, in create_temporary_copy
    if err.response['Error']['Code'] in ['ObjectNotInActiveTierError', 'InvalidObjectState']:
END RequestId: 11dd927a-4d15-4585-9dbb-1073afc0ed12

I hope this helps.

tim-finnigan (Contributor)

Thanks @zalex5 for following up and sharing that info. What we need to investigate this issue further is the actual XML response returned from the S3 API, to determine if this is an issue with the underlying API or the parsing done on the SDK side.

So for example given this script, boto3.set_stream_logger('') is added to get the full logs:

import boto3

# Create an S3 client
s3 = boto3.client('s3')
boto3.set_stream_logger('')

# Set the source and destination bucket and object names
source_bucket_name = 'source-bucket'
source_object_name = 'test.txt'
dest_bucket_name = 'dest-bucket'
dest_object_name = 'dest-test.txt'

# Create the copy source dictionary
copy_source = {
    'Bucket': source_bucket_name,
    'Key': source_object_name
}

# Copy the object
s3.copy(copy_source, dest_bucket_name, dest_object_name)

print(f'Successfully copied {source_object_name} from {source_bucket_name} to {dest_bucket_name}/{dest_object_name}')

And then this Response body is included in the logs:

2024-05-17 08:58:48,213 botocore.parsers [DEBUG] Response body:
b'<Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>wrong-dest-bucket</BucketName><RequestId>GY991E1JTZ46EXXX</RequestId><HostId>gLhiX/Tf3uxVfNQS4tIkZ2KjCB5G3Ab7kf0nXoVl2o7eTRpZqrZi+iVyvu9G+76Vpo9GLEXXX=</HostId></Error>'
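
If logging everything with boto3.set_stream_logger('') is too verbose, a narrower option (just a sketch) is to enable DEBUG output only for the parser logger that emits the "Response body" lines shown above:

import logging
import boto3

# Log only botocore's response-parsing output (which includes the raw
# response body), rather than the full wire-level debug output
boto3.set_stream_logger('botocore.parsers', logging.DEBUG)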

zalex5 (Author) commented May 17, 2024

Hi @tim-finnigan,
Unfortunately, we cannot increase the log level without incurring significant costs, considering those functions are called hundreds of thousands of times each day, and it might take weeks to re-encounter that issue.

We are also in contact with AWS about this issue: if they provide the necessary resources to offset the debugging costs, I will update this ticket with the requested information.

Thank you very much for your help!

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label May 18, 2024
@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label May 22, 2024