Call StepFunction SendTaskSuccess once then got: Task Timed Out: 'Provided task does not exist anymore', but succeeded eventually #4044

Closed
NianLi71 opened this issue Mar 6, 2024 · 2 comments
Labels: bug, closed-for-staleness, p2, response-requested, service-api, stepfunctions

NianLi71 commented Mar 6, 2024

Describe the bug

I have code that consumes messages from a standard SQS queue; each message triggers one call to Step Functions SendTaskSuccess. The logs show that a given message called SendTaskSuccess only once, with a valid task token, and got:

Failed to process with exception: An error occurred (TaskTimedOut) when calling the SendTaskSuccess operation: Task Timed Out: 'Provided task does not exist anymore'

I also saved the message ID in DynamoDB, and the message that updated the DB item was the same one that hit TaskTimedOut.
It looks like boto3 made a first attempt to send the Step Functions task token and the token was reported as expired, yet the SendTaskSuccess operation eventually succeeded despite the exception above, and Step Functions received the task token.

Could a retry mechanism inside boto3 lead to this issue?

Expected Behavior

There should be no exception such as:

Failed to process with exception: An error occurred (TaskTimedOut) when calling the SendTaskSuccess operation: Task Timed Out: 'Provided task does not exist anymore'

on the first call.

Current Behavior

An exception was raised:

Failed to process with exception: An error occurred (TaskTimedOut) when calling the SendTaskSuccess operation: Task Timed Out: 'Provided task does not exist anymore'

even on the first call to SendTaskSuccess, although the SendTaskSuccess operation eventually succeeded.

Reproduction Steps

The error was random: out of more than 10K requests, there was one TaskTimedOut exception.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

Boto3==1.34.50, BotoCore==1.34.50

Environment details (OS name and version, etc.)

AWS Lambda Python 3.9 x86_64

NianLi71 added the bug and needs-triage labels Mar 6, 2024
NianLi71 changed the title from "Call StepFunction SendTaskSuccess once then got: Task Timed Out: 'Provided task does not exist anymore', but succeeded at last" to "Call StepFunction SendTaskSuccess once then got: Task Timed Out: 'Provided task does not exist anymore', but succeeded eventually" Mar 6, 2024
tim-finnigan self-assigned this May 9, 2024
tim-finnigan added the investigating label May 9, 2024
tim-finnigan (Contributor) commented

Hello - thanks for reaching out and for your patience here. The send_task_success command involves a call to the underlying SendTaskSuccess API. Therefore, if there's an issue with the behavior here, it's likely something we'd need to escalate to the Step Functions team.

The TaskTimedOut exception indicates:

The task token has either expired or the task associated with the token has already been closed.

Because this is a service exception, it would be caught as a botocore ClientError. The retry behavior depends on how you have configured retries.
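
A minimal sketch of handling this error in a consumer, assuming the caller already holds a boto3 Step Functions client and a task token. The helper name `send_success_once` and its return values are hypothetical; `client.exceptions.TaskTimedOut` is the modeled `ClientError` subclass that the client exposes for this error code. The sketch treats `TaskTimedOut` as "the task is already closed" rather than as a processing failure, which may or may not suit a given workflow:

```python
import json
import logging

logger = logging.getLogger(__name__)


def send_success_once(sfn_client, task_token, result):
    """Report task success; return False if the task was already closed.

    `sfn_client` is assumed to be a boto3 Step Functions client, e.g.
    boto3.client("stepfunctions"). The helper name is hypothetical.
    """
    try:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps(result),  # the API expects a JSON string
        )
    except sfn_client.exceptions.TaskTimedOut as exc:
        # The token expired or the task was already closed, possibly by an
        # earlier attempt that reached the service. Log it and carry on.
        logger.warning("Task already closed or expired: %s", exc)
        return False
    return True
```

To rule SDK-level retries in or out while reproducing, one option is to create the client with `botocore.config.Config(retries={"total_max_attempts": 1})`, which limits each call to a single attempt.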

If the error was random and only 1 in 10k as you mentioned, then this may have just been caused by something transient like a network issue. Maybe get_execution_history would help provide more context. For us to investigate this further, I think we need a code snippet to reproduce the issue, and debug logs (using boto3.set_stream_logger('')) to get more insight into what's going on.
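
One way to gather that context is sketched below, under stated assumptions: the helper name `task_timeline` is hypothetical, `sfn_client` is assumed to be a boto3 Step Functions client, and the set of event types is one plausible selection for a task that waits on a task token, not an exhaustive list. It pages through `get_execution_history` and keeps only task-related events, so the order of `TaskSucceeded` and `TaskTimedOut` can be compared against the consumer's own logs:

```python
# Event types assumed to be relevant here; adjust for the state machine in use.
TASK_EVENT_TYPES = {
    "TaskScheduled",
    "TaskStarted",
    "TaskSubmitted",
    "TaskSucceeded",
    "TaskFailed",
    "TaskTimedOut",
}


def task_timeline(sfn_client, execution_arn):
    """Return (timestamp, event type) pairs for task events, oldest first."""
    kwargs = {"executionArn": execution_arn, "maxResults": 100}
    timeline = []
    while True:
        page = sfn_client.get_execution_history(**kwargs)
        for event in page["events"]:
            if event["type"] in TASK_EVENT_TYPES:
                timeline.append((event["timestamp"], event["type"]))
        next_token = page.get("nextToken")
        if not next_token:
            return timeline
        # More pages remain; request the next one.
        kwargs["nextToken"] = next_token
```

For the debug logs, calling `boto3.set_stream_logger('')` before creating the client should emit the request and retry details for each call.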

tim-finnigan added the response-requested, service-api, p2, and stepfunctions labels and removed the investigating and needs-triage labels May 9, 2024

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

github-actions bot added the closing-soon and closed-for-staleness labels and removed the closing-soon label May 20, 2024