-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Lambda async invocation queue namespace #10831
Conversation
S3 Image Test Results (AMD64 / ARM64) 2 files 2 suites 3m 12s ⏱️ Results for commit 6445b95. ♻️ This comment has been updated with latest results. |
LocalStack Community integration with Pro 2 files ± 0 2 suites ±0 1h 40m 16s ⏱️ + 3m 26s Results for commit 6445b95. ± Comparison against base commit 9d80628. This pull request removes 24 and adds 62 tests. Note that renamed tests count towards both.
This pull request skips 4 and un-skips 3 tests.
♻️ This comment has been updated with latest results. |
removing myself as reviewer as i was added only because of the config.py change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good solution for now.
We should definitely tackle the lambda function deletion locks and the aborted invocations soon!
I just had some minor questions regarding some comments, otherwise approved from my side :)
@@ -599,6 +599,25 @@ class Function: | |||
def latest(self) -> FunctionVersion: | |||
return self.versions["$LATEST"] | |||
|
|||
# HACK to model a volatile variable that should be ignored for persistence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also not 100% satisfied with this hack, but I think for now it is the best solution. Let's hope we can find a way to get by without this hack in the future :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, pretty ugly hack 😢
We currently don't have a Python object such as the VersionManager at function level. Alternative hacks:
- Use a dict to track them in the lambda service, but passing it down into the event managers is gonna be awkward
- Add it to the version manager of a specific version (e.g., v1, $LATEST is annoying because then it needs to migrate)
# Make it very likely that the invocation is being processed | ||
# In LocalStack, Lambda image pulling could potentially affect the exact timing, | ||
# but at least the invocation runtime startup should be in progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is misleading. Image pulling should have already taken place during the transition to Active. This is maybe just for the container creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. I updated the comment.
Acquiring the lock is actually pretty fast for invokes 🚀 Container creation happens later and does not matter because we always send invokes to environments we spawn on demand (i.e., no optimization step here).
Payload=json.dumps(payload), | ||
) | ||
# Make it very likely that the first request is being processed before sending further requests without sleeps. | ||
# We could use a side effect (e.g., via S3) to signal a status update if Lambda image pulling causes flakiness, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here regarding pulling vs container creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated as well 👍
@@ -41,7 +41,7 @@ | |||
from localstack.constants import AWS_REGION_US_EAST_1 | |||
from localstack.services.lambda_.api_utils import qualified_lambda_arn, unqualified_lambda_arn | |||
from localstack.utils.archives import unzip | |||
from localstack.utils.strings import long_uid | |||
from localstack.utils.strings import long_uid, short_uid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giograno Does this hack invalidate all Lambda CloudPods due to a changed import 🤔
Motivation
Addresses #10280
Re-creating a Lambda function immediately after deleting it, triggers a race condition and deletes the internal async invoke queue. This causes continuous error logs and breaks async invokes.
The namespace clash also causes further issues because queued invokes can get deleted.
Changes
test_recreate_function
to reproduce the reported function recreate issuetest_function_update_during_invoke
to validate the AWS behavior of function updates during a running invocation => Currently broken on LocalStacktest_async_invoke_queue_upon_function_update
to validate the queuing behavior of async invokes when a function update happens => kinda works but the first invoke fails and gets retried due function updates aborting all running invokes (even for $LATEST)Unrelated changes sneaked in:
lambda_service
backlink in Lambda service manager (missing declaration cleanup from this PR)Testing
State pickling testing whether the
instance_id
attribute is persisted or not:tests.aws.services.lambda_.test_lambda.TestLambdaCleanup.test_recreate_function
with a breakpoint at the end of the testslocalstack-pro state export --format json lambda-state
Discussion