v3: Refactor attempt creation to be worker requested #1077
Merged
🦋 Changeset detected
Latest commit: 2099d91
The changes in this PR will be included in the next version bump.
ericallam force-pushed the v3/worker-attempt-creation branch from c822c89 to 1bba5d5 (May 1, 2024 09:42)
Closed
This reverts commit d137e4e.
nicktrn approved these changes May 30, 2024
Great work everyone 🤝
jacobparis pushed a commit to jacobparis/trigger.dev that referenced this pull request Jun 1, 2024
* WIP worker TaskRunAttempt creation
* Handling failing task runs that cannot create an attempt for whatever reason
* Move the visibility queue stuff into a graphile job
* Fixed task runs with unsanitized queue names
* “Borrow” the code from alerts PR to get self hosted deployments working
* Add an admin API endpoint to get info about the shared marqs queue
* Allow admins to view any project metrics
* start adding lazy attempts to prod
* lazy attempt creation for prod workers
* resurrect prod stack traces
* add exception event to failed run spans
* simplify dependency resumes
* fix typecheck
* fix merge
* fresh process for all attempts
* always try sigterm first
* stop heartbeat timeout on non-inplace replace message
* add missing ack on checkpoint creation service failure
* bypass dequeue for retries with running worker
* respect retry delays
* crash runs with invalid run status for execution
* remove debug logs
* fix nack message
* fix version locking
* fresh attempt processes in dev and prod
* improve handling of ipc timeouts
* consider checkpoint failures on cancellation
* add basic chaos monkey to checkpointer
* changeset
* control forced checkpoint simulation via env var
* fix merge
* kill old attempt processes before checkpointing
* detailed perf logging for checkpointing
* add coordinator otlp endpoint example
* improve prod run cancellation
* rename supports lazy attempts migration
* fix graceful exit
* fix retry mechanics
* clear paused state before retry
* remove checkpoint image after push
* crash worker on unrecoverable errors
* refactor unrecoverable error emit
* switch to do hosted busybox image
* increase wait for duration ipc timeout
* add changeset for misc fixes
* fix merge
* fix retry delay span runId
* fix dev retries
* improve prod worker logging
* log checkpoint sizes
* add lazy attempts catalog entries
* Fixed merge issue: use zodFetch, not wrapZodFetch
* Revert "Fixed merge issue: use zodFetch, not wrapZodFetch"
  This reverts commit d137e4e.
* importEnvVars uses wrapZodFetch now
* add backwards compat for retries without checkpoints
* handle more cases of unrecoverable runs
* don't kill the child process if it shouldn't be killed

---------

Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com>
Co-authored-by: Matt Aitken <matt@mattaitken.com>
Changes
Lazy attempt creation:
Other changes:
Testing checklist
There have been many changes to what happens after attempt completion and before retries, so it's not enough to test that tasks complete successfully. Failure needs to be tested in every scenario as well; 3-4 retries should be enough. This will also ensure we test for memory leaks, particularly when combined with checkpoints.
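To make the failure scenario concrete, here is a minimal, self-contained sketch of a run that fails a few times before succeeding, and of one that exhausts its retries. It assumes nothing about the real catalog entries or the SDK: `failUntil` and `runWithRetries` are hypothetical helpers written only for illustration, and they ignore retry delays, checkpoints and process restarts.

```typescript
// Hypothetical stand-in for a task attempt: receives the attempt number
// and either throws (simulated failure) or returns an output string.
type Attempt = (attemptNumber: number) => string;

// Build an attempt function that throws until the given attempt number is reached.
function failUntil(succeedOnAttempt: number): Attempt {
  return (attemptNumber) => {
    if (attemptNumber < succeedOnAttempt) {
      throw new Error(`simulated failure on attempt ${attemptNumber}`);
    }
    return `succeeded on attempt ${attemptNumber}`;
  };
}

interface RunResult {
  ok: boolean;
  attempts: number;
  errors: string[];
  output?: string;
}

// Hypothetical retry loop: calls the task up to maxAttempts times and
// records the error message of every failed attempt.
function runWithRetries(task: Attempt, maxAttempts: number): RunResult {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const output = task(attempt);
      return { ok: true, attempts: attempt, errors, output };
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return { ok: false, attempts: maxAttempts, errors };
}

// Scenario 1: three failures, then success on the fourth attempt.
const recovered = runWithRetries(failUntil(4), 4);
console.log(recovered.ok, recovered.attempts); // prints "true 4"

// Scenario 2: every attempt fails, so the run ends in failure.
const exhausted = runWithRetries(failUntil(5), 4);
console.log(exhausted.ok, exhausted.errors.length); // prints "false 4"
```

Both outcomes matter for this checklist: the first exercises the path from a failed attempt to a fresh one several times over, the second exercises the final failure path once retries run out.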
General guidelines:
All relevant catalog entries start with lazy- and the following payload format can be used with all of them:
New SDK
Dev
Prod
Old SDK
Dev
Prod