[Core feature] Exponential backoff retry #2333
Comments
Thank you for opening your first issue here! 🛠
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Commenting to keep open.
I really like this idea.
In flyteidl, we can add it to RetryStrategy, which is already a class. In FlytePropeller we can plumb it through to the retry computation logic, and we can use the last_updated_at from the node_status to wait before retrying. This should not be too hard to do.
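A rough sketch of that check; FlytePropeller is written in Go, but the logic reads the same in Python. The retry_due helper, its parameters, and the backoff formula are all hypothetical, not existing Flyte APIs:

```python
from datetime import datetime, timedelta, timezone

def retry_due(last_updated_at: datetime, attempt: int,
              base: timedelta = timedelta(minutes=1),
              cap: timedelta = timedelta(minutes=30)) -> bool:
    """True once the exponential backoff delay for this attempt has elapsed
    since the node status was last updated."""
    delay = min(base * (2 ** attempt), cap)  # 1m, 2m, 4m, 8m, ... capped at 30m
    return datetime.now(timezone.utc) - last_updated_at >= delay

# Attempt 2 implies a 4-minute delay; 5 minutes have passed, so retry now.
last_update = datetime.now(timezone.utc) - timedelta(minutes=5)
print(retry_due(last_update, attempt=2))  # True
```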
@kumare3 will it be possible to add a feature where a user can provide the delay for the retry? It is useful in case the upstream system provides a delay mechanism.
Yup, absolutely. This is indeed the idea with the retry strategy, and the implementation should be easy. But someone has to contribute it.
@kdhingra307 join slack.flyte.org and we can collaborate.
@kumare3 sure, let's connect on Slack.
As discussed, if you just want the workflow to sleep/wait before task B, you can use flytekit's sleep. Also, nodes will be queued in case no resources are available.
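A minimal usage sketch, assuming a recent flytekit with gate-node support; the tasks and the 5-minute duration are illustrative:

```python
from datetime import timedelta
from flytekit import sleep, task, workflow

@task
def task_a() -> int:
    return 1

@task
def task_b(x: int) -> int:
    return x + 1

@workflow
def wf() -> int:
    a = task_a()
    pause = sleep(timedelta(minutes=5))  # gate node: pause the workflow
    b = task_b(x=a)
    a >> pause   # the wait starts after task_a finishes
    pause >> b   # task_b runs only after the wait
    return b
```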
Motivation: Why do you think this is important?
Sometimes an upstream system has downtime that causes the workflow to fail, even though the workflow would succeed if retried at a later time.
The limitation of the current retry mechanism is that it only supports a maximum count and offers no way to set the retry interval, so the retries are exhausted before the upstream system recovers from the downtime.
Goal: What should the final outcome look like, ideally?
The retry should be performed at increasing intervals (e.g. 1m, 3m, 10m, 30m, and so on) up to the configured number of retries, as in the sketch below.
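A sketch of how such a schedule could be derived from a base interval, a multiplier, and a cap; none of these parameter names exist in Flyte today:

```python
from datetime import timedelta

def backoff_schedule(retries: int,
                     base: timedelta = timedelta(minutes=1),
                     multiplier: float = 3.0,
                     cap: timedelta = timedelta(minutes=30)) -> list[timedelta]:
    """Return the wait time before each retry attempt."""
    delays, delay = [], base
    for _ in range(retries):
        delays.append(min(delay, cap))
        delay = timedelta(seconds=delay.total_seconds() * multiplier)
    return delays

for d in backoff_schedule(4):
    print(d)  # 0:01:00, 0:03:00, 0:09:00, 0:27:00 -- close to the 1m/3m/10m/30m curve
```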
Describe alternatives you've considered
No alternatives are available at the moment.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?