
[Core feature] Exponential backoff retry #2333

Open
2 tasks done
mkao006 opened this issue Apr 8, 2022 · 9 comments · May be fixed by #5263 or flyteorg/flytekit#2368
Labels
backlogged (For internal use. Reserved for contributor team workflow.) · enhancement (New feature or request) · exo · flytepropeller · size:L (This PR changes 100-499 lines, ignoring generated files.)

Comments

@mkao006

mkao006 commented Apr 8, 2022

Motivation: Why do you think this is important?

Sometimes an upstream system has downtime that causes the workflow to fail; the workflow would succeed if retried at a later time.

The limitation of the current retry mechanism is that it only supports a maximum count and lacks the flexibility to set the retry interval, so retries are exhausted before the upstream system recovers from its downtime.

Goal: What should the final outcome look like, ideally?

Retries would be performed at increasing intervals (e.g. 1m, 3m, 10m, 30m) up to the configured number of retries.
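
As a rough illustration of such a schedule (a sketch only; base_interval, multiplier, and max_interval are hypothetical knobs, not existing Flyte settings):

# Sketch: derive an exponentially increasing, capped backoff schedule.
from datetime import timedelta

def backoff_schedule(retries: int,
                     base_interval: timedelta = timedelta(minutes=1),
                     multiplier: float = 3.0,
                     max_interval: timedelta = timedelta(minutes=30)) -> list:
    """Return the wait time before each retry attempt, capped at max_interval."""
    schedule, interval = [], base_interval
    for _ in range(retries):
        schedule.append(min(interval, max_interval))
        interval = timedelta(seconds=interval.total_seconds() * multiplier)
    return schedule

# backoff_schedule(4) -> waits of 1m, 3m, 9m, 27m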

Describe alternatives you've considered

No alternatives are available at the moment.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@mkao006 mkao006 added the enhancement (New feature or request) and untriaged (This issue has not yet been looked at by the Maintainers) labels on Apr 8, 2022
@welcome

welcome bot commented Apr 8, 2022

Thank you for opening your first issue here! 🛠

@github-actions

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Aug 29, 2023
@hamersaw
Contributor

Commenting to keep open.

@github-actions github-actions bot removed the stale label Aug 31, 2023
@hamersaw hamersaw added the exo, flytepropeller, and backlogged (For internal use. Reserved for contributor team workflow.) labels and removed the untriaged (This issue has not yet been looked at by the Maintainers) label on Dec 5, 2023
@kumare3
Contributor

kumare3 commented Mar 23, 2024

I really like this idea.
I think from a flytekit POV, it should be implemented like so:

# retries would accept a retry-policy class, similar to how container_image accepts ImageSpec or str
# (BackoffRetry is the proposed API here, not an existing flytekit class)
from datetime import timedelta
from flytekit import task

@task(retries=BackoffRetry(interval=timedelta(minutes=2), count=3))
def foo():
    ...

In flyteidl, we can add it to RetryStrategy which is already a class.

In FlytePropeller, we can add it to Propeller's struct and then to the retry computation logic.

We can use the last_updated_at from the node_status to wait for the retry.

This should not be too hard to do.
We can also implement exponential backoff.
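
For illustration, here is a minimal sketch of the retry-gating check described above, written in Python for readability (the real logic would live in FlytePropeller's Go code; last_updated_at and the backoff parameters are assumptions):

# Sketch only: gate a retry on the time elapsed since the node was last updated.
from datetime import datetime, timedelta, timezone

def retry_is_due(last_updated_at: datetime, attempt: int,
                 base_interval: timedelta, multiplier: float = 2.0) -> bool:
    """True once the backoff window for the given attempt number has elapsed."""
    wait = timedelta(seconds=base_interval.total_seconds() * (multiplier ** attempt))
    return datetime.now(timezone.utc) - last_updated_at >= wait  # assumes a tz-aware timestamp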

@kumare3 kumare3 added the size:L (This PR changes 100-499 lines, ignoring generated files.) label on Mar 23, 2024
@kdhingra307

@kumare3 will it be possible to add a feature where a user can provide the delay for retry?

This is useful when the upstream system provides its own delay mechanism.

@kumare3
Contributor

kumare3 commented Apr 16, 2024

Yup, absolutely. This is indeed the idea behind the retry strategy, and the implementation should be easy. But someone has to contribute it.

@kumare3
Contributor

kumare3 commented Apr 16, 2024

@kdhingra307 join slack.flyte.org and we can collaborate

@kdhingra307

@kumare3 Sure, let's connect on Slack.

@kumare3
Contributor

kumare3 commented Apr 17, 2024

As discussed, if you just want the workflow to sleep/wait before task B, you can use flytekit.sleep.

Also, nodes will be queued if no resources are available.
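
For reference, a minimal sketch of the sleep-before-a-task pattern mentioned above, assuming flytekit's sleep gate node and >> chaining for explicit ordering (task names are placeholders):

from datetime import timedelta
from flytekit import task, workflow, sleep

@task
def task_a() -> int:
    return 1

@task
def task_b(x: int) -> int:
    return x + 1

@workflow
def wf() -> int:
    a = task_a()
    pause = sleep(timedelta(minutes=10))  # gate node: wait 10 minutes
    b = task_b(x=a)
    a >> pause   # the sleep starts only after task_a finishes
    pause >> b   # task_b runs only after the sleep completes
    return b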
