
[Core feature] Exponential backoff retry #2333

Open
2 tasks done
mkao006 opened this issue Apr 8, 2022 · 9 comments · May be fixed by #5263 or flyteorg/flytekit#2368
Labels
backlogged (For internal use. Reserved for contributor team workflow.) · enhancement (New feature or request) · exo · flytepropeller · size:L (This PR changes 100-499 lines, ignoring generated files.)

Comments

@mkao006

mkao006 commented Apr 8, 2022

Motivation: Why do you think this is important?

Sometimes an upstream system has downtime that causes the workflow to fail; the workflow would succeed if retried at a later time.

The limitation of the current retry mechanism is that it only supports a maximum count and lacks the flexibility to set the retry interval, so retries are exhausted before the upstream system recovers from its downtime.

Goal: What should the final outcome look like, ideally?

Retries would be performed at increasing intervals (e.g. 1m, 3m, 10m, 30m) up to the configured number of retries.
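
As a rough illustration of such a schedule (a sketch only; base_interval, multiplier, and max_interval are hypothetical knobs, not existing Flyte settings):

# Sketch: derive an exponentially increasing, capped backoff schedule.
from datetime import timedelta

def backoff_schedule(retries: int,
                     base_interval: timedelta = timedelta(minutes=1),
                     multiplier: float = 3.0,
                     max_interval: timedelta = timedelta(minutes=30)) -> list:
    """Return the wait time before each retry attempt, capped at max_interval."""
    schedule, interval = [], base_interval
    for _ in range(retries):
        schedule.append(min(interval, max_interval))
        interval = timedelta(seconds=interval.total_seconds() * multiplier)
    return schedule

# backoff_schedule(4) -> waits of 1m, 3m, 9m, 27m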

Describe alternatives you've considered

No alternatives are available at the moment.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@mkao006 mkao006 added the enhancement (New feature or request) and untriaged (This issue has not yet been looked at by the Maintainers) labels on Apr 8, 2022
@welcome

welcome bot commented Apr 8, 2022

Thank you for opening your first issue here! 🛠

@github-actions

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Aug 29, 2023
@hamersaw
Contributor

Commenting to keep open.

@github-actions github-actions bot removed the stale label Aug 31, 2023
@hamersaw hamersaw added the exo, flytepropeller, and backlogged (For internal use. Reserved for contributor team workflow.) labels and removed the untriaged (This issue has not yet been looked at by the Maintainers) label on Dec 5, 2023
@kumare3
Contributor

kumare3 commented Mar 23, 2024

I really like this idea.
I think from a flytekit POV, it should be implemented like so:

# retries would accept a retry-policy class, similar to how container_image accepts ImageSpec or str
# (BackoffRetry is the proposed API here, not an existing flytekit class)
from datetime import timedelta
from flytekit import task

@task(retries=BackoffRetry(interval=timedelta(minutes=2), count=3))
def foo():
    ...

In flyteidl, we can add it to RetryStrategy which is already a class.

In FlytePropeller, we can add it to Propeller's struct and then to the retry computation logic.

We can use the last_updated_at from the node_status to wait for the retry.

This should not be too hard to do.
We can also implement exponential backoff.
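
For illustration, here is a minimal sketch of the retry-gating check described above, written in Python for readability (the real logic would live in FlytePropeller's Go code; last_updated_at and the backoff parameters are assumptions):

# Sketch only: gate a retry on the time elapsed since the node was last updated.
from datetime import datetime, timedelta, timezone

def retry_is_due(last_updated_at: datetime, attempt: int,
                 base_interval: timedelta, multiplier: float = 2.0) -> bool:
    """True once the backoff window for the given attempt number has elapsed."""
    wait = timedelta(seconds=base_interval.total_seconds() * (multiplier ** attempt))
    return datetime.now(timezone.utc) - last_updated_at >= wait  # assumes a tz-aware timestamp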

@kumare3 kumare3 added the size:L (This PR changes 100-499 lines, ignoring generated files.) label on Mar 23, 2024
@kdhingra307

@kumare3 will it be possible to add a feature where a user can provide the delay for retry?

This is useful when the upstream system provides its own delay mechanism.

@kumare3
Contributor

kumare3 commented Apr 16, 2024

Yup, absolutely. This is indeed the idea behind the retry strategy, and the implementation should be easy. But someone has to contribute it.

@kumare3
Contributor

kumare3 commented Apr 16, 2024

@kdhingra307 join slack.flyte.org and we can collaborate

@kdhingra307

@kumare3 Sure, let's connect on Slack.

@kumare3
Contributor

kumare3 commented Apr 17, 2024

As discussed, if you just want the workflow to sleep/wait before task B, you can use flytekit.sleep.

Also, nodes will be queued if no resources are available.
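
For reference, a minimal sketch of the sleep-before-a-task pattern mentioned above, assuming flytekit's sleep gate node and >> chaining for explicit ordering (task names are placeholders):

from datetime import timedelta
from flytekit import task, workflow, sleep

@task
def task_a() -> int:
    return 1

@task
def task_b(x: int) -> int:
    return x + 1

@workflow
def wf() -> int:
    a = task_a()
    pause = sleep(timedelta(minutes=10))  # gate node: wait 10 minutes
    b = task_b(x=a)
    a >> pause   # the sleep starts only after task_a finishes
    pause >> b   # task_b runs only after the sleep completes
    return b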
