actionqueue should perform retries for failed actions without being prompted by control data changes #1708

Open
RebeccaMahany opened this issue May 6, 2024 · 2 comments

Comments

@RebeccaMahany
Contributor

The actionqueue handles changes to the actions subsystem. When it successfully performs an action, it marks the action as performed and stores that information. Storing the performed actions allows the control server to safely send down the same action (with the same ID) an unlimited number of times -- the actionqueue performs a resent action only if it has not already performed it successfully, so resending an action doubles as a retry.
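For illustration, the current behavior could be sketched roughly as below. This is a minimal sketch, not launcher's actual code: the `action` and `queue` types, the `perform` callback, and `errTransient` are all hypothetical names, and the real actionqueue persists performed actions rather than keeping them in an in-memory map.

```go
package main

import "errors"

// action is a hypothetical stand-in for an action sent by the control server.
type action struct {
	ID string
}

// queue is a hypothetical, simplified action queue that remembers which
// action IDs have been performed successfully.
type queue struct {
	performed map[string]bool
	perform   func(action) error // does the real work; assumed for illustration
}

func newQueue(perform func(action) error) *queue {
	return &queue{performed: make(map[string]bool), perform: perform}
}

// handle performs the action unless it already succeeded earlier.
// A failed action is not recorded, so a later delivery of the same ID
// retries it.
func (q *queue) handle(a action) error {
	if q.performed[a.ID] {
		return nil // already done; repeated deliveries are harmless
	}
	if err := q.perform(a); err != nil {
		return err
	}
	q.performed[a.ID] = true
	return nil
}

// errTransient is a placeholder error for simulating a failed attempt.
var errTransient = errors.New("transient failure")
```

Note that in this sketch a retry only happens when `handle` is called again with the same ID, which is exactly the dependence on the control server described below.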

We think that this retry mechanism is not sufficient (especially for some of our more important actions like remote uninstall), since it relies on the control server changing the data in the actions subsystem. (E.g., if a remote uninstall action failed to process once, launcher would not retry until it received a new remote uninstall action, or a new notification, etc.)

We would like to update launcher's behavior so that the actionqueue will retry received-but-failed actions proactively, without requiring a prompt from the control server.

Requirements for implementation:

  • If any action fails, the actionqueue should retry it at a reasonable interval (once an hour?)
  • The actionqueue should never retry actions that have already been completed successfully
  • The actionqueue should never retry actions that are expired
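The three requirements above could be sketched as a selection function plus a timer loop. This is only a rough sketch under assumptions: `pendingAction`, its `ValidUntil` and `Performed` fields, `dueForRetry`, and `retryLoop` are hypothetical names, and the retry interval is left as a parameter since it has not been decided.

```go
package main

import "time"

// pendingAction is a hypothetical record of an action the queue has received.
type pendingAction struct {
	ID         string
	ValidUntil time.Time // assumed expiry field
	Performed  bool      // true once the action completed successfully
}

// dueForRetry returns the actions that should be attempted again at time now:
// those that have not completed successfully and have not expired.
func dueForRetry(actions []pendingAction, now time.Time) []pendingAction {
	var due []pendingAction
	for _, a := range actions {
		if a.Performed {
			continue // never retry completed actions
		}
		if !a.ValidUntil.After(now) {
			continue // never retry expired actions
		}
		due = append(due, a)
	}
	return due
}

// retryLoop calls attempt once per interval until stop is closed.
func retryLoop(interval time.Duration, stop <-chan struct{}, attempt func(now time.Time)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case now := <-ticker.C:
			attempt(now)
		}
	}
}
```

In this sketch, `attempt` would call `dueForRetry` and re-run whatever it returns, so the retry no longer depends on the control server sending new data.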
@directionless
Contributor

Probably retry closer to a minute than an hour. Not quite sure on the timer

@RebeccaMahany
Contributor Author

I wonder if just returning an error from `func (aq *ActionQueue) Update(data io.Reader) error` whenever any action fails to process would do it? Then the updated data for that subsystem won't be stored, so `Update` would be called again in 1 minute by the control system, effectively handling the retry for us.
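A rough sketch of that idea follows. It assumes a simplified `ActionQueue` with an in-memory `performed` map and a `perform` callback, and a JSON array payload; none of these reflect launcher's real types or wire format, and `exampleRetry` is only a hypothetical demonstration. `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// action is a hypothetical, minimal action payload.
type action struct {
	ID string `json:"id"`
}

// ActionQueue here is a simplified stand-in, not launcher's real type.
type ActionQueue struct {
	performed map[string]bool
	perform   func(action) error // assumed callback that does the work
}

// Update processes every action in data and returns a non-nil error if any
// action failed, so the caller can decline to store the data and call Update
// again later with the same payload.
func (aq *ActionQueue) Update(data io.Reader) error {
	var actions []action
	if err := json.NewDecoder(data).Decode(&actions); err != nil {
		return fmt.Errorf("decoding actions: %w", err)
	}
	var errs []error
	for _, a := range actions {
		if aq.performed[a.ID] {
			continue // already succeeded on an earlier call
		}
		if err := aq.perform(a); err != nil {
			errs = append(errs, fmt.Errorf("action %s: %w", a.ID, err))
			continue // keep going so one failure does not block the rest
		}
		aq.performed[a.ID] = true
	}
	return errors.Join(errs...) // nil when no action failed
}

// exampleRetry simulates two Update calls with the same payload, where
// action "b" fails on its first attempt only.
func exampleRetry() (first, second error, attempts map[string]int) {
	attempts = make(map[string]int)
	aq := &ActionQueue{
		performed: make(map[string]bool),
		perform: func(a action) error {
			attempts[a.ID]++
			if a.ID == "b" && attempts[a.ID] == 1 {
				return errors.New("simulated failure")
			}
			return nil
		},
	}
	const payload = `[{"id":"a"},{"id":"b"}]`
	first = aq.Update(strings.NewReader(payload))
	second = aq.Update(strings.NewReader(payload))
	return first, second, attempts
}
```

Under these assumptions, the first call returns an error and the second returns nil, with action "a" attempted once and "b" attempted twice. One open question with this approach is that an action that keeps failing would make `Update` return an error on every call, so expired actions would presumably need to be skipped rather than reported as failures.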

2 participants