Set rollout error state for failed creation, starting or deletion process #1359

Open
wants to merge 2 commits into master

Conversation

herdt-michael
Contributor

Set the rollout to an error state after too many failed attempts to create, start, or delete it.

@hawkbit-bot

Can one of the admins verify this patch?

@stefbehl stefbehl added this to the 0.3.0M9 milestone Apr 27, 2023
final Iterable<JpaAction> iterable = scheduledActions::iterator;
final List<Long> actionIds = StreamSupport.stream(iterable.spliterator(), false).map(Action::getId)
.collect(Collectors.toList());
actionRepository.deleteByIdIn(actionIds);
Contributor

Please consider splitting the action IDs into batches of 999 (the maximum number of entries in an IN statement), because TRANSACTION_ACTIONS is 5000.
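A minimal sketch of what such batching could look like, assuming the IDs are already collected into a list. The class name `BatchedDelete`, the constant `MAX_IN_ENTRIES`, and the `deleter` callback are hypothetical names for illustration; in the PR the callback would presumably wrap `actionRepository.deleteByIdIn`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedDelete {
    // Assumption: 999 is the IN-clause limit mentioned in the review comment.
    static final int MAX_IN_ENTRIES = 999;

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(final List<T> items, final int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            final int to = Math.min(from + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    // Invokes the (hypothetical) deleter once per batch instead of once for all IDs.
    static void deleteInBatches(final List<Long> ids, final Consumer<List<Long>> deleter) {
        for (final List<Long> batch : partition(ids, MAX_IN_ENTRIES)) {
            deleter.accept(batch);
        }
    }
}
```

With 5000 IDs this would issue six delete calls: five with 999 IDs and one with the remaining 5. A call site might look like `BatchedDelete.deleteInBatches(actionIds, actionRepository::deleteByIdIn)`, assuming the repository method accepts a list.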

rollout.setRetryCount(retryCount);
rolloutRepository.save(rollout);

if (retryCount >= maxRetries) {
Contributor

Consider moving the check up, so that we neither increase the retry count nor execute the rollout handling once the retry count is exceeded. (Potentially also check the status of the rollout: if it is in Error_*, do nothing.)
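A rough sketch of the suggested ordering, not the PR's actual code. The `Rollout` and `Status` types below are simplified stand-ins, and `handle`, `handler`, and the status names are hypothetical. Incrementing the counter before each attempt is also an assumption, taken from the order of the snippet above.

```java
final class RolloutRetryGuard {
    // Hypothetical, simplified status values for illustration only.
    enum Status { CREATING, ERROR_CREATING }

    // Simplified stand-in for the rollout entity.
    static final class Rollout {
        int retryCount;
        Status status = Status.CREATING;
    }

    // Returns true if the handler was executed, false if the rollout was skipped.
    static boolean handle(final Rollout rollout, final int maxRetries, final Runnable handler) {
        // A rollout already in an error state is left untouched.
        if (rollout.status == Status.ERROR_CREATING) {
            return false;
        }
        // Check the limit first: no increment and no handling once it is reached.
        if (rollout.retryCount >= maxRetries) {
            rollout.status = Status.ERROR_CREATING;
            return false;
        }
        // Only now count the attempt and run the actual rollout handling.
        rollout.retryCount++;
        handler.run();
        return true;
    }
}
```

With `maxRetries` set to 3, the handler runs on the first three calls, the fourth call moves the rollout to the error state without running it, and later calls return immediately.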

@avgustinmm
Contributor

@herdt-michael,
is this PR still meant to be processed? There has been no progress for some time, and there are unresolved review comments.
Also, what is the motivation for this feature? A rollout start/delete could fail many times because of temporary DB failures, overload, or similar. Do we need to put it into an error state? In which cases is a rollout not recoverable?
Could you provide a bit of motivation for the PR?

@avgustinmm avgustinmm removed this from the 0.3.0M9 milestone Nov 16, 2023
@herdt-michael
Contributor Author

> @herdt-michael, is this PR still meant to be processed? There has been no progress for some time, and there are unresolved review comments. Also, what is the motivation for this feature? A rollout start/delete could fail many times because of temporary DB failures, overload, or similar. Do we need to put it into an error state? In which cases is a rollout not recoverable? Could you provide a bit of motivation for the PR?

Hi @avgustinmm, please excuse the late reply.

When an exception occurs while a rollout creation is being handled, the same rollout is processed again on the next scheduler run. The problem is that the previous run did not complete and left the rollout in an unexpected state. In some cases, this resulted in a mismatch between the rollout's target count and the targets actually included. With an error-handling concept, the creation of such a rollout would be stopped and put into an error state, rather than repeatedly forcing the creation of a possibly corrupt rollout.
