Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descriptive restore error on timeout due to terminating namespace #7424

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kaovilai
Copy link
Contributor

@kaovilai kaovilai commented Feb 13, 2024

Signed-off-by: Tiger Kaovilai tkaovila@redhat.com

Thank you for contributing to Velero!

Please add a summary of your change

Make restore error descriptive when namespace being restored is in terminating state.
Any user seeing this error should know that velero does not force a namespace to disappear by removing finalizers because that could be destructive to some workloads.

User should make namespace be in a state other than terminating for Velero to continue restore process.

Does your change fix a particular issue?

Fixes #5697

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@kaovilai kaovilai changed the title Descriptive restore error on terminating namespace Descriptive restore error when restoring into a terminating namespace Feb 13, 2024
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from 73b25fd to bbd910c Compare February 13, 2024 19:50
Copy link

codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.71%. Comparing base (2518824) to head (e7ffa62).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7424   +/-   ##
=======================================
  Coverage   61.71%   61.71%           
=======================================
  Files         263      263           
  Lines       28869    28873    +4     
=======================================
+ Hits        17816    17819    +3     
- Misses       9793     9794    +1     
  Partials     1260     1260           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 2 times, most recently from cf72b8a to fb25a55 Compare February 13, 2024 21:06
@kaovilai kaovilai changed the title Descriptive restore error when restoring into a terminating namespace Prevent restoring namespaces from backup as Terminating, descriptive restore error on timeout due to terminating namespace Feb 13, 2024
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from fb25a55 to 6a72c2f Compare February 13, 2024 22:57
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 6 times, most recently from dfe96a5 to 18bb3a0 Compare February 14, 2024 00:24
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 2 times, most recently from 7177214 to c7b189d Compare February 14, 2024 18:51
pkg/util/kube/utils.go Outdated Show resolved Hide resolved
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 3 times, most recently from 98ee21a to 28b5d26 Compare February 20, 2024 06:21
@kaovilai kaovilai changed the title Prevent restoring namespaces from backup as Terminating, descriptive restore error on timeout due to terminating namespace Descriptive restore error on timeout due to terminating namespace Feb 20, 2024
blackpiglet
blackpiglet previously approved these changes Feb 20, 2024
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from 28b5d26 to ad879f3 Compare March 26, 2024 19:57
… Descriptive restore error on terminating namespace.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

revert utils_test.go

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

address https://github.com/vmware-tanzu/velero/pull/7424/files/c7b189dd6035839c9eb8ce3dab4ead574de77adb#r1494194484

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

Update pkg/util/kube/utils.go

Signed-off-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from ad879f3 to e7ffa62 Compare March 26, 2024 19:58
@mayankagg9722
Copy link

mayankagg9722 commented Mar 27, 2024

HI @kaovilai @blackpiglet we are seeing the Issue #7516. Please look into it once in the same method.

EnsureNamespaceExistsAndIsReady

During restore for every resource within the namespace, we are calling the check to await on if namespace exists (wait for 10 min polling). For instance, if we have 100 resources in a namespace that itself is in terminating state for very long then it impacts the restore flow to get stuck/halt and it too increases the time to restore as it does it for each resource in the same namespace.

@@ -97,6 +103,9 @@ func EnsureNamespaceExistsAndIsReady(namespace *corev1api.Namespace, client core

// err will be set if we timed out or encountered issues retrieving the namespace,
if err != nil {
if terminatingNamespace {
return false, nsCreated, errors.Wrapf(err, "timed out waiting for terminating namespace %s to disappear before restoring", namespace.Name)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this call is used during restore as well, and if such error occur for the resource during restore, can we also add this to restore warnings list?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention #7516

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is warning preferred over restore error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also per PR title, the intent is that the error are visible in restore error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes do you see needed to be made to the PR?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry on the late reply, I was awk.
Yes, overall its good to have this error that you added and its visible during restore

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reducing the time to address the issue #7516 we need to look at other options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restore gets stuck if the destination namespace is in Terminating phase
5 participants