
Does not return after logging "Error identifying update scenario." #599

Closed
takuhiro opened this issue Jul 2, 2021 · 5 comments

@takuhiro

takuhiro commented Jul 2, 2021

Describe the bug
We found code that does not return after receiving an error:
https://github.com/NetApp/trident/blob/stable/v21.04/operator/controllers/orchestrator/controller.go#L1177-L1184
In that case currentInstalledTridentVersion and tridentK8sConfigVersion are left empty, but these variables are used by the code that follows.
I'm not sure whether this is a bug, but without a return after the error, those variables might be used unexpectedly.
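
A minimal sketch of the pattern being reported; the helper name and surrounding logic are illustrative stand-ins, not Trident's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// identifyVersions stands in for the controller logic that reads version
// information from the existing deployment/daemonset; it is a hypothetical
// helper, not Trident's actual function.
func identifyVersions() (installedVersion, configVersion string, err error) {
	return "", "", errors.New("unable to get list of deployments")
}

func main() {
	currentInstalledTridentVersion, tridentK8sConfigVersion, err := identifyVersions()
	if err != nil {
		fmt.Println("Error identifying update scenario.")
		// The pattern reported above: there is no `return` here, so
		// execution falls through with both version strings still empty.
	}
	// Subsequent code then consumes the empty values unexpectedly.
	fmt.Printf("installed=%q config=%q\n", currentInstalledTridentVersion, tridentK8sConfigVersion)
}
```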

Environment

  • Trident version: v21.04
@takuhiro takuhiro added the bug label Jul 2, 2021
@rohit-arora-dev
Contributor

Hello @takuhiro,

Thank you for this query.
Failure to get currentInstalledTridentVersion and tridentK8sConfigVersion from the existing deployment or daemonset means something is wrong: this information was not set properly, is missing, has been tampered with, or possibly both the deployment and the daemonset are absent.

The ideal thing to do is to re-create these objects. There are two empty variables here:

  1. tridentK8sConfigVersion: when this is returned empty, Trident cannot determine whether this is an upgrade scenario, so your Trident installation is not marked for upgrade (here).
  2. currentInstalledTridentVersion: this variable is used to set the version in the CR status, and exactly once in installer.go to compare the existing Trident version with the version offered by the image (here). In this scenario that comparison fails and Trident is marked for re-installation.

The expectation is that the re-installation will self-heal the deployment and the daemonset, so that on the next iteration the version information can be retrieved properly.
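
A rough sketch of the comparison described in point 2, assuming an empty version string simply fails the equality check; the function and variable names are illustrative, not the actual installer.go code:

```go
package main

import "fmt"

// shouldReinstall sketches the installer.go check described above: if the
// version read from the cluster does not match the version offered by the
// image, Trident is marked for re-installation. Names are assumptions.
func shouldReinstall(currentInstalledTridentVersion, imageVersion string) bool {
	// An empty currentInstalledTridentVersion can never equal a real image
	// version, so the comparison fails and a re-install is triggered; this
	// is the self-healing behavior described above.
	return currentInstalledTridentVersion != imageVersion
}

func main() {
	fmt.Println(shouldReinstall("", "21.04.0"))        // true: re-install
	fmt.Println(shouldReinstall("21.04.0", "21.04.0")) // false: versions match
}
```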

@takuhiro
Author

takuhiro commented Jul 5, 2021

@ntap-arorar Thank you for the detailed answer. I now understand the logic when tridentK8sConfigVersion and currentInstalledTridentVersion are empty, and that the situation would be resolved in the next iteration.

In our case, the complete error message was the following. (I'm sorry, I should have included this context first.)

time="2021-06-26T14:56:52Z" level=error msg="Error identifying update scenario." controllingCR=trident err="unable to get list of deployments"

The error unable to get list of deployments was produced here. The root cause was probably a temporary failure of the GET request to the API server here, which could lead to an unnecessary re-installation. In this case, I think it would be better to return and retry in the next iteration.
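
For illustration, this is the standard client-go workqueue pattern that a "return and retry" fix would rely on. It assumes the operator drives reconciliation from a rate-limited workqueue, as most controllers do; it is not the actual Trident controller code:

```go
package main

import (
	"errors"
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// controller is a minimal stand-in for the operator's reconcile loop; it
// only illustrates the retry mechanics behind "return and retry".
type controller struct {
	queue       workqueue.RateLimitingInterface
	syncHandler func(key string) error
}

func (c *controller) processNextWorkItem() bool {
	key, quit := c.queue.Get()
	if quit {
		return false
	}
	defer c.queue.Done(key)

	if err := c.syncHandler(key.(string)); err != nil {
		// If the sync handler returns the error instead of falling
		// through, the key is re-queued with backoff, so a temporary
		// API-server failure is retried rather than triggering a
		// re-installation.
		fmt.Printf("requeue %v after error: %v\n", key, err)
		c.queue.AddRateLimited(key)
		return true
	}
	c.queue.Forget(key)
	return true
}

func main() {
	c := &controller{
		queue: workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),
		syncHandler: func(key string) error {
			return errors.New("unable to get list of deployments")
		},
	}
	c.queue.Add("trident")
	c.processNextWorkItem() // logs and re-queues instead of re-installing
}
```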

@rohit-arora-dev
Contributor

I believe there is scope for making this logic more resilient to unintended re-installs caused by temporary issues (false positives) in the Kubernetes environment. This does depend on the error type being returned; otherwise the Operator may fail to heal real Trident installations that are in bad shape.

@takuhiro
Author

takuhiro commented Jul 8, 2021

I agree that the error should be handled according to its type. If the error type indicates a temporary issue, the controller can skip re-installation, because the error would be resolved by the next iteration.
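
One possible sketch of such error-type handling, using the status helpers from k8s.io/apimachinery; the isTransient helper and the exact set of checks are assumptions, not Trident's implementation:

```go
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// isTransient classifies an API-server error as temporary. The helper name
// and the chosen set of checks are illustrative assumptions.
func isTransient(err error) bool {
	return apierrors.IsServerTimeout(err) ||
		apierrors.IsTimeout(err) ||
		apierrors.IsTooManyRequests(err) ||
		apierrors.IsServiceUnavailable(err)
}

func main() {
	err := apierrors.NewTimeoutError("request to the API server timed out", 1)
	if isTransient(err) {
		// Temporary failure: skip re-installation and let the next
		// reconcile iteration retry.
		fmt.Println("transient error, retry next iteration:", err)
		return
	}
	// Persistent error: proceed with the healing (re-install) path so a
	// genuinely broken installation is still repaired.
	fmt.Println("proceeding with re-install:", err)
}
```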

@gnarl gnarl added the tracked label Jul 19, 2021
@gnarl
Contributor

gnarl commented Oct 26, 2021

This issue is fixed with commit bf43f8 and is included in the Trident 21.10.0 release.

@gnarl gnarl closed this as completed Oct 26, 2021