Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cloud2edge's post-install running before its deps are available #404

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

refi64
Copy link
Contributor

@refi64 refi64 commented Sep 9, 2022

post-install runs when the chart is deployed, not when it's actually ready, so it's entirely possible for the post-install hook to try to access services that have not even started yet, resulting in a deployment failure. Instead, wait for the needed services to be ready, following the same pattern as used by the nginx-deployment template.

Signed-off-by: Ryan Gonzalez ryan.gonzalez@collabora.com

post-install runs when the chart is deployed, not when it's actually
ready, so it's entirely possible for the post-install hook to try to
access services that have not even started yet, resulting in a
deployment failure. Instead, wait for the needed services to be ready,
following the same pattern as used by the nginx-deployment template.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
@refi64
Copy link
Contributor Author

refi64 commented Sep 9, 2022

I'm not sure why it's not saying the ECA isn't signed, it is under the correct email.

@calohmn calohmn added the Cloud2Edge Improvements to the Cloud2Edge package label Sep 14, 2022
@calohmn
Copy link
Contributor

calohmn commented Sep 14, 2022

@refi64 I couldn't reproduce such a case of a post-install job running too early. In my tests, Hono and cloud2edge post-install jobs were only started after all pods were ready. And the readiness probe endpoint implementations in the Hono device registry deployment and the ditto nginx deployment should make sure that at that point the components are operational.

Did you have issues with the cloud2edge post-install jobs? Or was it a theoretical motivation, expecting potential issues (the Helm post-install definition of "Executes after all resources are loaded into Kubernetes" doesn't seem very clear after all) ?

@refi64
Copy link
Contributor Author

refi64 commented Sep 14, 2022

Did you have issues with the cloud2edge post-install jobs? Or was it a theoretical motivation, expecting potential issues (the Helm post-install definition of "Executes after all resources are loaded into Kubernetes" doesn't seem very clear after all) ?

The docs here are terrible but it's clarified a bit here:

Helm defines two hooks for the install lifecycle: pre-install and post-install. If the developer of the foo chart implements both hooks, the lifecycle is altered like this:
[...]
The library loads the resulting resources into Kubernetes. *Note that if the --wait flag is set, the library will wait until all resources are in a ready state and will not run the post-install hook until they are ready.
The library executes the post-install hook (loading hook resources)

So it seems like the --wait argument (confusingly) does impact this, and if it's not given, the pods won't be ready yet.

@calohmn
Copy link
Contributor

calohmn commented Sep 20, 2022

That's really quite confusing and unexpected. (It's unfortunate there is no "post-ready" hook yet.)
Actually, the README of the Hono chart was adapted a while back to take this into account (see eea5837).
Making the chart deployment not require the --wait argument would still be good, I think.

Regarding the changes in this PR:
Since a Hono chart file is also changed here, there should be 2 PRs, one for the Hono chart, one for the cloud2edge chart, each also increasing the respective chart version (and the cloud2edge chart then requiring the updated Hono chart version).

Regarding the Hono change, I think it isn't needed for the health port to be exposed on the external service. The port definition could be added to hono-service-device-registry-svc.yaml instead, dropping the -ext service name suffix in the URL used in the cloud2edge post-install-job. As path for that URL, readiness would fit better than liveness.
And in the Hono chart, the hono-service-device-registry-post-install-job.yaml job should be adapted as well, also waiting for device-registry readiness.

@sgloutnikov
Copy link

sgloutnikov commented Feb 2, 2024

I see this was brought up a while back, but I keep running into this problem on my installations. The job fails its backoffLimit times before the rest of the pods are ready and has to be manually recreated to run again.

What would be a good way to handle this? I just increase the backoffLimit to something like 15, but that seems like a hack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cloud2Edge Improvements to the Cloud2Edge package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants