
Receiving "Deployment is not ready" error while the deployment is actually ready #264

MurzNN opened this issue Dec 1, 2023 · 8 comments


MurzNN commented Dec 1, 2023

Describe the bug
When I start pv-migrate, it creates the deployment, but in the debug log I see errors like:

🚁 Attempting strategy: lbsvc
🔑 Generating SSH key pair
creating 4 resource(s)
beginning wait for 4 resources with timeout of 1m0s
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready

But at the same time, via kubectl I see that the deployment is ready:

$ kubectl -n korepov get deployment pv-migrate-dbabc-src-sshd 
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
pv-migrate-dbabc-src-sshd   1/1     1            1           43s

The log level is debug, and no additional messages were displayed.

So, any ideas on what could be causing this problem?

How can I enable more verbose logging to understand what's happening and why it is not detecting the ready status?

Console output

🚁 Attempting strategy: lbsvc
🔑 Generating SSH key pair
creating 4 resource(s)
beginning wait for 4 resources with timeout of 1m0s
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
Deployment is not ready: korepov/pv-migrate-dbabc-src-sshd. 0 out of 1 expected pods are ready
🧹 Cleaning up
uninstall: Deleting pv-migrate-dbabc-src
uninstall: given cascade value: , defaulting to delete propagation background
Starting delete for "pv-migrate-dbabc-src-sshd" Service
Starting delete for "pv-migrate-dbabc-src-sshd" Deployment
Starting delete for "pv-migrate-dbabc-src-sshd" Secret
Starting delete for "pv-migrate-dbabc-src-sshd" ServiceAccount
beginning wait for 4 resources to be deleted with timeout of 1m0s
purge requested for pv-migrate-dbabc-src
✨ Cleanup done
🔶 Migration failed with this strategy, will try with the remaining strategies
Error: migration failed: all strategies failed for this migration

Version
 - Source and destination Kubernetes versions: source - `v1.25.6`, destination - `v1.27.7`
 - Source and destination container runtimes: source - `containerd://1.6.15`, destination - `containerd://1.7.5`
 - pv-migrate version 1.7.1 (commit: 1affa11b175d20969b9d6f2879c09dc94f0b4a0f) (build date: 2023-10-09T21:56:55Z)
 - Installation method: krew
 - Source and destination PVC type, size and accessModes: `ReadWriteMany, csi-cephfs-sc, 2G -> ReadWriteMany, local-path, 2G`

MurzNN commented Dec 1, 2023

And here is the output of all the resources related to the process, while I see the "Deployment is not ready" error:

$ kubectl -n korepov get all | grep pv-migrate
pod/pv-migrate-dbddb-src-sshd-cf79c787-d2nph   1/1     Running   0               18s
service/pv-migrate-dbddb-src-sshd    NodePort    10.233.18.8     <none>        22:32148/TCP                 20s
deployment.apps/pv-migrate-dbddb-src-sshd   1/1     1            1           19s
replicaset.apps/pv-migrate-dbddb-src-sshd-cf79c787   1         1         1       19s

utkuozdemir (Owner) commented:

This looks like a bug, I'll have a look. You can get more info with `--log-level=debug --log-format=json`, but I'm not sure it's going to help here.


MurzNN commented Dec 1, 2023

Thanks! I already have `--log-level=debug`, and `--log-format=json` just adds more noise to the output, not new useful information ;)
Maybe you can explain how to debug this on my side? Then I will share more debugging information with you.

utkuozdemir (Owner) commented:

I had a look and noticed that this error comes from Helm's wait logic, not from our code. So I would try passing --skip-cleanup and troubleshooting with the Helm CLI, to find out why it does not report as ready.
You can try:

helm ls -a
helm status <name-of-the-release>

Also, note that for lbsvc, Helm would wait for the created Service to actually get an external IP (not pending). This could be the problem.
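
For reference, the checks behind that wait look roughly like this (a simplified paraphrase of Helm's pkg/kube/ready.go, not the verbatim code):

```go
// Simplified paraphrase of Helm's readiness checks in pkg/kube/ready.go.
// The "Deployment is not ready: ... X out of Y expected pods are ready"
// message from the log above is emitted by the deployment check.
package readycheck

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// Helm resolves the Deployment's newest ReplicaSet and compares its
// ready replicas against the expected count (spec.replicas minus
// maxUnavailable, omitted here for brevity). If it resolves a stale or
// wrong ReplicaSet, it can keep reporting "0 out of 1" even while
// kubectl shows the Deployment itself as 1/1 READY.
func deploymentReady(rs *appsv1.ReplicaSet, dep *appsv1.Deployment) bool {
	expectedReady := *dep.Spec.Replicas
	return rs.Status.ReadyReplicas >= expectedReady
}

// A NodePort Service only needs a cluster IP to count as ready; it is
// the LoadBalancer type that additionally waits for an ingress
// IP/hostname to be assigned (i.e. not <pending>).
func serviceReady(s *corev1.Service) bool {
	if s.Spec.Type == corev1.ServiceTypeExternalName {
		return true // external to the cluster, nothing to wait for
	}
	if s.Spec.ClusterIP == "" {
		return false
	}
	if s.Spec.Type == corev1.ServiceTypeLoadBalancer &&
		len(s.Status.LoadBalancer.Ingress) == 0 {
		return false // external IP still pending
	}
	return true
}
```

Note that the deployment check runs against the ReplicaSet, not the Deployment's own status, which is why Helm's view of readiness can differ from what kubectl get deployment shows.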


MurzNN commented Dec 13, 2023

Tested. Even without --skip-cleanup, it shows as deployed, while in the terminal I keep seeing lines like:

Deployment is not ready: korepov/pv-migrate-dcada-src-sshd. 0 out of 1 expected pods are ready

Here is the output of helm:

$ helm status pv-migrate-dcada-src
NAME: pv-migrate-dcada-src
LAST DEPLOYED: Wed Dec 13 15:12:42 2023
NAMESPACE: korepov
STATUS: deployed
REVISION: 1
TEST SUITE: None


MurzNN commented Dec 13, 2023

It seems this problem is related to the NodePort service type. I can't test with the LoadBalancer type, because no free IPs are available for it on the source cluster.

But I tested on the destination cluster (just testing the copy back): with LoadBalancer it works well, but with NodePort I receive the same error.

While pv-migrate waits for readiness, I see the Service in the active state; here are the details:

$ kubectl describe service pv-migrate-bdaea-src-sshd
Name:                     pv-migrate-bdaea-src-sshd
Namespace:                korepov-pro-dev
Labels:                   app.kubernetes.io/component=sshd
                          app.kubernetes.io/instance=pv-migrate-bdaea-src
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=pv-migrate
                          app.kubernetes.io/version=0.5.0
                          helm.sh/chart=pv-migrate-0.5.0
Annotations:              meta.helm.sh/release-name: pv-migrate-bdaea-src
                          meta.helm.sh/release-namespace: korepov-pro-dev
Selector:                 app.kubernetes.io/component=sshd,app.kubernetes.io/instance=pv-migrate-bdaea-src,app.kubernetes.io/name=pv-migrate
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.233.53.90
IPs:                      10.233.53.90
Port:                     ssh  22/TCP
TargetPort:               22/TCP
NodePort:                 ssh  31784/TCP
Endpoints:                10.233.74.26:22
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

And I can connect to this node port on the source cluster from the destination cluster (using the external IP of any node) via telnet:

# telnet 1.2.3.4 31784
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
SSH-2.0-OpenSSH_9.3

So, the network connection is not a problem.

So, could you please describe what exactly it waits for? And maybe add more verbose debug logging to catch it?


MurzNN commented Dec 13, 2023

Also, specifying the source node IP address explicitly using --dest-host-override 1.2.3.4 doesn't help either.


MurzNN commented Dec 14, 2023

Also, it would be good to include the Helm release status in the debug logs: at least the output of helm status, but ideally the pod and service status as well.
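
Something along these lines would do it (a rough sketch using Helm's Go SDK; the logReleaseStatus helper and the log format are made up for illustration, but action.NewStatus is the same call the helm status command goes through):

```go
// Hypothetical debug helper for pv-migrate: after a failed wait, fetch
// the release the same way `helm status <name>` does and log the result.
package helmdebug

import (
	"log"

	"helm.sh/helm/v3/pkg/action"
)

// logReleaseStatus queries the cluster for the release and logs the
// fields that `helm status` prints. cfg is the already-initialized
// *action.Configuration that pv-migrate uses for install/uninstall.
func logReleaseStatus(cfg *action.Configuration, name string) {
	rel, err := action.NewStatus(cfg).Run(name)
	if err != nil {
		log.Printf("could not fetch status of release %s: %v", name, err)
		return
	}
	log.Printf("release %s: namespace=%s status=%s revision=%d",
		rel.Name, rel.Namespace, rel.Info.Status, rel.Version)
}
```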
