Hi all!

We're looking at some of the use cases in which a Cloud Native PG cluster does not recover from issues by itself. As far as we can tell, the only way to get into such a position is to delete the PVCs associated with the cluster directly. Currently, that leaves us in one of two situations:
Deleting the primary PVC and force-killing the pod (no clean shutdown)

This one gets the operator into a reconciliation loop until I restart the other pods (a rough reproduction sketch follows the logs). The Cloud Native PG Controller keeps logging:
{"level":"info","ts":"2024-04-29T15:37:16Z","msg":"Current primary isn't healthy, initiating a failover","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"pomerium-sessions","namespace":"pomerium"},"namespace":"pomerium","name":"pomerium-sessions","reconcileID":"6a2a198b-1bfa-43e9-bfde-57dc1baeb83e"}
{"level":"info","ts":"2024-04-29T15:37:16Z","msg":"pod status (1 of 2)","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"pomerium-sessions","namespace":"pomerium"},"namespace":"pomerium","name":"pomerium-sessions","reconcileID":"6a2a198b-1bfa-43e9-bfde-57dc1baeb83e","name":"pomerium-sessions-4","currentLsn":"","receivedLsn":"0/B000060","replayLsn":"0/B000060","isPrimary":false,"isPodReady":true,"pendingRestart":false,"pendingRestartForDecrease":false,"statusCollectionError":null}
{"level":"info","ts":"2024-04-29T15:37:16Z","msg":"pod status (2 of 2)","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"pomerium-sessions","namespace":"pomerium"},"namespace":"pomerium","name":"pomerium-sessions","reconcileID":"6a2a198b-1bfa-43e9-bfde-57dc1baeb83e","name":"pomerium-sessions-5","currentLsn":"","receivedLsn":"0/B000060","replayLsn":"0/B000060","isPrimary":false,"isPodReady":true,"pendingRestart":false,"pendingRestartForDecrease":false,"statusCollectionError":null}
{"level":"info","ts":"2024-04-29T15:37:16Z","msg":"Waiting for all WAL receivers to be down to elect a new primary","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"pomerium-sessions","namespace":"pomerium"},"namespace":"pomerium","name":"pomerium-sessions","reconcileID":"6a2a198b-1bfa-43e9-bfde-57dc1baeb83e"}
Deleting all PVCs and force-killing all pods

This actually gets the cluster into a weird state: it tries to create new replicas (in this case pomerium-sessions-6) that refer to a PVC the operator has not yet created. It stays in this loop until we recreate the Cluster resource (so basically starting from scratch). How can we get out of this?
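For completeness, this is roughly what we do for the second scenario and the only way out we have found so far. Again just a sketch: the cnpg.io/cluster label selector and the cluster.yaml file name are assumptions on my side.

```sh
# Sketch of scenario 2: delete every PVC of the cluster and force-kill all
# pods. The label selector is an assumption (CloudNativePG labels its pods
# and PVCs with cnpg.io/cluster=<cluster-name>).
kubectl delete pvc -l cnpg.io/cluster=pomerium-sessions -n pomerium --wait=false
kubectl delete pod -l cnpg.io/cluster=pomerium-sessions -n pomerium \
  --force --grace-period=0

# Only recovery we know of: recreate the Cluster resource from scratch
# (cluster.yaml stands for whatever manifest defines it).
kubectl delete cluster.postgresql.cnpg.io pomerium-sessions -n pomerium
kubectl apply -f cluster.yaml -n pomerium
```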