Run operator and cluster under istio (maistra) mesh - either webhook or status is unavailable #1280
Replies: 7 comments 2 replies
-
Sorry for the late response; I don't know if this helps, but my understanding is that the operator cannot be reached from the clusters, right? I can see that because the webhooks can't be reached, which is an important thing, and the operator will also need to reach the pods, because it tries to reach the cluster over the network. With that said, you can always set the same annotations on the operator deployment - can you check whether that works for you? On the other hand, I'm converting this into a discussion, since it doesn't look like an issue with the operator.
Best Regards!
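For illustration, applying the same Istio annotations to the operator's pod template could be done with a merge patch along these lines. This is a minimal sketch, assuming the default cnpg-controller-manager deployment name in cnpg-system and reusing the annotation values quoted later in this thread; it is not a confirmed fix:

```yaml
# operator-istio-patch.yaml - merge patch for the operator Deployment, e.g.:
#   kubectl -n cnpg-system patch deployment cnpg-controller-manager \
#     --type merge --patch-file operator-istio-patch.yaml
spec:
  template:
    metadata:
      annotations:
        # Same annotations the reporter set on the cluster pods (see below)
        sidecar.istio.io/inject: "true"
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```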
-
Hi @sxd
-
This is more of an Istio question than a CNPG one - it's about configuring the mesh network so that pods can talk to each other across namespaces, and sadly we're not Istio experts at that level. I can tell that it's a network issue because I've seen the same thing with NetworkPolicies; other users fixed it, but they didn't document how. If you find a solution, please let us know and we can add the fix to the Q&A.
Best Regards!
-
I'm running into this issue as well when bootstrapping with initdb. When I have come across this in the past, it was because the pod was trying to communicate with the operator before the proxy sidecar was ready, which causes the "failed calling webhook" error that @alishchytovych reported. We would get around it by setting annotations on the pods so the application waits for the proxy (see the sketch below). Providing support for setting annotations/labels on the bootstrap pods would give the flexibility needed to get around issues like this.
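For context, the annotation involved is the one quoted in the original report; on a pod template it looks roughly like this. The values are taken from the report further down, but applying them to the bootstrap pods is exactly what is not supported today:

```yaml
# Hypothetical pod-template metadata: CNPG does not currently expose
# annotations/labels for the bootstrap pods, which is the ask here.
metadata:
  annotations:
    sidecar.istio.io/inject: "true"
    # Make the app container wait until the Envoy sidecar is ready,
    # so the instance can reach the operator before the mesh drops traffic.
    proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```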
-
Hi! While working on service configuration and enabling Istio with CNPG, I realized that I have a very similar issue to the one mentioned. My setup:
From what I found in the cloudnative-pg/cloudnative-pg code, requests to pods are made using the pod IP, which is problematic with Istio because Envoy controls the pod's ingress and tries to enforce mTLS connections (I'm using a STRICT policy in this case). After enabling access logs in Istio, I found that Envoy rejects the connection to the PG pod with this message (connection made from a curl pod in the app namespace):

```
~ $ curl http://100.96.6.63:8000/pg/status -v
* processing: http://100.96.6.63:8000/pg/status
*   Trying 100.96.6.63:8000...
* Connected to 100.96.6.63 (100.96.6.63) port 8000
> GET /pg/status HTTP/1.1
> Host: 100.96.6.63:8000
> User-Agent: curl/8.2.0
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 95
< content-type: text/plain
< date: Mon, 24 Jul 2023 14:42:11 GMT
< server: envoy
<
* Connection #0 to host 100.96.6.63 left intact
upstream connect error or disconnect/reset before headers. reset reason: connection termination
~ $
```

And in the logs, the message suggests that it's a problem to even create the connection to port 8000:

```json
{
  "authority": "100.96.6.63:8000",
  "requested_server_name": null,
  "connection_termination_details": null,
  "upstream_host": "100.96.6.63:8000",
  "upstream_cluster": "PassthroughCluster",
  "x_forwarded_for": null,
  "upstream_local_address": "100.96.5.131:52770",
  "protocol": "HTTP/1.1",
  "user_agent": "curl/8.2.0",
  "upstream_service_time": null,
  "upstream_transport_failure_reason": null,
  "route_name": "allow_any",
  "bytes_received": 0,
  "duration": 1,
  "downstream_local_address": "100.96.6.63:8000",
  "bytes_sent": 95,
  "path": "/pg/status",
  "response_flags": "UC",
  "start_time": "2023-07-24T14:42:11.344Z",
  "response_code_details": "upstream_reset_before_response_started{connection_termination}",
  "method": "GET",
  "response_code": 503,
  "downstream_remote_address": "100.96.5.131:52762",
  "request_id": "1be2486c-89f8-4d9b-a1ad-d0387d850ea6"
}
```
So the first issue for me while using Istio is IMO the use of the pod IP in the first place (which is the case in more than one place). IMO, to make CNPG work in the situation mentioned, the operator needs to change this or support connections made through some sort of Service (like a headless Service) pointing at single pods. As service meshes are focused mostly on Services and inter-service networking, they start to struggle when connections go to workloads without any sort of Service in between. So after setting up a headless Service like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pg-cluster-1-hs
spec:
  selector:
    cnpg.io/instanceName: pg-cluster-1 # taken from the running pod's description
  clusterIP: None
  ports:
    - name: status
      port: 8000
      targetPort: 8000
    - name: postgres
      port: 5432
      targetPort: 5432
```

I could reach the /pg/status endpoint from inside the app namespace, and from cnpg-system as well:

```
PS C:\Users\krzyzt\GIT\helm\deployments\app> kubectl run -i --namespace cnpg-system --tty curl --image=curlimages/curl:latest -- sh
If you don't see a command prompt, try pressing enter.
~ $ curl pg-cluster-1-hs:8000/pg/status
curl: (6) Could not resolve host: pg-cluster-1-hs
~ $ curl pg-cluster-1-hs.app.svc.cluster.local:8000/pg/status
{"currentLsn":"0/AB000000","systemID":"7258248197917290525","isPrimary":true,"replayPaused":false,"pendingRestart":false,"pendingRestartForDecrease":false,"isWalReceiverActive":false,"node":"","pod":{"metadata":{"name":"pg-cluster-1","creationTimestamp":null},"spec":{"containers":null},"status":{}},"isPgRewindRunning":false,"totalInstanceSize":"32 MB","mightBeUnavailable":false,"lastArchivedWAL":"0000000100000000000000AA","lastArchivedWALTime":"2023-07-24T15:06:33.582965Z","lastFailedWALTime":"-infinity","isArchivingWAL":true,"currentWAL":"0000000100000000000000AA","timeLineID":1,"isPodReady":false,"executableHash":"76ceae159df15258568e3c48687e90c35aecda372e23e4ab1ef2a7eb6a2fe950","isInstanceManagerUpgrading":false,"instanceManagerVersion":"1.20.1","instanceArch":"amd64"}
~ $
```

Best regards! FYI @sxd

Edit: The headless Service actually makes the operator able to manage the cluster nodes without any extra operator code changes - with the headless Service in place, the connections to pod IPs work, as the headless Service changes the routing on the proxies, which eventually accept the connections. So to summarize:
-
You may also need to use e.g.
-
There are two requirements for CNPG operator to run on the mesh:
-
I'm trying to run the CNPG operator (1.18.1) and a cluster under the Istio (Maistra 2.2) service mesh, and I have a major issue.

If cnpg-system is a member of the Maistra mesh (has maistra.io/member-of: istio-system), no cluster can be created; it fails with the error:

```
failed calling webhook "vpooler.kb.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/validate-postgresql-cnpg-io-v1-pooler?timeout=10s": dial tcp 10.130.3.18:9443: i/o timeout
```

If cnpg-system is not a member of the service mesh, the operator can't get the cluster status, with errors like this:

```
"msg":"Cannot extract Pod status","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"pg","namespace":"workload"},"namespace":"workload","name":"pg","reconcileID":"ba85f1e4-b2c2-452e-a324-3472e7d33a9e","uuid":"e6e96622-84a6-11ed-96f7-0a580a8202ff","name":"pg-1","error":"Get \"http://10.128.2.214:8000/pg/status\": dial tcp 10.128.2.214:8000: i/o timeout"
```

In the cluster definition, the server instances and the pooler have annotations, and these annotations are properly inherited by the corresponding pods:

```yaml
annotations:
  sidecar.istio.io/inject: 'true'
  proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```

I've tried using `traffic.sidecar.istio.io/excludeInboundPorts: "8000"` - it doesn't help.
I've tried creating a ServiceEntry, a VirtualService, and a DestinationRule for the webhook - it doesn't help.
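For reference, the DestinationRule variant of that attempt would look roughly like the sketch below, disabling mTLS origination toward the webhook service. The resource name is an assumption, the exact manifests tried are not shown in the thread, and per the above this reportedly did not help:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: cnpg-webhook-service   # assumed name
  namespace: cnpg-system
spec:
  host: cnpg-webhook-service.cnpg-system.svc.cluster.local
  trafficPolicy:
    tls:
      # Don't originate Istio mTLS toward the webhook service;
      # the webhook terminates its own TLS on port 9443.
      mode: DISABLE
```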
Any ideas on how to make this fully work?