scriptmgr-server failing readiness and liveness probes #1846

Open
hammadahmed1985 opened this issue Feb 22, 2024 · 1 comment

@hammadahmed1985

Describe the bug
I am trying to run self-hosted Pixie in a 3-node cluster. Here is what my environment looks like:
Kubernetes Version: v1.28.2
OS-Image: Rocky Linux 8.9 (Green Obsidian)
Kernel Version: 5.4.266-1.el8.elrepo.x86_64
Container Runtime: containerd://1.6.26
Pixie Cloud Version: 0.1.7

$ kubectl get nodes -o wide
NAME                     STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                CONTAINER-RUNTIME
rdev5-rocky8-control-1   Ready    control-plane   43d   v1.28.2   10.76.110.148   <none>        Rocky Linux 8.9 (Green Obsidian)   5.4.266-1.el8.elrepo.x86_64   containerd://1.6.26
rdev5-rocky8-worker-1    Ready    <none>          43d   v1.28.2   10.76.110.140   <none>        Rocky Linux 8.9 (Green Obsidian)   5.4.266-1.el8.elrepo.x86_64   containerd://1.6.26
rdev5-rocky8-worker-2    Ready    <none>          43d   v1.28.2   10.76.110.136   <none>        Rocky Linux 8.9 (Green Obsidian)   5.4.266-1.el8.elrepo.x86_64   containerd://1.6.26

To Reproduce
Steps to reproduce the behavior:
https://docs.px.dev/installing-pixie/install-guides/self-hosted-pixie/#1.-deploy-pixie-cloud

Expected behavior
Pixie self-hosted cloud gets deployed.

Logs
Please attach the logs by running the following command:

$ kubectl -n plc describe pod/scriptmgr-server-56d97c78c7-q6s4m
Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       18m                 default-scheduler  Successfully assigned plc/scriptmgr-server-56d97c78c7-q6s4m to rdev5-rocky8-worker-2
  Normal   AddedInterface  16m                 multus             Add eth0 [192.168.84.196/32] from k8s-pod-network
  Normal   Created         15m (x2 over 16m)   kubelet            Created container scriptmgr-server
  Normal   Started         15m (x2 over 16m)   kubelet            Started container scriptmgr-server
  Normal   Killing         15m                 kubelet            Container scriptmgr-server failed liveness probe, will be restarted
  Warning  Unhealthy       15m (x12 over 16m)  kubelet            Readiness probe failed: Get "https://192.168.84.196:52000/healthz": dial tcp 192.168.84.196:52000: connect: connection refused
  Warning  Unhealthy       15m (x6 over 16m)   kubelet            Liveness probe failed: Get "https://192.168.84.196:52000/healthz": dial tcp 192.168.84.196:52000: connect: connection refused
  Normal   Pulled          11m (x7 over 16m)   kubelet            Container image "gcr.io/pixie-oss/pixie-prod/cloud/scriptmgr_server_image:0.1.7" already present on machine

@gofrolist

I've hit the same issue, which is described in #1838.

I got these errors in the scriptmgr-server logs:

time="2024-02-14T00:29:10Z" level=error msg="Failed to update store using bundle.json from gcs." bucket=pixie-prod-artifacts error="rpc error: code = Internal desc = failed to download bundle.json" path=script-bundles/bundle-oss.json
time="2024-02-14T00:30:40Z" level=error msg="Failed to get attrs of bundle.json" bucket=pixie-prod-artifacts error="Get \"https://storage.googleapis.com/storage/v1/b/pixie-prod-artifacts/o/script-bundles%2Fbundle-oss.json?alt=json&prettyPrint=false&projection=full\": dial tcp 142.250.176.27:443: i/o timeout" path=script-bundles/bundle-oss.json

The workaround for me is to set failureThreshold to 5 on both the livenessProbe and the readinessProbe of the scriptmgr-server Deployment:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- yamls/cloud_deps_elastic_operator.yaml
- yamls/cloud_deps.yaml
- yamls/cloud.yaml
- yamls/cloud_ingress_grpcs.yaml
- yamls/cloud_ingress_https.yaml

patches:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: scriptmgr-server
  patch: |-
    - op: add
      path: /spec/template/spec/containers/0/livenessProbe/failureThreshold
      value: 5
    - op: add
      path: /spec/template/spec/containers/0/readinessProbe/failureThreshold
      value: 5
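
To check that the patch took effect, something like the following might help. This is only a sketch: the function names are made up for illustration, the `plc` namespace and `scriptmgr-server` Deployment name are taken from the output above, and the directory passed to `render_patched` is assumed to be wherever the kustomization.yaml lives.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Pixie or kubectl.

# Render the kustomization (including the patches) without applying it.
# $1 is the directory holding kustomization.yaml; defaults to the current one.
render_patched() {
  kubectl kustomize "${1:-.}"
}

# Print the failureThreshold currently set on both probes of the first
# container in the scriptmgr-server Deployment (namespace "plc" assumed).
show_probe_thresholds() {
  kubectl -n plc get deployment scriptmgr-server \
    -o jsonpath='liveness={.spec.template.spec.containers[0].livenessProbe.failureThreshold} readiness={.spec.template.spec.containers[0].readinessProbe.failureThreshold}{"\n"}'
}
```

After applying the rendered manifests, running `show_probe_thresholds` should report 5 for both probes if the patch was picked up.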
