On server/deploy/oci, running "helm install example ." to deploy the Inference Server, the pod doesn't reach Running because the liveness and readiness probes fail #7154

Open
aviv12825 opened this issue Apr 24, 2024 · 1 comment
Labels: question (Further information is requested)

@aviv12825

On server/deploy/oci, running "helm install example ." to deploy the Inference Server, the pod doesn't reach Running because the liveness and readiness probes fail.

The log details are below. I tried adding initialDelaySeconds: 180 to the probes in templates/deployment.yaml (a sketch of that change follows the events), which didn't help.
Can someone please advise?

Events:
Type     Reason     Age                  From               Message
Normal   Scheduled  4m11s                default-scheduler  Successfully assigned default/example-triton-inference-server-9c5d9f79-74rt4 to 10.0.10.95
Warning  Unhealthy  41s (x3 over 61s)    kubelet            Liveness probe failed: Get "http://10.0.10.177:8000/v2/health/live": dial tcp 10.0.10.177:8000: connect: connection refused
Normal   Killing    41s                  kubelet            Container triton-inference-server failed liveness probe, will be restarted
Normal   Pulled     11s (x2 over 4m10s)  kubelet            Container image "nvcr.io/nvidia/tritonserver:24.03-py3" already present on machine
Warning  Unhealthy  11s (x13 over 66s)   kubelet            Readiness probe failed: Get "http://10.0.10.177:8000/v2/health/ready": dial tcp 10.0.10.177:8000: connect: connection refused
Normal   Created    10s (x2 over 4m10s)  kubelet            Created container triton-inference-server
Normal   Started    10s (x2 over 4m10s)  kubelet            Started container triton-inference-server
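
For reference, this is roughly the probe change I tried in templates/deployment.yaml (a sketch only: port 8000 and the /v2/health paths match the probe URLs in the events above, but the chart's actual template layout may differ):

```yaml
# Sketch: standard Kubernetes HTTP probes with a longer start-up delay.
# Port 8000 and the /v2/health paths come from the probe URLs in the events;
# the exact field placement inside the chart's container spec may differ.
livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8000
  initialDelaySeconds: 180
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  initialDelaySeconds: 180
```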

@rmccorm4 (Collaborator) commented May 1, 2024

Hi @aviv12825,

I see the errors returned involve "connection refused". Have you confirmed from the pod logs that the server itself started up successfully to expose these endpoints?
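
For example (a sketch; the pod name is taken from the events above and changes on each rollout):

```shell
# Check whether tritonserver itself started and bound port 8000.
kubectl logs example-triton-inference-server-9c5d9f79-74rt4
# Logs from the previous container instance, if the pod has already been restarted:
kubectl logs example-triton-inference-server-9c5d9f79-74rt4 --previous
# Pod status, restart count, and probe events:
kubectl describe pod example-triton-inference-server-9c5d9f79-74rt4
```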

rmccorm4 added the question (Further information is requested) label on May 1, 2024