On server/deploy/oci, running "helm install example ." to deploy the Inference Server, the pod doesn't reach Running because the liveness and readiness probes fail #7154

Open
aviv12825 opened this issue Apr 24, 2024 · 1 comment
Labels: question (Further information is requested)

@aviv12825

On server/deploy/oci, running "helm install example ." to deploy the Inference Server, the pod doesn't reach Running because the liveness and readiness probes fail.

The log details are below. I tried adding initialDelaySeconds: 180 to the probes in templates/deployment.yaml (a sketch of that change follows the events), which didn't help.
Can someone please advise?

Events:
Type     Reason     Age                  From               Message
Normal   Scheduled  4m11s                default-scheduler  Successfully assigned default/example-triton-inference-server-9c5d9f79-74rt4 to 10.0.10.95
Warning  Unhealthy  41s (x3 over 61s)    kubelet            Liveness probe failed: Get "http://10.0.10.177:8000/v2/health/live": dial tcp 10.0.10.177:8000: connect: connection refused
Normal   Killing    41s                  kubelet            Container triton-inference-server failed liveness probe, will be restarted
Normal   Pulled     11s (x2 over 4m10s)  kubelet            Container image "nvcr.io/nvidia/tritonserver:24.03-py3" already present on machine
Warning  Unhealthy  11s (x13 over 66s)   kubelet            Readiness probe failed: Get "http://10.0.10.177:8000/v2/health/ready": dial tcp 10.0.10.177:8000: connect: connection refused
Normal   Created    10s (x2 over 4m10s)  kubelet            Created container triton-inference-server
Normal   Started    10s (x2 over 4m10s)  kubelet            Started container triton-inference-server
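
For reference, this is roughly the probe change I tried in templates/deployment.yaml (a sketch only: port 8000 and the /v2/health paths match the probe URLs in the events above, but the chart's actual template layout may differ):

```yaml
# Sketch: standard Kubernetes HTTP probes with a longer start-up delay.
# Port 8000 and the /v2/health paths come from the probe URLs in the events;
# the exact field placement inside the chart's container spec may differ.
livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8000
  initialDelaySeconds: 180
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  initialDelaySeconds: 180
```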

@rmccorm4 (Collaborator) commented May 1, 2024

Hi @aviv12825,

I see the errors returned involve "connection refused". Have you confirmed from the pod logs that the server itself started up successfully to expose these endpoints?
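
For example (a sketch; the pod name is taken from the events above and changes on each rollout):

```shell
# Check whether tritonserver itself started and bound port 8000.
kubectl logs example-triton-inference-server-9c5d9f79-74rt4
# Logs from the previous container instance, if the pod has already been restarted:
kubectl logs example-triton-inference-server-9c5d9f79-74rt4 --previous
# Pod status, restart count, and probe events:
kubectl describe pod example-triton-inference-server-9c5d9f79-74rt4
```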

rmccorm4 added the question (Further information is requested) label on May 1, 2024