Replies: 1 comment
-
Tracked down the memory problem in my inference, which I guess was heightened by the probes! So I'm closing this particular discussion.
-
Hi, I have a strange issue that I cannot get to the bottom of, and I wondered if anyone could give any pointers.
Our model was being served happily, but every now and then we get 502 Bad Gateways, especially under load.
The model is just a custom predictor that runs a PyTorch segmentation model. Normal RAM usage is around 600 MB and inference takes about a second. It also has a resizing pre-processing step. It is using the v2 protocol.
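For context, a custom predictor like this is typically deployed as a KServe InferenceService whose predictor section points at the custom container. The sketch below is illustrative only; the name, image, and resource figures are assumptions, not our actual spec:

```yaml
# Illustrative sketch of a custom-predictor InferenceService.
# Name, image, and resource figures are assumptions, not the real spec.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: segmentation-model            # hypothetical name
spec:
  predictor:
    containers:
      - name: kserve-container
        image: registry.example.com/segmentation-predictor:latest  # hypothetical image
        resources:
          requests:
            memory: "1Gi"
          limits:
            memory: "2Gi"             # pod is OOM-killed if usage exceeds this limit
```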
While investigating this (I'm still not sure of the answer), I realised our container doesn't define any readiness or liveness checks. My hypothesis was that requests were being sent to the container by the queue-proxy before the model was ready, since the model needs to be downloaded from a cloud provider first.
So I added checks in the form of:
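Roughly along these lines (a sketch rather than the exact snippet: it assumes exec-based probes that curl the v2 health endpoints on port 8080 and check the JSON reply with jq):

```yaml
# Sketch only: exec probes that poll the v2 health endpoints with curl and
# parse the JSON reply with jq. The port, the endpoints' JSON shape, and the
# timing values are assumptions, not taken from the original snippet.
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - curl -sf http://localhost:8080/v2/health/ready | jq -e '.ready == true'
  initialDelaySeconds: 30     # leave time for the model download from cloud storage
  periodSeconds: 10
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - curl -sf http://localhost:8080/v2/health/live | jq -e '.live == true'
  initialDelaySeconds: 60
  periodSeconds: 15
```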
This also involved making sure `jq` was installed in the container image. Now, adding these checks causes the memory of the pod to slowly go up and up, or at least it spikes and at some point doesn't return to the normal level it sits at without the probe checks. Eventually a spike is too big for the memory limit of the pod and it is killed with an OOM.
If I remove the probe checks the pod runs ok.
So I wanted to ask: am I doing something silly here? Is this the best way to define the readiness and liveness checks? Has anyone else ever experienced something like this?
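For comparison, an httpGet probe would avoid spawning a curl and jq process on every check; whether the v2 health endpoints return a plain HTTP status that works this way depends on the serving runtime, so this too is only a sketch (path and port assumed):

```yaml
# Sketch of an httpGet alternative; path, port, and timings are assumptions.
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 15
```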