Pod stuck in ContainerCreating after too many range is full errors #4218

freedge · 2024-03-12T20:17:04Z

If a pod repeatedly fails to get an IP because of an "err: range is full" error, it stays stuck in ContainerCreating forever.

This is the behavior implemented in
906a598

Retries continue for around 15 minutes until the final attempt; after that the pod hangs until ovnkube-controller is restarted or the pod is deleted.
Since the pod status still says "Creating", I believe ovnk should keep trying.


tssurya · 2024-03-12T20:22:25Z

@freedge: thanks for the issue!
I agree that in an ideal k8s world we should probably keep retrying. In ovnkube today, though, retries are capped at a maximum of 15: a retry is triggered every 30 seconds and there is a backoff algorithm as well, so in total it amounts to many minutes of retrying, which can flood large environments. That is why we added the cap and made that fix to suppress those logs instead of retrying indefinitely. cc @ricky-rav PTAL

So in this case, if the range really is full and we can't do anything about it, I think the admin should react to the triggered "subnet full alert" and take the necessary action, which would retrigger events.

However, we can revisit this cap and explore making this truly level-driven if it needs fine-tuning.

ricky-rav · 2024-03-13T08:39:49Z

Even in level-driven controllers there's usually a cap on the number of retries (re-queues). We can revisit our max of 15 if there's a specific need, but I think it's a lot more complicated to handle if we /never/ give up on an add/update/delete operation than if we give up after n failed attempts.

tssurya added the services/endpoints, pods, and kind/support labels and removed the services/endpoints label on Mar 12, 2024
