gunicorn CRITICAL WORKER TIMEOUT #1440
When a worker times out, it means that it didn't notify the arbiter in time that it was alive. Do you have any task executed during a request that could take longer than the timeout?
@jseaidou bump.
Sorry for the late reply @benoitc. The issue I'm seeing is actually with the non-active threads. My active threads don't time out; the non-active ones, under less load, give critical timeout errors instead of just gracefully timing out. I switched from gevent to tornado and that seems to have fixed the hang issue, but I'm still seeing 3 workers consistently giving the critical timeout every 30 seconds. If it's a normal timeout, it shouldn't be a critical error.
I am facing exactly the same issue.
@jseaidou it's a normal timeout in the sense that the arbiter reacts to it. It's critical because it should normally not happen. Most probably one of your workers is doing a blocking operation that prevents the gunicorn worker from notifying the arbiter. If you have a long operation, make sure to trigger the gevent scheduler from time to time by sleeping or such, or anything that calls back into the tornado scheduler. How can I reproduce the issue?

@saabeilin same ^^
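A minimal sketch of the kind of fix described above, assuming a Flask app served by gunicorn's gevent worker; the route name and the loop are made up for illustration. A CPU-bound loop never yields to gevent's hub, so the worker cannot answer the arbiter's heartbeat and is eventually killed with `[CRITICAL] WORKER TIMEOUT`; calling `gevent.sleep(0)` periodically hands control back:

```python
import gevent
from flask import Flask

app = Flask(__name__)

@app.route("/report")
def long_report():
    # Hypothetical long CPU-bound task: without yielding, this blocks the
    # worker's event loop and the arbiter kills the worker as timed out.
    total = 0
    for i in range(10_000_000):
        total += i
        if i % 100_000 == 0:
            gevent.sleep(0)  # yield to the gevent hub so the heartbeat can run
    return str(total)
```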
I'm seeing the same thing: workers are timing out even when serving no requests. All I've done is launch my container on AWS ECS.
This does not occur when running locally. :-/
Looks like switching to …
Duplicate of #1194, I think.
I’ve seen this happen repeatedly lately, and for me it seems connected with putting the laptop to sleep. When I open the lid, a bunch of these messages show up. Not sure this helps, but I thought I’d mention it…
```
gunicorn --daemon --workers 2 --timeout 120 --bind 127.0.0.1:4000 --pid /var/run/mshc_admin.pid --user danilovskoe --group danilovskoe --chdir /home/danilovskoe/mshc2/src/flask/ --env MSHC_PRODUCTION=/etc/monit/mshc.config.py admin_gunicorn:app
```

timeout 30 seconds
Ubuntu 16.04
I ran into this trouble too. Just after starting the app it works, but only if there is a request …

When I switched the worker class to …

Notice: my app runs on a physical host, neither a virtual host nor a cloud host.

Update: so I guess it's a question of gevent or the gevent worker.
There are reports on this issue of gevent solving the problem and of gevent causing the problem, so I cannot identify a root cause here. Some of the reports may be the same as #1194, but others may not be. If anyone can share a minimal case to reproduce, that would help.
I'm not sure it's definitely the same issue, but I can reproduce this 100% of the time using VirtualBox with the following setup:

Host: Windows 10

I forward TCP:8000 between host and guest over the default NAT connection. Using a …

I appreciate VirtualBox is a completely different dimension, but it does sound very similar to what is being described above, and is consistently reproducible (for me at least).
I am seeing this happen with slow uploads. During an upload (to a Django site), if the worker timeout is hit, the upload dies.
@lordmauve if you are using the sync worker, that is expected. Long requests will block the worker, and eventually the arbiter will kill it. You can use a different worker type if you expect long requests to succeed.
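For anyone landing here, this is a sketch of what "a different worker type" might look like in a gunicorn config file; the values and the `myproject.wsgi:application` entry point are illustrative assumptions, not recommendations. With an async worker like gevent, a slow upload only occupies a greenlet, so the worker can still notify the arbiter while the request runs:

```python
# gunicorn_conf.py -- load with: gunicorn -c gunicorn_conf.py myproject.wsgi:application
worker_class = "gevent"  # async worker: slow requests don't block the heartbeat
workers = 4              # illustrative count; tune for your host
timeout = 120            # headroom for slow uploads before the arbiter intervenes
graceful_timeout = 30
```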
For anyone reading this thread, please re-open with a minimal case to reproduce. I cannot see any clean investigation to pursue here. For the case of AWS / ECS, I'm still leaving #1194 open until I can test the configurations I listed (#1194 (comment)).
Anyone who still has this problem: please check resource availability for the application, in addition to increasing the timeout and changing the worker class type. I was having this problem when I tried to deploy my application using Docker Swarm and realized I was limiting the resources too low for the application. Increasing the resources solved the problem for me.

I think this is not a bug, just the way we configure our apps.
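As a rough illustration of the resource point (the service name, image, and limits below are hypothetical; tune them for your workload), a Swarm stack file can raise the per-container limits like this:

```yaml
# docker-compose.yml (stack file) -- a sketch, not a recommendation
version: "3.7"
services:
  web:
    image: myapp:latest        # placeholder image
    deploy:
      resources:
        limits:
          cpus: "1.0"          # too-low CPU limits can starve workers of heartbeats
          memory: 512M         # too-low memory limits can get workers OOM-killed
```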
Thanks, this makes sense.
@fred-revel Were you able to resolve the issue? I think I have exactly the same problem.
I am running gunicorn with the following settings:

```
gunicorn --worker-class gevent --timeout 30 --graceful-timeout 20 --max-requests-jitter 2000 --max-requests 1500 -w 50 --log-level DEBUG --capture-output --bind 0.0.0.0:5000 run:app
```

and I am seeing the `[CRITICAL] WORKER TIMEOUT` in all but 3 workers. After a while, gunicorn can't spawn any more workers, or at least is very slow at spawning them. This makes the server unreachable and every request fail. I reduced the number of workers to 3 and gave each worker 2 threads, and now I'm not seeing this issue anymore.

I can't get a stack trace from the timeouts, but it seems like gunicorn can't handle the workers beyond a certain number?
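For reference, a sketch of the reduced configuration this comment describes, assuming gunicorn's threaded (gthread) worker; the numbers are the ones from the comment, not general advice:

```python
# gunicorn_conf.py -- load with: gunicorn -c gunicorn_conf.py run:app
workers = 3               # far fewer processes than the original -w 50
threads = 2               # two threads per worker
worker_class = "gthread"  # gunicorn selects this automatically when threads > 1
timeout = 30
```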