
gunicorn worker sometimes gets stuck when managed by supervisor #1629

Open
WilliamChen-luckbob opened this issue Mar 11, 2024 · 0 comments

WilliamChen-luckbob commented Mar 11, 2024

I have a Flask app that I want to deploy with gunicorn and manage with supervisor.

When I restart the app with supervisorctl, some workers sometimes get stuck: they hang, and supervisor cannot shut them down on stop or restart.

You can see this below.
There should be 1 master and 5 workers. I kept restarting the app and checking the logs and the number of processes.
Sometimes a worker gets stuck and never finishes starting. It exists, and its parent PID is the master. When the master shuts down, it does not terminate this stuck worker; instead the worker is re-parented to PID 1 and stays stuck there forever.

As shown below, during one startup, process 25896 started with parent process 25879 but produced no initialization log. When supervisor restarted the app, a new set of processes started normally, and the parent of process 25896 became 1.
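A minimal sketch for spotting such a stuck worker automatically after each restart, assuming a Linux or macOS `ps` and that supervisord itself is not PID 1 (inside a container where supervisord is PID 1, the gunicorn master's parent would also be 1 and this check would misfire). It flags any process whose command line contains `gunicorn` and whose parent PID is 1, i.e. a worker that outlived its master like 25896 above. `parse_ps`, `find_orphaned_workers` and `snapshot` are hypothetical helpers, not part of gunicorn or supervisor.

```python
import subprocess


def parse_ps(output: str) -> list[tuple[int, int, str]]:
    """Parse header-less `ps` output into (pid, ppid, args) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, ppid, args = parts
        rows.append((int(pid), int(ppid), args))
    return rows


def find_orphaned_workers(output: str, marker: str = "gunicorn") -> list[int]:
    """Return PIDs of processes matching `marker` that were re-parented to PID 1."""
    return [pid for pid, ppid, args in parse_ps(output)
            if ppid == 1 and marker in args]


def snapshot() -> str:
    """Capture pid, ppid and full command line of every process (no header row)."""
    # One -o per column; an empty "=" header suppresses the header line.
    return subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `find_orphaned_workers(snapshot())` after each `supervisorctl restart` returns the PIDs worth inspecting (or killing with `kill -9` as a stopgap). Keeping the parsing separate from the `ps` call makes it easy to test against saved output.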

The issue does not reproduce consistently.

Once a stuck worker appears, it affects normal operation. Unless I `kill -9` the dead PIDs, there are actually 6+ workers (5 normal and 1+ dead), and incoming requests are sometimes handed to a dead process. That process never responds, so the request times out.

This issue never occurs when I run `gunicorn myapp.wsgi:app -c myapp_gunicorn_conf.py` manually. I have tried extensively to rule out my own code: when gunicorn is started directly from the command line (with or without nohup in the background), it always starts and stops correctly.

(Two screenshots were attached here.)

supervisor version 4.2.5
python 3.11.7
gunicorn==20.1.0
gevent==22.10.2

I don't know why I am running into this. Can anyone tell me how to get more detailed logs to see what happened? The stuck PID writes nothing to the log; it just hangs and does nothing.
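One possible way to get more detail, sketched as a debugging variant of `myapp_gunicorn_conf.py` (gunicorn config files are plain Python). The settings (`loglevel`, `errorlog`, `accesslog`, `timeout`, `graceful_timeout`) and server hooks (`post_fork`, `post_worker_init`, `worker_abort`) are standard gunicorn options; the bind address and `STARTUP_DUMP_SECONDS` are hypothetical values. Each forked worker arms `faulthandler.dump_traceback_later`, and the timer is cancelled once the worker has loaded the app, so a worker that hangs during boot writes its Python stack to stderr instead of staying silent. It cannot help if the worker hangs before `post_fork` runs.

```python
# myapp_gunicorn_conf.py (debugging variant; values are assumptions to adapt)
import faulthandler
import sys

bind = "127.0.0.1:8000"  # hypothetical; keep the real bind address
workers = 5
worker_class = "gevent"

# Verbose logs to stderr, which supervisor captures in its stderr log.
loglevel = "debug"
errorlog = "-"
accesslog = "-"

# The master sends SIGABRT to a worker that has not checked in for `timeout` s.
timeout = 30
# Assumption: supervisor's stopwaitsecs is left at its default of 10 s.
# Keeping graceful_timeout below it lets the master SIGKILL its own leftover
# workers before supervisor escalates to SIGKILL on the master.
graceful_timeout = 5

# Hypothetical threshold: if a worker has not finished booting after this
# many seconds, dump every thread's Python traceback to stderr.
STARTUP_DUMP_SECONDS = 20


def post_fork(server, worker):
    # Runs inside the new worker process, before the app is loaded.
    faulthandler.dump_traceback_later(STARTUP_DUMP_SECONDS, file=sys.stderr)
    server.log.info("post_fork: worker %s forked", worker.pid)


def post_worker_init(worker):
    # Runs once the worker has loaded the app: it booted, so cancel the dump.
    faulthandler.cancel_dump_traceback_later()
    worker.log.info("post_worker_init: worker %s ready", worker.pid)


def worker_abort(worker):
    # Runs when the worker receives SIGABRT (normally a timeout from the master).
    worker.log.info("worker_abort: worker %s timed out", worker.pid)
```

On the supervisor side, the `stopwaitsecs`, `stopasgroup` and `killasgroup` program options control how long supervisor waits after the stop signal and whether its signals reach the whole process group (workers included) rather than only the master; the timeouts above assume `stopwaitsecs` is left at its default.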
