Gunicorn process fails to stop by Supervisor's stop command and keeps locking ports #520

Closed
xaralis opened this issue Apr 26, 2013 · 18 comments

@xaralis

xaralis commented Apr 26, 2013

I am having an issue very similar to #291. The application processes (gunicorn-based) are managed by supervisor, and after a fresh deploy the supervised processes are told to restart.

In most cases this works as expected, but for one specific application it always fails and prevents the application from restarting properly. The old gunicorn processes hang, blocking the ports needed by the new ones. After a few minutes they finally die, but this is still very inconvenient because the app is unavailable for too long.

The supervisor config for the app is as follows:

[program:iw2_admin]
command=/srv/fragaria/iw2/bin/gunicorn --name=gunicorn_iw2_admin --bind=10.0.0.50:13000 --workers=2 --max-requests=5000 --timeout=500 --user=www-data --group=www-data --worker-class sync --worker-connections 1000  iw2.wsgi:application
environment=DJANGO_SETTINGS_MODULE='iw2.admin.settings',LANG='cs_CZ.utf8',LC_ALL='cs_CZ.UTF-8',LC_LANG='cs_CZ.UTF-8'
redirect_stderr=True
stdout_logfile=/var/log/supervisor/iw2_admin.log

Gunicorn's log doesn't show anything interesting except this:

2013-04-26 11:35:16 [30352] [INFO] Starting gunicorn 0.15.0
2013-04-26 11:35:16 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:16 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:17 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:17 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:18 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:18 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:19 [30352] [ERROR] Connection in use: ('10.0.0.50', 13000)
2013-04-26 11:35:19 [30352] [ERROR] Retrying in 1 second.
2013-04-26 11:35:20 [30352] [INFO] Listening at: http://10.0.0.50:13000 (30352)
2013-04-26 11:35:20 [30352] [INFO] Using worker: sync
2013-04-26 11:35:20 [30355] [INFO] Booting worker with pid: 30355
2013-04-26 11:35:20 [30356] [INFO] Booting worker with pid: 30356

The ERRORs are repeated for quite a while, as mentioned above.

Env:
Python 2.6.6
Debian squeeze
Gunicorn 0.15.0

I might try to fix it by upgrading to a newer version of gunicorn. Do you think that might be the solution? I don't want to risk upgrading if it won't help anyway.

@xaralis
Author

xaralis commented Apr 29, 2013

Any progress on this?

@a2

a2 commented May 8, 2013

Same issue. Bump!

@benoitc
Owner

benoitc commented May 8, 2013

@a2 what are your command line and version?

@xaralis yes, using the 0.17.x version is always better; it is the version that is actually supported. Anyway, what happens if you add the setting stopsignal=QUIT to your program section?

  • benoit

@a2

a2 commented May 8, 2013

@benoitc

  • Ubuntu 12.04.2 LTS
  • gunicorn 0.17.4
  • Python 2.7.3

supervisor config:

[program:gunicorn]
command=/srv/example.com/www/start.sh
process_name=%(program_name)s
directory=/srv/example.com/www
user=web
autostart=true
autorestart=true
redirect_stderr=true
stopsignal=KILL

[program:watchmedo]
command=/usr/local/bin/watchmedo shell-command --patterns "*.py;*.txt;*.scss" --recursive --command='/usr/local/bin/supervisorctl restart gunicorn' /srv/example.com/www
process_name=%(program_name)s
directory=/srv/example.com/www
autostart=true
autorestart=true
redirect_stderr=true

start.sh:

#!/bin/bash
/usr/local/bin/compass compile --boring --trace
source venv/bin/activate
pip install -r requirements.txt
if [ -e "env.sh" ]
then
    source env.sh
fi
gunicorn app:app -c gunicorn.conf.py

I've gathered that the problem is that some of the worker processes are still using port 7200 but aren't killed by supervisor when the process restarts? I really have no idea. I'm sort of a noob but I'm trying to learn quickly.

Thanks so much for your speedy response, Benoit.

@benoitc
Owner

benoitc commented May 8, 2013

@a2 can you replace the line stopsignal=KILL with stopsignal=QUIT in your config and let me know the results?

@a2

a2 commented May 8, 2013

@benoitc If I change that line it makes no difference. If I touch a file monitored by watchmedo, then I get the same errors because the gunicorn process was restarted. I have to stop supervisord, pkill gunicorn, and then start supervisord to stop the errors.

@tilgovi
Collaborator

tilgovi commented May 11, 2013

It sounds like you have long-lived connections, and the high timeout combined with graceful restart is causing workers to exit slowly.

Try TERM or INT instead of QUIT.

@a2

a2 commented May 11, 2013

@tilgovi Same result.

@benoitc
Owner

benoitc commented May 15, 2013

What do you mean by restarting supervisor? Sending a HUP? In that case, I remember there is a setting to send a HUP signal on reload. Or maybe this is just in gaffer.

Also, if you can, I would use systemd, which can pass a socket to gunicorn in the latest version.

  • benoit

@benoitc
Owner

benoitc commented Mar 9, 2014

Signals have been switched in 8124190. Closing this issue, thanks for the feedback!

benoitc closed this as completed Mar 9, 2014

@fillest
Contributor

fillest commented Jul 7, 2014

@a2 it's because your bash script (that is, the bash shell process) is what gets supervised, not gunicorn. Try using exec gunicorn ... in your script.
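
For anyone landing here later, a minimal sketch of why exec makes a difference. It is an illustration under stated assumptions, not part of the setup posted in this thread: the inner bash -c that prints its PID stands in for gunicorn, and the outer bash -c stands in for a wrapper script such as start.sh.

```shell
#!/bin/bash
# Illustration only: the inner 'bash -c "echo $$"' stands in for gunicorn,
# and the outer 'bash -c' stands in for a wrapper script such as start.sh.

# Without exec: the wrapper prints its PID, then runs the command as a
# child process, which prints a different PID. The trailing "true" keeps
# the command from being the last one in the string, because some bash
# versions may exec the final command of a -c string directly.
plain=$(bash -c 'echo "$$"; bash -c "echo \$\$"; true' | tr '\n' ' ')

# With exec: the command replaces the wrapper shell and keeps its PID,
# so a signal sent to the supervised PID reaches the command itself.
replaced=$(bash -c 'echo "$$"; exec bash -c "echo \$\$"' | tr '\n' ' ')

echo "without exec (wrapper PID, command PID): $plain"
echo "with exec    (wrapper PID, command PID): $replaced"
```

The two PIDs should differ in the first line and match in the second. Applied to the start.sh posted above, the change would presumably be to its last line only: exec gunicorn app:app -c gunicorn.conf.py, so that the process supervisor signals is gunicorn rather than the wrapper shell.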

@oliversong

I know this is an old thread, but for anyone else who has landed here and is using make + gunicorn + supervisor, the above comment is the solution. Supervisor needs to run the gunicorn command itself to be able to kill the process; if it is given a make command that in turn runs gunicorn, stopping the program will not kill gunicorn. It probably has something to do with make running the command in its own shell.

@spout

spout commented Sep 8, 2014

@fillest Thanks for the exec trick.

@mylons

mylons commented Jul 29, 2016

@tilgovi the long running connections were my issue after running svc -d on my process.

@fillest is also correct that an exec is required

@tilgovi
Collaborator

tilgovi commented Jul 29, 2016

I haven't looked at this in a while. If the documentation or examples for supervisord need any changes, please make a PR. Thanks!

@ajtsgwn

ajtsgwn commented Sep 30, 2016

using exec is the key
excellent suggestion!

@tilgovi
Collaborator

tilgovi commented Oct 1, 2016

@abhijeetsangwan if something needs to change in examples/supervisor.conf please make a pull request.

@akhiljalagam

@xaralis

stopasgroup=true
killasgroup=true
stopsignal=INT

If you are using the latest version, this should be effective for your situation.
