supervisord crash when stop a subprocess #536

Closed
YxAc opened this issue Dec 9, 2014 · 4 comments

@YxAc

YxAc commented Dec 9, 2014

When stopping a subprocess of supervisord, the stop operation exceeded stopwaitsecs, so supervisord used 'options.kill' to kill the subprocess with SIGKILL.

2014-11-20 16:42:45,352 INFO success: resumed process 'hbase--dptst-example--regionserver' with pid 46651
2014-12-09 02:56:47,389 WARN killing 'hbase--dptst-example--regionserver' (46651) with SIGKILL
2014-12-09 02:56:47,422 CRIT unknown problem killing hbase--dptst-example--regionserver (46651):Traceback (most recent call last):
File "/home/work/app/supervisor/supervisor/process.py", line 390, in kill
options.kill(pid, sig)
File "/home/work/app/supervisor/supervisor/options.py", line 1219, in kill
os.kill(pid, signal)
OSError: [Errno 3] No such process

However, this exception crashes supervisord. Can we ignore and skip this exception rather than crashing supervisord?

Any ideas to share? Thanks.
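
A minimal sketch of what "ignore and skip" could look like at the `os.kill` level, assuming a POSIX system. The helper name `kill_if_alive` is hypothetical and is not part of supervisor. It only swallows `[Errno 3] No such process` and re-raises everything else.

```python
import errno
import os
import signal


def kill_if_alive(pid, sig=signal.SIGKILL):
    """Send sig to pid; return False instead of raising if the pid is gone.

    Hypothetical helper for illustration, not supervisor's own code.
    """
    try:
        os.kill(pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # [Errno 3] No such process
            return False
        raise  # EPERM and any other error are still reported
    return True
```

A caller would treat a `False` return as "the process already exited" and move on to its normal cleanup, rather than treating it as a failure to stop.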

@awroblewska

👍

@mnaberez
Member

Related: #445

@scottp-dpaw

This crash is caused by Subprocess.finish() not having any logic to deal with ProcessStates.UNKNOWN. That state only comes up in a few situations. For example, you call "supervisorctl stop" to terminate a flapping process, and the PID becomes invalid moments before options.kill runs; options.kill then bombs with an exception, which is caught and changes the state to UNKNOWN. finish() has no branch for that state, and so ends up crashing the daemon. You can simulate the race condition by dropping in a "raise Exception()" after options.kill(pid, sig).

We opted to fix this for our application servers by changing the state to FATAL (so that we weren't barred from fixing the process), and adding in a self.pid sanity check for the final else branch in finish().
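
The failure mode and the workaround can be sketched with a toy state machine. Everything here (`State`, `ToyProcess`, `flaky_kill`, the `patched` flag, clearing `pid` on a failed kill) is hypothetical and only models the behaviour described in this comment; it is not supervisor's actual code.

```python
from enum import Enum


class State(Enum):
    RUNNING = 1
    STOPPING = 2
    STOPPED = 3
    EXITED = 4
    FATAL = 5
    UNKNOWN = 6


def flaky_kill(pid, sig):
    # Stand-in for the race: the pid vanished just before the kill call.
    raise Exception("simulated race: pid vanished before kill")


class ToyProcess:
    """Toy model of a supervised process (assumption, not supervisor's class)."""

    def __init__(self, pid, kill_func, patched):
        self.pid = pid
        self.state = State.RUNNING
        self.kill_func = kill_func
        self.patched = patched

    def kill(self, sig):
        self.state = State.STOPPING
        try:
            self.kill_func(self.pid, sig)
        except Exception:
            if self.patched:
                # Workaround: park in FATAL and forget the stale pid.
                self.state = State.FATAL
                self.pid = 0
            else:
                self.state = State.UNKNOWN

    def finish(self):
        if self.state is State.STOPPING:
            self.state = State.STOPPED
        else:
            if self.patched and not self.pid:
                return  # pid sanity check: nothing left to finish
            if self.state is not State.RUNNING:
                # Unpatched path: UNKNOWN has no branch, so this blows up.
                raise AssertionError("unexpected state %s" % self.state.name)
            self.state = State.EXITED
        self.pid = 0


def stop_and_finish(patched):
    proc = ToyProcess(pid=46651, kill_func=flaky_kill, patched=patched)
    proc.kill(9)
    proc.finish()
    return proc.state
```

In this model, `stop_and_finish(patched=False)` raises out of `finish()`, which stands in for the daemon crash, while `stop_and_finish(patched=True)` leaves the process in `FATAL` and returns normally.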

@n2taylor

We've seen this issue as well, and I would like some insight into the root cause. Supervisor seems to think the process has already terminated at the point where supervisorctl stop sends the kill signal, which leads to the OSError: [Errno 3] No such process exception (which, as noted above, supervisor fails to handle correctly). But every time we've seen this issue, the process we're trying to kill is in fact still running, and its PID is still there in the list of processes (ps). Some kind of permissions issue, maybe?
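
One way to narrow this down on a POSIX system is to probe the PID with signal 0, which performs the existence and permission checks without delivering a signal. A permissions problem would normally show up as `[Errno 1] Operation not permitted` (`EPERM`), not as `[Errno 3]` (`ESRCH`). The helper name `probe_pid` below is hypothetical; this is a diagnostic sketch, not something supervisor provides.

```python
import errno
import os


def probe_pid(pid):
    """Classify a pid using signal 0 (no signal is actually delivered)."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return "no such process"
        if exc.errno == errno.EPERM:
            return "exists, but not permitted to signal it"
        raise
    return "exists and can be signalled"
```

Running `probe_pid` as the same user as supervisord, with the PID from the log line and the PID shown by ps, would show whether the two actually refer to the same live process.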


5 participants