supervisord crash when stop a subprocess #536
Related: #445
This crash is caused by Subprocess.finish() not having any logic to deal with ProcessStates.UNKNOWN. That state only comes up in a few situations. For example, "supervisorctl stop" is called to terminate a flapping process, but the PID becomes invalid moments before options.kill runs; the call bombs with an exception, which is caught and changes the state to UNKNOWN. finish() then ends up crashing the daemon. You can simulate the race condition by dropping in a "raise Exception()" after options.kill(pid, sig). We opted to fix this for our application servers by changing the state to FATAL (so that we weren't barred from fixing the process) and adding a self.pid sanity check to the final else branch in finish().
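For anyone trying to picture the workaround, here is a toy model of it. This is not supervisor's code: ToyProcess, ToyState and failing_kill are hypothetical names made up for illustration, and only the FATAL transition is modelled. The injected failing_kill plays the role of the "raise Exception()" trick for simulating the race.

```python
import errno
from enum import Enum, auto


class ToyState(Enum):
    """Hypothetical, simplified states; not supervisor's ProcessStates."""
    RUNNING = auto()
    STOPPING = auto()
    STOPPED = auto()
    EXITED = auto()
    FATAL = auto()


class ToyProcess:
    """Toy stand-in for a supervised process (illustration only)."""

    def __init__(self, pid, kill_func):
        self.pid = pid
        self.kill_func = kill_func  # injected so a failure can be simulated
        self.state = ToyState.RUNNING

    def kill(self, sig):
        try:
            self.kill_func(self.pid, sig)
        except Exception:
            # Workaround from the comment above: go to FATAL, a state the
            # rest of the code knows how to handle, instead of UNKNOWN.
            self.state = ToyState.FATAL
            return False
        self.state = ToyState.STOPPING
        return True

    def finish(self):
        if self.state is ToyState.STOPPING:
            self.state = ToyState.STOPPED
        elif self.state is ToyState.RUNNING:
            self.state = ToyState.EXITED
        # Any other state (e.g. FATAL after a failed kill) is left unchanged
        # rather than treated as a programming error.
        self.pid = 0


def failing_kill(pid, sig):
    """Simulates the race: the PID vanished just before the kill call."""
    raise OSError(errno.ESRCH, "No such process")
```

With failing_kill injected, kill() returns False, the state becomes FATAL, and a later finish() completes without raising.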
We've seen this issue as well, and I would like some insight into the root cause, which is that supervisor thinks the process is already terminated at the point where
When stopping a subprocess of supervisord, the stop operation exceeds stopwaitsecs, and supervisord then uses 'options.kill' to stop the subprocess.
2014-11-20 16:42:45,352 INFO success: resumed process 'hbase--dptst-example--regionserver' with pid 46651
2014-12-09 02:56:47,389 WARN killing 'hbase--dptst-example--regionserver' (46651) with SIGKILL
2014-12-09 02:56:47,422 CRIT unknown problem killing hbase--dptst-example--regionserver (46651):Traceback (most recent call last):
  File "/home/work/app/supervisor/supervisor/process.py", line 390, in kill
    options.kill(pid, sig)
  File "/home/work/app/supervisor/supervisor/options.py", line 1219, in kill
    os.kill(pid, signal)
OSError: [Errno 3] No such process
However, this exception crashes supervisord. Can we ignore and skip this exception rather than crashing supervisord?
Any ideas to share? Thanks.
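A minimal sketch of what "ignore and skip" could look like, assuming a POSIX system. safe_kill is a hypothetical helper, not part of supervisor. It treats errno 3 (ESRCH, "No such process") as "the process is already gone" and re-raises anything else, such as a permission error.

```python
import errno
import os
import signal
import subprocess
import sys


def safe_kill(pid, sig):
    """Send sig to pid.

    Returns True if the signal was delivered and False if the process
    no longer exists (ESRCH). Other OSErrors are re-raised.
    """
    try:
        os.kill(pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # Same error as in the traceback above: the PID went away
            # before the signal could be sent.
            return False
        raise
    return True


if __name__ == "__main__":
    # Start a child that exits immediately, reap it, then try to signal it.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print(safe_kill(child.pid, signal.SIGTERM))
```

Swallowing only ESRCH keeps real failures visible. The caller still has to decide which state the process record should end up in when False is returned.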