Restarting celery issues and better supervisord config file #102
Processes spawned when receiving the …. If you install the … (…). Setting the timeout to ….
As for process spawning: setproctitle and watching process ids was helpful. It is not really process spawning: worker processes remain alive when the parent process is killed.
ps -afx:
I was able to reproduce this only with such an artificial race between ….
I was able to reproduce this with console-started scripts after several attempts:
and then in another terminal session:
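For what it's worth, the race can be simulated without celery at all. The sketch below (plain Python, hypothetical stand-in names) forks a "main" process with one pool child, kills the main process with SIGKILL, and checks that the child survives, mirroring what `ps -afx` shows for orphaned celeryd workers:

```python
import os
import signal
import subprocess
import sys
import time

# A stand-in for celeryd's layout: a "main" process that forks one
# pool child, reports the child's pid, and then idles.
MAIN = r"""
import os, sys, time
pid = os.fork()
if pid == 0:          # pool child: just keeps "working"
    time.sleep(30)
    sys.exit(0)
print(pid, flush=True)
time.sleep(30)
"""

main = subprocess.Popen([sys.executable, "-c", MAIN],
                        stdout=subprocess.PIPE, text=True)
child_pid = int(main.stdout.readline())

# Simulate `kill -9` on the main process, as supervisord does after
# stopwaitsecs expires.
os.kill(main.pid, signal.SIGKILL)
main.wait()
time.sleep(0.2)

# Signal 0 only probes for existence; it raises ProcessLookupError if
# the pid is gone.  Here it succeeds: the pool child is orphaned but
# still alive.
os.kill(child_pid, 0)
print("pool child %d survived the parent's SIGKILL" % child_pid)

os.kill(child_pid, signal.SIGKILL)  # clean up the orphan
```

Killing the child afterwards is only cleanup for the demo; in the real scenario those orphans are exactly the unmanaged processes that keep consuming tasks.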
Haven't found any special option for an infinite timeout with a warning in the supervisord docs. Probably a very large number will suffice, if that is what we want.
Maybe it's something related to celerybeat, because of the …. I was able to reproduce the issue for console-started celeryd only after using the -B option.
If I am testing some celery tasks locally and I use the -B option, sometimes the process is not killed when I use Ctrl-C.
I can't reproduce this locally. Btw, are you running the master branch? I just fixed a bug that could hang shutdown. If you could test with this, it would be nice.
Yes, I'm running the latest master branch. I saw your bug-fixing commit and hoped that it would help, but it seems that it doesn't in my case: the latest celery behaves the same. But it is possible that the initial problem is solved; I checked this only with an immediate kill. Can't wrap my head around it now :) The Ctrl-C issue is not reproducible with my setup. So, the bug report, simplified: http://gist.github.com/401028 . The results are always the same (not sometimes). I have some periodic tasks and some non-periodic ones. The tasks are simple and don't take much time to finish. Is it a bug that child processes stay alive after killing the main process? If so, and you can't reproduce it, then I'll try to provide a minimal project. The celerybeat killing behaviour is interesting: when I kill the hanging(?) celerybeat process, the hanging(?) worker process also shuts down.
@kmike I still can't reproduce with the commands above. Maybe it's because I'm on OS X, or maybe you're running Python 2.5? (I'm running 2.6.1.) Could you run it with --loglevel=DEBUG? The celerybeat process is started by the main process, so I'm assuming the main process is waiting ….
I thought that the main process was killed: it is not visible in the process list. I don't have much experience with process management, though. My setup was Debian Lenny + Python 2.5. I'll try to run celeryd with --loglevel=DEBUG and reproduce it on my MacBook.
Hmm, you're right of course. It's almost like the beat process takes ownership of the pool processes. I just tried to reproduce on Debian Lenny with Python 2.5, and it works just fine there.
Ask, thank you for the help. I think the initial problem was solved by the increased supervisord timeout and your bug-fixing commit. The simulation was incorrect because I used kill -9 (SIGKILL) instead of TERM.
Supervisord uses the TERM signal, so all should be fine.
But the thing that scares me a bit is that the initial bug wasn't investigated. I'll try to reproduce it and let you know. |
Ah! I'm so sorry, I didn't read the issue carefully enough. Yes! That's exactly what happens when you kill it with SIGKILL. The 9 signal can't be caught, so there's nothing we can do about this, AFAIK.
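A quick way to see why nothing can be done about kill -9: the kernel simply refuses to let a process install a handler for SIGKILL, while TERM can be trapped for a warm shutdown. A minimal Python 3 sketch:

```python
import signal

# TERM can be trapped, which is what gives celeryd the chance to do a
# warm shutdown when supervisord stops it.
signal.signal(signal.SIGTERM, lambda signum, frame: None)

# KILL (signal 9) can never be trapped: the kernel rejects any attempt
# to install a handler for it, so Python raises OSError (EINVAL).
try:
    signal.signal(signal.SIGKILL, lambda signum, frame: None)
except OSError as exc:
    print("cannot trap SIGKILL:", exc)
```

The same holds in any language: SIGKILL (and SIGSTOP) bypass userspace entirely, so a process killed with -9 gets no chance to terminate its children.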
In case you are still having issues terminating your Celery workers, you might try setting ….
I use a supervisord config file based on the example in the celery repository and have some issues when a celeryd restart occurs: sometimes task processing silently stops after restarting celery, without any error messages in the logs. The processes remain visible in the process list.
Finally I figured out that sometimes, when the processes are restarted, celery spawns an additional process that is not managed by supervisord, and this leads to these bugs. So I started to watch the ps output carefully after each restart and kill the extra processes manually via kill. After killing these processes, tasks begin to be executed properly. This sort of hack solved the problem for a week or so.

And today I think the real reason was found. The default supervisord value for the 'stopwaitsecs' option is 10s. This means that after 10s the celery process will be killed with the KILL signal instead of TERM. It seems that celery doesn't like being killed and tries to spawn an additional process in that case.
So I think it would be good to add something like 'stopwaitsecs=600' to all supervisord example config files (from the FAQ: "You should never stop celeryd with the KILL signal (-9), unless you've tried TERM a few times and waited a few minutes to let it get a chance to shut down.") and to investigate celeryd's behaviour on the KILL signal: the docs mention that tasks will be lost (and that is tolerable in many cases), but the issue with the spawned process is a bit weird.
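A sketch of what such a supervisord program section could look like (the program name and paths are placeholders, and the killasgroup line is an assumption about your supervisord version, not part of the celery repository's example file):

```ini
[program:celeryd]
; hypothetical paths -- adjust for your project
command=/path/to/env/bin/celeryd --loglevel=INFO
directory=/path/to/project
autostart=true
autorestart=true
; Wait up to 10 minutes after TERM before escalating to KILL
; (the supervisord default is only 10 seconds).
stopwaitsecs=600
; On supervisord versions that support it, signal the whole process
; group so pool children are not left behind as unmanaged orphans.
killasgroup=true
```

The large stopwaitsecs gives a busy worker time for a warm shutdown; killing the process group is a belt-and-braces measure for the orphaned-children case discussed above.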