Tens of thousands of backend_cleanup tasks executed every night since 10:42 pm. All async functions stop working #4335
Comments
I just found that the problem is probably caused by the built-in task, celery.backend_cleanup. I ran the app in the local development environment. At 10:42 pm, a tremendous number of backend_cleanup tasks started and executed successfully. However, after a couple of seconds, an error occurred. Here is the error.
Meanwhile, the queue disappeared from AWS SQS. A few of the backend_cleanup tasks continued to execute, and then a warning showed up: [2017-10-23 22:51:28,967: WARNING/MainProcess] Restoring 10 unacknowledged message(s). Finally, the HTTPS connection to the SQS queue stopped, and the app could not re-connect to SQS unless I re-deployed it to AWS Elastic Beanstalk. Could anyone help me with this issue? Any response will be greatly appreciated.
I think I solved the issue. The problem never happens after I changed CELERY_RESULT_BACKEND from "django-db" to "redis" and changed CELERY_TIMEZONE to "UTC". Although Task Results in the Django Admin stopped recording any new results, my problem was largely solved.
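For reference, a minimal sketch of the settings change described above (a Django settings.py fragment, assuming Celery's CELERY_-prefixed setting names; the Redis URL is a placeholder assumption, not from the original thread):

```python
# Hypothetical settings.py fragment mirroring the fix described above.
# The Redis URL is a placeholder; adjust host/port/db for your deployment.
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"  # was "django-db"
CELERY_TIMEZONE = "UTC"
```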
You can simply override the schedule and avoid using crontab:
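A minimal sketch of that override (setting name assumed for a Django-style CELERY_-prefixed configuration): when a result expiry is configured and beat is running, Celery registers a built-in celery.backend_cleanup entry on a crontab schedule; defining an entry under the same name in the beat schedule replaces it with a plain interval.

```python
from datetime import timedelta

# Hypothetical override: re-registering the built-in cleanup entry under
# its own name replaces the default crontab-based schedule with a plain
# 24-hour interval, so no crontab is involved.
CELERY_BEAT_SCHEDULE = {
    "celery.backend_cleanup": {
        "task": "celery.backend_cleanup",
        "schedule": timedelta(hours=24),  # interval instead of crontab
    },
}
```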
Checklist

- [ ] I have included the output of celery -A proj report in the issue. (If you are not able to do this, then at least specify the Celery version affected.)
- [ ] I have verified that the issue exists against the master branch of Celery.

My app is developed with Django 1.11 and celery4.1[sqs]. It is deployed on AWS through Elastic Beanstalk, with AWS SQS as the broker.
Steps to reproduce
The issue shows up every night around 10:42 pm and lasts for about 1 hour. Tens of thousands of backend_cleanup tasks start to execute. Per the list of task results in the Admin, each task executes successfully. The CPU usage sometimes stays at 100%, and every page of the app becomes hard to open. 1 hour after the issue, the CPU usage is back to normal. However, the queues are wedged: no delay() tasks can be executed. I have to purge the queue in SQS, restart the app, and re-upload the entire code to AWS Elastic Beanstalk before delay() tasks execute normally again. If I don't do the purging, restarting, or re-uploading, no async tasks are executed at all; even the built-in celery.backend_cleanup task cannot start. My app doesn't have any periodic or crontab tasks except the built-in celery.backend_cleanup task. Could anyone help me with this issue?
Expected behavior
I need the entire app to perform normally after the backend_cleanup tasks finish.
Actual behavior
Described in the Steps to reproduce section above.
celery-sqs-worker-report.txt