Backend_Cleanup Delete fest #5401

Open
AvnerCohen opened this issue Mar 20, 2019 · 5 comments

@AvnerCohen

AvnerCohen commented Mar 20, 2019

Brief Summary

We have a rather large production setup, with fairly large messages stored in MongoDB as the result backend.
As part of the "backend_cleanup" feature we are seeing a huge load spike on our backend every day at 4:00 UTC, because the exact same query is executed by all workers.

This was reported in the past at #4335

Proposed Behavior

To resolve this, what we have done on our end is to disable celery_result_expires on all workers and enable it on just one of them.
That fixed the load.

What we see on the MongoDB side is the exact same query being issued by hundreds of workers. This creates unneeded load: the delete itself is a pretty straightforward query, so the lock/contention on the DB is really unnecessary.

I think the following are options to fix or at least improve the situation, as I am sure some people see this daily load spike on their production systems without fully knowing what is causing it.

Options to fix:

  1. When using Mongo as a backend, remove entries with a Mongo TTL index (https://docs.mongodb.com/manual/core/index-ttl/); see the first sketch after this list.
    The obvious downside is that this is Mongo-specific (although Redis can do the same).
  2. Instead of running the cleanup at 4:00 AM sharp, randomize it across a 30-minute window. That way, by the time some of the workers come to perform the delete, it will already have been done and the contention will be limited.
    The downside is that it's a bit of black magic.
  3. Perform a logical LOCK inside the DB, such that before performing the deletion some bit is checked; if it's ON, it means someone is already deleting the data. See the second sketch after this list.
    The bit can be on a per-date basis, so that even if something happened and it was left "ON", it will not block deletion on future days.
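
For option 1, a minimal sketch (not our actual setup), assuming the MongoDB backend's default collection name `celery_taskmeta` and its `date_done` timestamp field; the connection URL and database name are placeholders:

```python
# Sketch: let MongoDB expire result documents itself via a TTL index, so no
# periodic delete query is needed at all. Names below are illustrative defaults,
# not taken from this issue.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
results = client["celery"]["celery_taskmeta"]

# Expire documents 24 hours after their date_done (match this to result_expires).
results.create_index("date_done", expireAfterSeconds=24 * 60 * 60)
```

And a rough, hypothetical sketch of the date-based lock from option 3, using a unique `_id` so only one worker can claim the cleanup for a given day (the lock collection and helper names are made up for illustration):

```python
# Sketch of option 3: the first worker to insert today's lock document wins and
# performs the delete; everyone else gets DuplicateKeyError and skips it.
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017/")
db = client["celery"]

def acquired_cleanup_lock() -> bool:
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    try:
        # _id is unique, so only one insert per day can succeed across all workers.
        db["backend_cleanup_locks"].insert_one(
            {"_id": today, "acquired_at": datetime.now(timezone.utc)}
        )
        return True
    except DuplicateKeyError:
        return False

if acquired_cleanup_lock():
    cutoff = datetime.now(timezone.utc) - timedelta(days=1)
    db["celery_taskmeta"].delete_many({"date_done": {"$lt": cutoff}})
```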

Happy to provide more context if anything is missing.

@thedrow
Member

thedrow commented Mar 26, 2019

MongoDB should definitely use TTLs.
I think that deserves an issue of its own.
Would you like to provide a PR that implements such a behavior?

I agree that the cleanup task execution time is arbitrary and should at the very least be configurable.
However, as far as I can see the only one running it is celery beat.
I'm not sure how it is possible that it's being executed by multiple workers.
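
If I'm reading the scheduler's default-entry behavior correctly, the time can already be overridden today by defining a `beat_schedule` entry under the same key, since Beat only installs its built-in `celery.backend_cleanup` entry when no entry with that name exists. A rough sketch (app name and URLs are placeholders):

```python
# Sketch: move the built-in backend cleanup to a different time by declaring the
# "celery.backend_cleanup" beat entry yourself instead of relying on the default.
from celery import Celery
from celery.schedules import crontab

app = Celery("myapp", broker="amqp://", backend="mongodb://localhost:27017/celery")

app.conf.beat_schedule = {
    "celery.backend_cleanup": {
        "task": "celery.backend_cleanup",
        "schedule": crontab(hour=2, minute=30),  # instead of the default 04:00
        "options": {"expires": 12 * 60 * 60},
    },
}
```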

@georgepsarakis
Contributor

As @thedrow mentioned, only certain backends require cleanup tasks, which are enqueued by Celery Beat.

@AvnerCohen a few questions on your architecture:

  • are you perhaps running multiple Celery Beat instances?
  • what broker are you using? If you are using Redis you may have to adjust the visibility timeout setting.
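
For reference, the visibility timeout mentioned above is set through `broker_transport_options`; a minimal sketch, only applicable if Redis were the broker (URL is a placeholder):

```python
# Only relevant with a Redis broker: raise the visibility timeout above the
# longest ETA/countdown you schedule, so unacknowledged messages are not
# redelivered to multiple workers.
from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0")
app.conf.broker_transport_options = {"visibility_timeout": 12 * 60 * 60}  # 12 hours
```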

@AvnerCohen
Author

@thedrow @georgepsarakis Indeed, we have tens of Beats.
We are using RabbitMQ as the broker and MongoDB as the result backend.

@georgepsarakis
Contributor

@AvnerCohen in that case, having centralized storage for the scheduler may help.
You could perhaps look into https://github.com/sibson/redbeat/, which uses Redis Sorted Sets to build a priority queue of the pending scheduled tasks; in your case that may consolidate the many cleanup tasks down to one or two.
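
A minimal sketch of what pointing Beat at RedBeat might look like (requires the `celery-redbeat` package; app name and URLs are placeholders):

```python
# Sketch: use RedBeat as the Beat scheduler so the schedule lives in Redis
# instead of each node's local schedule file.
from celery import Celery

app = Celery("myapp", broker="amqp://", backend="mongodb://localhost:27017/celery")
app.conf.beat_scheduler = "redbeat.RedBeatScheduler"
app.conf.redbeat_redis_url = "redis://localhost:6379/1"
```

Beat can then be started as usual with `celery -A myapp beat`, or with `-S redbeat.RedBeatScheduler` passed explicitly.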

@AvnerCohen
Author

Yeah, in my case the solution was just to make sure a single Beat schedules the cleanup, and that resolved the issue and the load on the DB.

I thought this was a general issue that might impact others. If not, I suggest closing it.

@auvipy added this to the Future milestone on Oct 31, 2021