Backend_Cleanup Delete fest #5401

Open
AvnerCohen opened this issue Mar 20, 2019 · 5 comments

@AvnerCohen

AvnerCohen commented Mar 20, 2019

Brief Summary

We have a rather large production setup, with fairly large messages stored in MongoDB as the result backend.
As part of the "backend_cleanup" feature we are seeing a huge load spike on our backend every day at 4:00 UTC, because the exact same query is executed by all workers.

This was reported in the past at #4335

Proposed Behavior

To resolve this, what we have done on our end is to disable celery_result_expires on all workers and enable it on just one of them.
That fixed the load.

What we see on the MongoDB side is the exact same query being issued by hundreds of workers. This creates unneeded load: the delete itself is a pretty straightforward query, so the lock/contention on the DB is really unnecessary.

I think the following are options to fix or at least improve the situation, as I am sure some people see this daily load spike on their production systems without fully knowing what is causing it.

Options to fix:

  1. When using Mongo as a backend, remove entries with a Mongo TTL index (https://docs.mongodb.com/manual/core/index-ttl/); see the first sketch after this list.
    The obvious downside is that this is Mongo-specific (although Redis can do the same).
  2. Instead of running the cleanup at 4:00 AM sharp, randomize it across a 30-minute window. That way, by the time some of the workers come to perform the delete, it will already have been done and the contention will be limited.
    The downside is that it's a bit of black magic.
  3. Perform a logical LOCK inside the DB, such that before performing the deletion some bit is checked; if it's ON, it means someone is already deleting the data. See the second sketch after this list.
    The bit can be on a per-date basis, so that even if something happened and it was left "ON", it will not block deletion on future days.
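
For option 1, a minimal sketch (not our actual setup), assuming the MongoDB backend's default collection name `celery_taskmeta` and its `date_done` timestamp field; the connection URL and database name are placeholders:

```python
# Sketch: let MongoDB expire result documents itself via a TTL index, so no
# periodic delete query is needed at all. Names below are illustrative defaults,
# not taken from this issue.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
results = client["celery"]["celery_taskmeta"]

# Expire documents 24 hours after their date_done (match this to result_expires).
results.create_index("date_done", expireAfterSeconds=24 * 60 * 60)
```

And a rough, hypothetical sketch of the date-based lock from option 3, using a unique `_id` so only one worker can claim the cleanup for a given day (the lock collection and helper names are made up for illustration):

```python
# Sketch of option 3: the first worker to insert today's lock document wins and
# performs the delete; everyone else gets DuplicateKeyError and skips it.
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017/")
db = client["celery"]

def acquired_cleanup_lock() -> bool:
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    try:
        # _id is unique, so only one insert per day can succeed across all workers.
        db["backend_cleanup_locks"].insert_one(
            {"_id": today, "acquired_at": datetime.now(timezone.utc)}
        )
        return True
    except DuplicateKeyError:
        return False

if acquired_cleanup_lock():
    cutoff = datetime.now(timezone.utc) - timedelta(days=1)
    db["celery_taskmeta"].delete_many({"date_done": {"$lt": cutoff}})
```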

Happy to provide more context if anything is missing.

@thedrow
Member

thedrow commented Mar 26, 2019

MongoDB should definitely use TTLs.
I think that deserves an issue of its own.
Would you like to provide a PR that implements such a behavior?

I agree that the cleanup task execution time is arbitrary and should at the very least be configurable.
However, as far as I can see the only one running it is celery beat.
I'm not sure how it is possible that it's being executed by multiple workers.
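
If I'm reading the scheduler's default-entry behavior correctly, the time can already be overridden today by defining a `beat_schedule` entry under the same key, since Beat only installs its built-in `celery.backend_cleanup` entry when no entry with that name exists. A rough sketch (app name and URLs are placeholders):

```python
# Sketch: move the built-in backend cleanup to a different time by declaring the
# "celery.backend_cleanup" beat entry yourself instead of relying on the default.
from celery import Celery
from celery.schedules import crontab

app = Celery("myapp", broker="amqp://", backend="mongodb://localhost:27017/celery")

app.conf.beat_schedule = {
    "celery.backend_cleanup": {
        "task": "celery.backend_cleanup",
        "schedule": crontab(hour=2, minute=30),  # instead of the default 04:00
        "options": {"expires": 12 * 60 * 60},
    },
}
```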

@georgepsarakis
Contributor

As @thedrow mentioned, only certain backends require cleanup tasks, which are enqueued by Celery Beat.

@AvnerCohen a few questions on your architecture:

  • are you perhaps running multiple Celery Beat instances?
  • what broker are you using? If you are using Redis you may have to adjust the visibility timeout setting.
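
For reference, the visibility timeout mentioned above is set through `broker_transport_options`; a minimal sketch, only applicable if Redis were the broker (URL is a placeholder):

```python
# Only relevant with a Redis broker: raise the visibility timeout above the
# longest ETA/countdown you schedule, so unacknowledged messages are not
# redelivered to multiple workers.
from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0")
app.conf.broker_transport_options = {"visibility_timeout": 12 * 60 * 60}  # 12 hours
```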

@AvnerCohen
Author

@thedrow @georgepsarakis Indeed, we have tens of Beats.
We are using RabbitMQ as the broker and MongoDB as the result backend.

@georgepsarakis
Contributor

@AvnerCohen in that case, having centralized storage for the scheduler may help.
You could perhaps look into https://github.com/sibson/redbeat/, which uses Redis Sorted Sets to build a priority queue of the pending scheduled tasks; in your case that may consolidate the many cleanup tasks down to one or two.
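
A minimal sketch of what pointing Beat at RedBeat might look like (requires the `celery-redbeat` package; app name and URLs are placeholders):

```python
# Sketch: use RedBeat as the Beat scheduler so the schedule lives in Redis
# instead of each node's local schedule file.
from celery import Celery

app = Celery("myapp", broker="amqp://", backend="mongodb://localhost:27017/celery")
app.conf.beat_scheduler = "redbeat.RedBeatScheduler"
app.conf.redbeat_redis_url = "redis://localhost:6379/1"
```

Beat can then be started as usual with `celery -A myapp beat`, or with `-S redbeat.RedBeatScheduler` passed explicitly.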

@AvnerCohen
Author

Yeah, in my case the solution was just to make sure a single Beat schedules the cleanup, and that resolved the issue and the load on the DB.

I thought this was a general issue that might impact others. If not, I suggest closing it.

@auvipy added this to the Future milestone on Oct 31, 2021