-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use thread pool for DeleteDanglingObjectStorageFilesStep #216
Conversation
The sync variant will be directly used in some steps. [DDB-951]
This implements `run_step` by running `run_sync_step` in a thread pool, this can be used for CPU intensive steps (which are probably all steps, given a large-enough backup). [DDB-951]
The main goal is to migrate `DeleteDanglingObjectStorageFilesStep` to run in a separate thread instead of keeping astacus stuck while the step is iterating through very long lists of object storage files. `RestoreObjectStorageFilesStep` was also migrated because it used the same APIs as `DeleteDanglingObjectStorageFilesStep`, to avoid keeping two implementations of everything.
There isn't anything async in this module. [DDB-951]
@@ -17,6 +17,7 @@ | |||
from astacus.coordinator.manifest import download_backup_manifest, download_backup_min_manifest | |||
from collections import Counter | |||
from collections.abc import Sequence, Set | |||
from starlette.concurrency import run_in_threadpool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no starlette dependency neither in astacus nor in aiven-core. I can imagine there's a transitive dependency, but if we start using it directly we should declare a direct dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already imported in two other places:
from starlette.concurrency import run_in_threadpool |
from starlette.concurrency import run_in_threadpool |
But yes, I'll add it to our direct dependencies 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right it was already there, it wasn't so urgent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to take a further look, there're quite a few changes - I should be able to do that in a couple of hours. BTW:
since our object storage library is sync
This seems quite a shortcoming and I wonder whether this should be addressed somehow.
It's also true of the zookeeper client, cassandra client, etc. Approximately everything except astacus client/server HTTP is sync code wrapped in threads and async awaited. Given than "doing CPU-intensive work" happens in many place and is hard to diagnose (problems only happen with large backups), I'm leaning towards addressing it by progressively moving everything to threads and getting rid of async in all of astacus, not by trying to find async libraries for everything. We don't really need the theoretical efficiency of an astacus able to answer to thousand of concurrent queries. (By "moving to threads", I mean "one query per thread" + "a few pools for things that are worth parallelizing and currently handled with asyncio.gather", not total chaos and tons of locks.) |
It was already an indirect dependency and already imported directly in a few places, but we should make that explicit.
+1 on this. Async applications are usually efficient when everything is async, with only the occasional reliance on threads. If it's "oh my God it's full of threads" it's probably just easier to do everything in sync, rather than wasting time on making the two "colors" interoperate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CPU intensive steps running in the main thread block all async operations.
This include other concurrent requests, leaving astacus completely unresponsive.
Start moving some steps entirely in a thread pool to avoid that.
We were already using the thread pool a lot anyway, since our object storage library
is sync, but now instead of entering/leaving the pool for every small operation, we
run an entire step as a single task.
[DDB-951]