Concurrent task iteration support #422

Open
joshbeckman opened this issue May 26, 2021 · 8 comments
Labels
enhancement New feature or request

Comments


joshbeckman commented May 26, 2021

Over at https://github.com/shopify/flow we have been trying to adopt the maintenance task framework and have enjoyed the benefits for our small data migrations. Our main hang-up is the long runtime of tasks that need to operate on large datasets (e.g. all records in one table: tens of thousands now, and many more in the future). When we recently tried running a data migration via a maintenance task, the total time to execute would have been months.

As such, our main desire with this library would be declarative concurrency support. Is #325 (comment) still the recommendation for concurrency in the future of this library?

No immediate need for action on this - we just wanted to provide feedback on our adoption!

@etiennebarrie
Member

Do you think batch enumerators could help? (see #409)
Depending on your tasks, being able to update 100 or 1,000 records at a time could substantially speed them up.

Regarding actual parallelism when running tasks, it's something that we're thinking about, but we haven't made any formal plans, so we can't make any promises. We can keep this issue open to continue thinking about it: start fleshing out an API and behaviour, figure out the edge cases (e.g. it will require special handling for custom enumerators, which may not have a way to start a cursor at an arbitrary position and can only give out one item at a time), etc.
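
To illustrate the cursor edge case mentioned above, here is a minimal sketch in plain Ruby. It assumes records are addressed by an integer primary key, so each worker could start its own cursor at a known position. `shard_id_range` is a hypothetical helper written for this example; it is not part of the maintenance_tasks API.

```ruby
# Hypothetical helper (not part of maintenance_tasks): splits an inclusive
# id range into at most `shards` contiguous sub-ranges. Each sub-range could
# be handed to an independent worker that keeps its own cursor.
def shard_id_range(min_id, max_id, shards)
  total = max_id - min_id + 1
  size = (total.to_f / shards).ceil

  (0...shards).map do |i|
    first = min_id + i * size
    next if first > max_id # more shards requested than ids available

    last = [first + size - 1, max_id].min
    (first..last)
  end.compact
end

ranges = shard_id_range(1, 10, 3)
ranges.each { |range| puts "worker cursor starts at #{range.first}, stops at #{range.last}" }
```

A custom enumerator that can only hand out one item at a time offers no equivalent of `range.first` to jump to, which is why it would likely need separate handling.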

@joshbeckman
Author

Batches could help with some of our task types, yes!

But we have other types of tasks that require, for example, calling an external API with an individual record and then saving the returned value to our database. For those, batching would remove some of the overhead of the job queue itself but wouldn't give us the speed-up that we would get from concurrency.
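
As a rough sketch of the kind of concurrency described here, the plain-Ruby example below fans the records of one batch out to a small pool of threads, which suits I/O-bound work such as API calls. Everything in it is an assumption for illustration: `process_concurrently` and `fetch_remote_value` are hypothetical names, the `sleep` stands in for a slow external request, and none of this is an existing maintenance_tasks feature.

```ruby
# Hypothetical helper (not part of maintenance_tasks): calls the given block
# once per record on a small pool of threads and returns the results in the
# same order as the input array.
def process_concurrently(records, concurrency: 4)
  jobs = Queue.new
  records.each_with_index { |record, index| jobs << [record, index] }
  jobs.close # once drained, a closed queue makes `pop` return nil

  results = Array.new(records.size)
  workers = Array.new(concurrency) do
    Thread.new do
      while (job = jobs.pop)
        record, index = job
        results[index] = yield(record) # each thread writes to its own slot
      end
    end
  end
  workers.each(&:join) # join re-raises any exception from a worker thread
  results
end

# Stand-in for a slow external API call (assumption for illustration only).
def fetch_remote_value(record_id)
  sleep(0.05)
  record_id * 2
end

values = process_concurrently((1..8).to_a, concurrency: 4) do |id|
  fetch_remote_value(id)
end
puts values.inspect
```

With four threads, the eight simulated calls overlap instead of running back to back. In a real task the thread count would also need to respect database connection pool limits and the external API's rate limits.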

adrianna-chang-shopify added the enhancement label May 6, 2022

sle-c commented Mar 29, 2023

I recently ran a migration on flow which mainly involved making GraphQL requests to core for certain things. Processing 874k rows would take about 7 days to complete. I think allowing parallelism would really help in these cases.

[Screenshot attached: 2023-03-29 at 1:14:33 PM]


This issue has been marked as stale because it has not been commented on in two months.
Please reply in order to keep the issue open. Otherwise, it will close in 14 days.
Thank you for contributing!

github-actions bot added the stale label Jan 27, 2024
@joshbeckman
Author

We would still like this!

github-actions bot removed the stale label Jan 31, 2024

github-actions bot added the stale label Mar 31, 2024
@joshbeckman
Author

We would still really like this

github-actions bot removed the stale label Apr 2, 2024
@segiddins

This would be incredibly useful.


5 participants