Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch POC #142

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Batch POC #142

wants to merge 7 commits into from

Conversation

jpcamara
Copy link
Contributor

@jpcamara jpcamara commented Feb 2, 2024

Proof of concept support for batches in SolidQueue!

Batches are a powerful feature in Sidekiq Pro and GoodJob which help with job coordination. A "Batch" is a collection of jobs, and when those jobs meet certain completion criteria it can optionally trigger another job with the batch record as an argument.

The goal of this feature is to:

  1. Enhance job coordination
  2. Maintain the overall simplicity of SolidQueue. To quote @rosa in the batch support issue:

We have batch support in our list of possible features to add, but it's not in the immediate plans because it's a bit at odds with the simplicity we're aiming for... I'm not quite sure yet how it could look like, in a way that maintains the overall simplicity of the gem

This PR provides a basic batch implementation, similar in interface to the GoodJob version of batches. The happy path for batches is functional, so the following example will work:

class SleepyJob < ApplicationJob
  queue_as :background

  def perform(seconds_to_sleep)
    Rails.logger.info "Feeling #{seconds_to_sleep} seconds sleepy..."
    sleep seconds_to_sleep
  end
end

class BatchCompletionJob < ApplicationJob
  queue_as :background

  def perform(batch)
    Rails.logger.info "#{batch.jobs.size} jobs completed!"
  end
end

SolidQueue::JobBatch.enqueue(on_finish: BatchCompletionJob) do
  5.times.map { |i| SleepyJob.perform_later(i) }
end

You should see the following in your logs:

[SolidQueue] Claimed 5 jobs
Performing SleepyJob (Job ID: 9a97e394-f1b4-47b5-8db3-6df86faf0913) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.563284000Z with arguments: 1
Feeling 1 seconds sleepy...
Performing SleepyJob (Job ID: 4ae76794-25e1-4378-bbc7-c505d7775395) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.567448000Z with arguments: 2
Feeling 2 seconds sleepy...
Performing SleepyJob (Job ID: fd41b240-08b6-4f7d-b4de-38ee2e6a58b4) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.557246000Z with arguments: 0
Feeling 0 seconds sleepy...
Performed SleepyJob (Job ID: fd41b240-08b6-4f7d-b4de-38ee2e6a58b4) from SolidQueue(background) in 0.1ms
Performing SleepyJob (Job ID: 40498973-9b5a-4f57-acce-03e2b8e2a97c) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.594398000Z with arguments: 4
Feeling 4 seconds sleepy...
Performing SleepyJob (Job ID: c2a123ea-61ba-4d33-99e7-9478a8ca4f5d) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.589439000Z with arguments: 3
Feeling 3 seconds sleepy...
Performed SleepyJob (Job ID: 9a97e394-f1b4-47b5-8db3-6df86faf0913) from SolidQueue(background) in 1005.47ms
Performed SleepyJob (Job ID: 4ae76794-25e1-4378-bbc7-c505d7775395) from SolidQueue(background) in 2006.58ms
Performed SleepyJob (Job ID: c2a123ea-61ba-4d33-99e7-9478a8ca4f5d) from SolidQueue(background) in 3006.1ms
Performed SleepyJob (Job ID: 40498973-9b5a-4f57-acce-03e2b8e2a97c) from SolidQueue(background) in 4008.12ms
[SolidQueue] Claimed 1 jobs
Performing BatchCompletionJob (Job ID: 625ca70c-4e80-4789-988b-3950ed1b23f3) from SolidQueue(background) enqueued at 2024-02-02T02:02:10.183075000Z with arguments: #<GlobalID:0x000000010b5b0390 @uri=#<URI::GID gid://dummy/SolidQueue::JobBatch/13>>
5 jobs completed!
Performed BatchCompletionJob (Job ID: 625ca70c-4e80-4789-988b-3950ed1b23f3) from SolidQueue(background) in 7.97ms

As a full interface, including things not implemented yet, this is the concept (mainly, it shows all possible callbacks being registered, as well as batch helpers for getting successes, failures, discards):

SolidQueue::JobBatch.enqueue(
  on_finish: BatchCompletionJob.new.set(wait_until: 1.hour.from_now, queue: :batch),
  on_success: BatchSuccessJob,
  on_discard: BatchDiscardJob
) do
  5.times.map { |i| SleepyJob.perform_later(i) }
end

class BatchCompletionJob < ApplicationJob
  queue_as :background

  def perform(batch)
    Rails.logger.info "#{batch.jobs.size} jobs completed!"
    Rails.logger.info "#{batch.successes.size} jobs succeeded!"
    Rails.logger.info "#{batch.failures.size} jobs failed!"
    Rails.logger.info "#{batch.discarded.size} jobs discarded!"
  end
end

Here are the things that are open questions and missing implementation details:

  • Naming: is JobBatch the right name? General feedback on naming in the feature
  • Is it simple enough?
  • The current implementation only handles jobs that have finished successfully, but it proposes an interface that supports "success", "finished" and "discard", following the semantics of the GoodJob callbacks. Are these sensible for SolidQueue as well?:

:finish - Enqueued when all jobs in the batch have finished, after all retries. Jobs will either be discarded or succeeded.
:success - Enqueued only when all jobs in the batch have finished and succeeded.
:discard - Enqueued immediately the first time a job in the batch is discarded.

  • Probably should allow setting a job for each callback type (only allows for one atm)
  • How do we concretely identify failed jobs that are done retrying?
  • How do we concretely identify failed jobs that have been discarded?
  • Support additional arguments when enqueueing the batch completion job, as well as things like queue/priority, etc (this is handled by allowing the callback to be of the form ActiveJob.new(args).set(options))
  • Keeping things efficient when you have tons of jobs in a batch
  • How would/could batches fit into Mission Control - Jobs?
  • All other feedback 🙇🏼

@jpcamara jpcamara mentioned this pull request Feb 2, 2024
@mbajur
Copy link

mbajur commented Feb 2, 2024

Could that also support adding jobs to already existing batch? Like SleepyJob enqueuing another job that would also be added to the batch.

@jpcamara
Copy link
Contributor Author

jpcamara commented Feb 2, 2024

Could that also support adding jobs to already existing batch? Like SleepyJob enqueuing another job that would also be added to the batch.

@mbajur Definitely. As long as you're in a job that's part of the batch, adding another job to the batch would work fine. It'd be pretty simple to extend the existing code to handle that - something like this would solve your use-case I think?

class SleepyJob < ApplicationJob
  queue_as :background

  def perform(seconds_to_sleep)
    Rails.logger.info "Feeling #{seconds_to_sleep} seconds sleepy..."
    sleep seconds_to_sleep

    batch.enqueue { AnotherJob.perform_later }
  end
end

I can update the ActiveJob::JobBatchId to add an extra batch method to the job which returns the current batch based on the job batch_id, and add an instance level SolidQueue::JobBatch#enqueue method which let's you add more jobs to the batch.

This would only work safely inside of the job - if you were outside of the job, it's possible the batch would finish before the job gets created.

@mbajur
Copy link

mbajur commented Feb 2, 2024

Yes that would absolutely do the trick for me :) Thank you!

* Introduces a "batch" concept, similar to batches present in Sidekiq Pro and GoodJob

* Batches monitor a set of jobs, and when those jobs are completed can fire off a final job

* This introduces a SolidQueue::JobBatch model, as well as the ability to enqueue jobs and associate them with the batch

* There are still more ideas to figure out, but this provides a basic batch scaffolding to spark discussion
* This means we can store the arguments and settings by letting the user do `BatchJob.new(arguments).set(options)`

* Yield the batch in `enqueue` in case someone needs info from it

* When you serialize then deserialize an activejob instance, the arguments are in the serialized_arguments field and can only be transferred over by the private method `deserialize_arguments_if_needed`. This is pretty janky, so there is probably something i'm missing

* `perform_all_later` let's us do a perform_later even with instance, which does not seem to be possible on the instances themselves
* Add spec for adding arguments and options to the batch callback
@jpcamara
Copy link
Contributor Author

jpcamara commented Mar 30, 2024

Hi @rosa 👋🏼 Congrats on getting SolidQueue past incubation and under the Rails umbrella officially!

I'm sure you've got alot on your plate! Are there any questions I can answer in regards to this PR? I can take the interface/functionality further, but I wanted to discuss it a bit before doing that. If there's anything additional you'd like me to tighten up/try out before discussing it, i'm happy to do so.

Also ok to just be on hold and not ready to discuss this further atm. Since it's been a couple months, I figured i'd check in.

@rosa
Copy link
Member

rosa commented Apr 12, 2024

Hey @jpcamara, so sorry for the delay and the silence here. I just haven't had the proper time to dedicate to this, and I think this requires and deserves more than a quick look. Thank you so much for putting this together!

My instinct with this kind of feature is that they require someone to use them before they're ready. From just looking at the code, I'm not quite sure what kind of edge cases and race conditions could arise here. This is how most of Solid Queue has been built: we've used it in HEY before making it "official" for everyone, seeing how it behaves under certain loads and what kind of problems we encounter. We caught and improved quite a few things that way.

We don't have a use case for batches right now, so I'm afraid I won't be able to take this feature to this "production-test" point on my side. Do you see yourself using this in a production setting?

@jpcamara
Copy link
Contributor Author

Hey @jpcamara, so sorry for the delay and the silence here. I just haven't had the proper time to dedicate to this, and I think this requires and deserves more than a quick look. Thank you so much for putting this together!

My instinct with this kind of feature is that they require someone to use them before they're ready. From just looking at the code, I'm not quite sure what kind of edge cases and race conditions could arise here. This is how most of Solid Queue has been built: we've used it in HEY before making it "official" for everyone, seeing how it behaves under certain loads and what kind of problems we encounter. We caught and improved quite a few things that way.

We don't have a use case for batches right now, so I'm afraid I won't be able to take this feature to this "production-test" point on my side. Do you see yourself using this in a production setting?

That makes sense! It was a bit of a chicken and an egg issue for me - I wanted to have batches in SolidQueue before starting to transition some things over, because I have code using Sidekiq Pro batches. But I can start experimenting with it now and report back. I'll continue to work on this PR as well in that case, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants