How many queues can I create? #487

Open
ericdude4 opened this issue Sep 27, 2023 · 10 comments

Comments

@ericdude4

Sorry, this is not an issue. I'm just wondering if someone from the Exq core team would be able to advise on how many queues can be created. I imagine this might come down to a Redis limitation? What is the upper limit for the number of queues I can create?

So far, my application is managing ~5,600 queues gracefully. How long can this scale for?

@ananthakumaran
Collaborator

Exq polls Redis for every queue independently using RPOPLPUSH. Say your poll_timeout is 50ms: each queue is polled 20 times per second, so 20 * 5600 = 112,000 ops per second. Redis can usually handle this, but there is a limit. I think you can scale easily up to around 1 million ops per second; after that, it might get tricky. Note that I am only talking about the job polling here; Exq does a lot of other things as well. I would suggest you start by looking at the Redis ops-per-second metrics. You can run redis-cli monitor (not safe for production, run it locally) to get a rough idea of the commands Exq executes.
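To make the arithmetic above easy to replay with other numbers, here is a minimal sketch. The module and function names are hypothetical, and it assumes exactly one RPOPLPUSH per queue per poll interval on a single node, ignoring everything else Exq sends to Redis.

```elixir
defmodule PollEstimate do
  # Rough estimate of dequeue polls per second from one Exq node.
  # Assumption: one RPOPLPUSH per queue per poll interval, nothing else counted.
  def polls_per_second(queue_count, poll_timeout_ms) do
    polls_per_queue = div(1000, poll_timeout_ms)
    queue_count * polls_per_queue
  end
end

# 5,600 queues polled every 50ms
IO.inspect(PollEstimate.polls_per_second(5_600, 50))
# prints 112000
```

With several worker nodes subscribed to the same queues, the figure would presumably need to be multiplied by the node count.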

I think you are trying to design a queue per user? If the count could go up to 100k, I would suggest not to go this route. Also, Exq implementation itself might not be optimized. Most deployments I have seen are < 100 queues.

@akira
Owner

akira commented Sep 27, 2023

Agreed with @ananthakumaran.

There are also some performance tests you can try and adapt here (we did some optimization a while back, but not related to this case): https://github.com/akira/exq/blob/master/test/performance_test.exs
If you don't need stats, you can also disable them, which would reduce the qps on Redis.

@ericdude4
Author

@akira @ananthakumaran Thank you for your insights on this. I apologize that I didn't respond sooner, but I have been thinking about this problem, especially as the application continues to grow and place more load on Exq. I am noticing more "gremlins" with regard to jobs being processed predictably. Sometimes, I notice that a queue which is subscribed to simply doesn't execute the jobs within it.

I have the following thought for a potential workaround to buy me some more time. Basically, out of the thousands of queues which exist in the application, only around 10 - 20 ever have any jobs queued up at a given time. I'm thinking that I can "prune" the queues which don't have any jobs every 5 minutes. Then, when I need to queue up a job for that client later on, the application will create the new queue and subscribe to it dynamically.

I expected I could make the "prune" function as follows:

def prune() do
  {:ok, queues} = Exq.Api.queue_size(Exq.Api)

  Enum.each(queues, fn {queue, jobs} ->
    if jobs == 0 do
      Exq.Api.remove_queue(Exq.Api, queue)
    end
  end)
end

The problem I am facing here though, is that redis-cli monitor still shows RPOPLPUSH commands being run for the queue, even after the queue has been removed. I also tried this with Exq.unsubscribe Exq.Api, "queue-name" but found the same result.

It seems like the queues remain in the Exq cache even after they are removed, causing the application to continue executing the RPOPLPUSH somehow.

@ericdude4
Author

Okay, quick update. After switching the above code to use Exq.unsubscribe Exq.Api, "queue-name", it seems to be working much better. However, I'm curious if you have any insight as to why this might be a bad idea? It seems like I can keep my qps much lower on average if I run this worker every few minutes.

@ananthakumaran
Collaborator

ananthakumaran commented Mar 20, 2024

Exq.Api.remove_queue(Exq.Api, queue)

This is not necessary and might delete actual jobs. unsubscribe is all you need, though it needs to be run on all worker nodes.

if I run this worker every few minutes.

From what I understand, you are delaying the job execution (after enqueue) by a few minutes for queues with infrequent jobs. If this is ok, then unsubscribe might work.

@ericdude4
Author

Hey @ananthakumaran, thank you for your thoughtful response. I enqueue jobs for users immediately based on incoming webhooks, with the following logic:

def enqueue_job(user) do
  # Check for an existing subscription for this user.
  {:ok, existing_subscriptions} = Exq.subscriptions(Exq)

  unless user.name in existing_subscriptions do
    # Subscribe to the queue if a subscription is not already in place.
    Exq.subscribe(Exq, user.name)
  end

  # Enqueue the job immediately, now that the subscription is in place.
  Exq.enqueue(Exq, user.name, Foo.Worker, [])
end

def prune() do
  # Get list of queues along with their queue size
  {:ok, queues} = Exq.Api.queue_size(Exq.Api)

  {:ok, subscriptions} = Exq.subscriptions(Exq)

  Enum.each(queues, fn {queue, jobs} ->
    if queue in subscriptions and jobs == 0 do
      # If there are no jobs in the subscribed queue, unsubscribe from that queue
      Exq.unsubscribe(Exq, queue)
    end
  end)
end

The prune() worker runs every 5 minutes and unsubscribes from all queues which don't have any jobs. Enqueuing a job always checks for an existing subscription to the queue, creating one if it doesn't exist.

I ran some tests and this really improved things a lot, since 99% of the queues which were created dynamically have 0 pending jobs at any given moment. This pruning approach keeps the ~10,000 subscriptions which were in place previously (making 20 RPOPLPUSH requests per second for each subscription) down below 100 - 200 subscriptions on average.
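The selection step of the pruning approach can be pulled out into a pure function, which makes it easy to test without Redis. This is only a sketch: the module and function names are hypothetical, and it assumes queue sizes arrive as {queue_name, job_count} tuples with integer counts, matching how the prune() code above destructures them.

```elixir
defmodule PruneSketch do
  # Returns the subscribed queues that currently hold no jobs.
  # queue_sizes:   list of {queue_name, job_count} tuples (assumed shape)
  # subscriptions: list of queue names this node is subscribed to
  def idle_subscriptions(queue_sizes, subscriptions) do
    subscribed = MapSet.new(subscriptions)

    # The {queue, 0} pattern skips any queue with pending jobs.
    for {queue, 0} <- queue_sizes, MapSet.member?(subscribed, queue), do: queue
  end
end

sizes = [{"alice", 0}, {"bob", 3}, {"carol", 0}]
IO.inspect(PruneSketch.idle_subscriptions(sizes, ["alice", "bob"]))
# prints ["alice"]
```

Each returned name would then be passed to Exq.unsubscribe on that node; "carol" is skipped because there is no subscription to drop.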

With this in mind, do you see anything which I might have failed to consider? Once again, I really appreciate your thoughts.

Eric

@ananthakumaran
Collaborator

If you know when jobs are getting enqueued, then this approach would work, though I haven't given much thought about how it would play with multiple worker nodes.

Exq also has a Dequeue behaviour which can be overridden. This was added to support rate limiter (see https://github.com/ananthakumaran/exq_limit), you might be able to re-purpose it for your use case.

@nicnilov

Please forgive my ignorance, but what is the reason Exq uses polling as opposed to Pub/Sub?

@ananthakumaran
Collaborator

Pub/Sub is not persistent. There is a blocking variant of RPOPLPUSH called BRPOPLPUSH, but it doesn't support multiple lists (see redis/redis#1785). Exq also needs to be compatible with Sidekiq, which restricts what data structures are allowed.

If I were to write a Job processing library today, I would use redis streams, but that's not possible with Exq due to Sidekiq compatibility.

@nicnilov

That makes sense, thanks!
