
Consumer fetch request priority #5213

Open
bruth opened this issue Mar 13, 2024 · 16 comments
Labels
proposal Enhancement idea or proposal

Comments

@bruth
Member

bruth commented Mar 13, 2024

Proposed change

Introduce support for a priority value (an integer) that can optionally be set on a consumer fetch request, which the server then uses to decide which pending requests to serve messages to first.

For example, if three requests with priority 1 and three requests with priority 0 are pending, the priority 1 requests will be fulfilled before the priority 0 requests (assuming no new priority 1 requests interleave).
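
For illustration, a minimal sketch of how this could look on the wire, assuming the priority is carried as an extra field on the existing JetStream next-message (fetch) request. The existing fields below are from the current JS API; the "priority" field name and exact semantics are hypothetical and part of this proposal only:

```go
// Sketch only: the current JetStream next-message request payload,
// extended with the proposed priority field.
package jsapi

import "time"

type ConsumerGetNextRequest struct {
	Batch     int           `json:"batch,omitempty"`
	Expires   time.Duration `json:"expires,omitempty"`
	NoWait    bool          `json:"no_wait,omitempty"`
	MaxBytes  int           `json:"max_bytes,omitempty"`
	Heartbeat time.Duration `json:"idle_heartbeat,omitempty"`

	// Proposed (hypothetical field name): higher values are served first.
	// Omitting it defaults to 0, so existing clients keep working unchanged.
	Priority int `json:"priority,omitempty"`
}
```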

Use case

There are two general use cases:

  • Cross-region shared work queue streams, where requests from clients/workers local to the work queue stream should be serviced first, but workers in other regions are willing to fetch messages if they have available capacity and the local region is saturated
  • Workers/clients that have more resources, better hardware, etc. can be prioritized over others

Contribution

cc @jnmoyne

@bruth bruth added the proposal Enhancement idea or proposal label Mar 13, 2024
@ripienaar
Contributor

In the "better" hardware scenario this make sense to me.

I wonder about the region case though, imagine 3 regions and a stream+consumer in the first. Clients would need knowledge of distance to the consumer and then that knowledge needs to be coordinated among them to figure out who is priority 1,2 or 3 - all clients connected to the 2nd nearest cluster share the same priority. But what if my RTT to the cluster went high, it would need to be dynamic based on my RTT to the cluster AND the clusters RTT to the consumere somehow I dont think it would be elegant to hard configure those priorities... did you want to hard coding those priorities or someting else?

@jnmoyne
Contributor

jnmoyne commented Mar 13, 2024

In the use case this would be required for, they are OK with the priority being hard coded, because they know which data center (cluster) the client application doing the fetch is located in and which data center(s) the stream(s) they are fetching from are located in. Sure, the actual RTT between two data centers could fluctuate a bit over time, but they don't care about that because they can estimate what that RTT normally is and use it to determine 'how far' the data center is. What's important is that the local fetchers use the highest priority and the 'remote' fetchers use a lower priority; they do not want to adjust the priority dynamically according to the current RTT. The priorities are set 'administratively'.

For example, there's an LA data center and a NY data center. (All the) clients in LA fetch with priority 1 from the stream located in LA and with priority 0 from the stream located in NY, and vice versa: (all the) clients in NY fetch from the NY stream with priority 1 and from the LA stream with priority 0.
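
A minimal sketch of that administrative mapping, assuming priorities are fixed per (client region, stream region) pair rather than derived from live RTT; names and values are illustrative only:

```go
// Illustrative only: the "administratively set" priorities from the LA/NY example.
package main

import "fmt"

func fetchPriority(clientRegion, streamRegion string) int {
	if clientRegion == streamRegion {
		return 1 // local fetchers are served first
	}
	return 0 // remote fetchers only get messages when no local request is waiting
}

func main() {
	fmt.Println(fetchPriority("LA", "LA")) // 1
	fmt.Println(fetchPriority("NY", "LA")) // 0
}
```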

@ripienaar
Contributor

It sounds a bit too manual and human-driven to me, tbh, as a general solution.

@ripienaar
Contributor

I mean, the immediate comparison here is queue groups - imagine how much queue groups would suck if they were all manually configured?

@derekcollison
Member

In this case would push queue groups make sense?

@ripienaar
Contributor

ripienaar commented Mar 13, 2024

Yeah, that's a comment I had earlier elsewhere, but it would be nice to get a pull equivalent that isn't just completely up to users to manually build out - but of course it's good to support the manual case for the different hardware type scenario.

Does the shared info across the import include the client RTT? That might translate exactly to priority - higher RTT, lower priority - but it doesn't consider the cluster RTT.

@derekcollison
Member

It has it, yes, but that is for the client connection only, not the full RTT to reach the consumer leader.

@derekcollison
Member

It also has the cluster, I think, so maybe those two combined...

@ripienaar
Contributor

Worth some exploration for sure - if the fetch has a priority, take that. Else, if the consumer is set to auto-prioritise, use that combo.

We know that if users have to configure these things, the only valid assumption is that it's done wrong :)

@ripienaar
Contributor

ripienaar commented Mar 13, 2024

Curious though, is this really necessary?

Do we service the waiting pulls in random order? I thought we do that to avoid a bad client doing a huge pull and then just letting ack timers expire, which could DOS a consumer.

Assuming we service the waiting pulls in random order, this is achievable by adjusting the rate of pulls and the size of batches. Pull fewer messages less frequently and over time you get a fraction of the total messages.

@derekcollison
Member

We do FIFO but allocate 1 per request even if batch is > 1.
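
A small, self-contained simulation of the dispatch behaviour as described here (FIFO over the waiting requests, one message handed out per request per pass, even when a request asked for a larger batch); this is illustrative only, not the server's actual implementation:

```go
package main

import "fmt"

// waitingReq models a pending fetch request; names are illustrative.
type waitingReq struct {
	client    string
	batch     int
	delivered int
}

func dispatch(msgs int, waiting []*waitingReq) {
	for msgs > 0 && len(waiting) > 0 {
		req := waiting[0] // oldest request first (FIFO)
		waiting = waiting[1:]
		req.delivered++
		msgs--
		fmt.Printf("message -> %s (%d of batch %d)\n", req.client, req.delivered, req.batch)
		if req.delivered < req.batch {
			waiting = append(waiting, req) // still has batch budget; back of the line
		}
	}
}

func main() {
	// A asked for a batch of 3, B for 1; with 4 messages available,
	// A does not drain them all before B gets one.
	dispatch(4, []*waitingReq{
		{client: "A", batch: 3},
		{client: "B", batch: 1},
	})
}
```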

@jnmoyne
Contributor

jnmoyne commented Mar 14, 2024

The use case is you have a bunch of clients in each data center pulling messages (work requests) from the work queue stream for that data center. But if there are a lot of messages to process in the stream, they want to allow clients located in other data centers, which are not busy and are just waiting for new messages to land in their own data center's work queue, to "steal" work from this data center.

Hence clients in the data center pull from the work queue stream in their own data center with a high priority, and as long as they can keep up with the ingress of messages they are the ones mostly getting those messages. But if they are busy and messages are coming into the stream faster than they can pull them, then the clients from the other data centers that are pulling with a lower priority would start getting some of those messages delivered to them.

(This idea was presented to them as a possible way to implement that use case and they agreed it would work for them, but if we come up with another, better way to do what they want, then even better. This idea seemed simple and relatively easy to do, and backwards compatible, as client applications that know nothing about this priority would just end up using priority 0.)

@Jarema
Member

Jarema commented Mar 14, 2024

Maybe I'm wrong here, but for that mechanism to work correctly, the high-priority clients have to be very conservative in sending pull requests. If they always have a pending pull request, which is the most common use case, the lower priorities will never get messages anyway.

The only way it will work reliably is if it's used with a fetch of 1 message, and the fetch is sent only after the previous one has finished.

This is a pretty inefficient way to interact with pull consumers that also causes zig-zags in network traffic.
I think this solution is really easy to misuse.

The use case @jnmoyne mentions maybe needs something more like a watermark solution, where a specific pull request can set a watermark of pending messages, above which it will be served?
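
A hypothetical sketch of what such a watermark request could look like; the min_pending field name and semantics are invented here for illustration and do not exist in the current API:

```go
// Sketch only: a fetch request that is eligible to be served only while the
// consumer has more than min_pending undelivered messages.
package jsapi

type WatermarkNextRequest struct {
	Batch      int    `json:"batch,omitempty"`
	MinPending uint64 `json:"min_pending,omitempty"` // serve only while num_pending > this value
}
```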

@jnmoyne
Contributor

jnmoyne commented Mar 14, 2024

In this particular use case there are indeed lots (hundreds, up to thousands) of client applications in each data center, each one wanting to fetch just one message, which takes a few seconds to process, and then go back to fetching one message. They are not trying to spread the distribution of messages proportionally to the priority level such that lower priority fetchers still get some messages; rather, the high priority fetchers that are waiting for new messages should always get the messages when they come in. The lower priority fetchers should always be 'starved' as long as there are some high priority fetch requests waiting, and should only get a new message if at that time there is no higher priority fetcher waiting.
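
For reference, a minimal sketch of that fetch-one-and-process loop using today's nats.go pull API, which has no priority field (that would be part of this proposal); the subject and durable names are placeholders:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder subject and durable name for illustration.
	sub, err := js.PullSubscribe("work.>", "workers")
	if err != nil {
		log.Fatal(err)
	}

	for {
		msgs, err := sub.Fetch(1, nats.MaxWait(30*time.Second))
		if err != nil {
			continue // no message within the wait window; fetch again
		}
		process(msgs[0]) // takes a few seconds in the described use case
		msgs[0].Ack()
	}
}

func process(m *nats.Msg) {
	time.Sleep(2 * time.Second) // stand-in for real work
}
```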

@Jarema
Member

Jarema commented Mar 15, 2024

I understand the need, but I really do not like the idea that such an important feature works only if there is a very specific setup done by the users.

@MauriceVanVeen
Contributor

I like the proposal: having clients themselves indicate whether they have prio1 to receive the data, and whether they are willing to have lower prio2 clients take up the work if prio1 is saturated.

I'm thinking the priority alone might be a bit too manual, and would agree with @Jarema there. The question I would ask myself as an end user of this: "If I have some clients at prio1 and some at prio2, when can I expect prio2 clients to receive work?". Currently that sounds like it would be specific to the setup, being tied to the implementation in the server as well.

I'm thinking a slight addition to the priorities might make the setup a bit less specific, as well as give a clearer answer to "when can lower prio clients expect to receive work".

What if we were able to indicate both a prio and some kind of "timeout"? An example (see the sketch below):
prio1 clients are constantly pulling new messages, and they indicate they are willing to give prio2 clients work if prio1 clients have not picked up anything in, for example, 10 seconds.
From the prio2 clients' perspective it will be clear they will not get any data unless no prio1 clients could pick up work in those 10 seconds.

Then the server would always hold on to the highest priority that last pulled, as well as a timeout. When the timeout is reached the priority can be lowered to see if other clients are available.
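
A hypothetical sketch of a request carrying both values, with invented names (priority, handoff_after) purely to illustrate the idea; neither field exists in any current API:

```go
// Sketch only: a prio1 client pulls with handoff_after set to 10s, telling the
// server it may serve lower-prio waiting requests if no prio1 request has
// picked anything up for that long.
package jsapi

import "time"

type PrioritizedNextRequest struct {
	Batch        int           `json:"batch,omitempty"`
	Priority     int           `json:"priority,omitempty"`
	HandoffAfter time.Duration `json:"handoff_after,omitempty"` // invented name
}
```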
