
Dynamic batching that supports static batch size with padding #7124

Open
ShuaiShao93 opened this issue Apr 17, 2024 · 10 comments
Labels
enhancement New feature or request module: server Issues related to the server core and frontends

Comments

@ShuaiShao93

Is your feature request related to a problem? Please describe.
Since TensorRT has limited support for dynamic shapes, the dynamic batch sizes required by the dynamic batcher are not ideal.

Describe the solution you'd like
Support padding the batch up to the model's static batch size when there is not enough data to fill it.
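
A minimal numpy sketch of the requested behavior (illustrative only; `STATIC_BATCH_SIZE` and `engine_infer` are placeholder names, not Triton APIs): the dynamic batcher forms whatever batch it can, the batch is padded up to the engine's static size, inference runs once, and the padded rows are dropped from the outputs.

```python
import numpy as np

STATIC_BATCH_SIZE = 8  # placeholder: the batch size the engine was built with


def pad_to_static(batch: np.ndarray) -> tuple[np.ndarray, int]:
    """Pad the batch dimension up to the static batch size; also return the real size."""
    real = batch.shape[0]
    missing = STATIC_BATCH_SIZE - real
    if missing > 0:
        filler = np.zeros((missing,) + batch.shape[1:], dtype=batch.dtype)
        batch = np.concatenate([batch, filler], axis=0)
    return batch, real


# The dynamic batcher gathered 7 samples; pad to 8, run once, keep the first 7 outputs.
inputs = np.random.rand(7, 3, 224, 224).astype(np.float32)
padded, real = pad_to_static(inputs)
# outputs = engine_infer(padded)[:real]   # engine_infer stands in for the TRT engine call
```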

@SunnyGhj

Great minds think alike. I'm trying to manually implement padding on the request side.

@ShuaiShao93
Author

> Great minds think alike. I'm trying to manually implement padding on the request side.

Does this mean you disabled dynamic batching on Triton? That is not ideal, because dynamic batching is one of the most important reasons we use Triton.

@SunnyGhj

SunnyGhj commented Apr 17, 2024

> when there is not enough data to fill it.

Similarly, we have manually implemented batching of requests on the client and fixed the batch size to the static batch size. We are trying to pad the data when there is not enough of it.

@ShuaiShao93
Author

> when there is not enough data to fill it.
>
> Similarly, we have manually implemented batching of requests on the client and fixed the batch size to the static batch size. We are trying to pad the data when there is not enough of it.

OK, it sounds like you re-implemented the dynamic batcher in your own client, which is probably not the best investment of time. I hope Triton can support this natively. But thanks for sharing this!

@Tabrizian
Member

I think this enhancement makes sense. @GuanLuo / @nnshah1 any additional thoughts?

Tabrizian added the enhancement and module: server labels on Apr 19, 2024
@nnshah1
Contributor

nnshah1 commented Apr 19, 2024

@ShuaiShao93 If I understand correctly, the idea here is to have a static batch size defined in the engine, but then have the dynamic batcher pad the batches it sends in when they are smaller than that size?

Is that something to handle in the server or in the backend? It might be more efficient to pad right before sending it to the engine.
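
To picture "pad right before sending it to the engine", here is a sketch in the style of a Triton Python backend's `execute()` (illustrative only: the tensor names `INPUT`/`OUTPUT`, `STATIC_BATCH_SIZE`, and `self.run_engine()` are placeholders, and the change discussed in this issue would actually live in the TRT backend or Triton core). The core server's dynamic batcher still groups the requests; the backend concatenates them, pads up to the engine's static batch size, runs once, and slices the outputs back per request.

```python
import numpy as np
import triton_python_backend_utils as pb_utils  # available inside Triton's Python backend


class TritonPythonModel:
    STATIC_BATCH_SIZE = 8  # placeholder: the batch size the engine was built with

    def execute(self, requests):
        # The dynamic batcher may deliver several requests at once; concatenate them.
        arrays = [pb_utils.get_input_tensor_by_name(r, "INPUT").as_numpy() for r in requests]
        sizes = [a.shape[0] for a in arrays]
        batch = np.concatenate(arrays, axis=0)

        # Pad the combined batch up to the engine's static batch size.
        missing = self.STATIC_BATCH_SIZE - batch.shape[0]
        if missing > 0:
            filler = np.zeros((missing,) + batch.shape[1:], dtype=batch.dtype)
            batch = np.concatenate([batch, filler], axis=0)

        outputs = self.run_engine(batch)  # placeholder for the actual TRT engine call

        # Slice the outputs back into one response per original request.
        responses, offset = [], 0
        for n in sizes:
            out = pb_utils.Tensor("OUTPUT", outputs[offset:offset + n])
            offset += n
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```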

@ShuaiShao93
Author

ShuaiShao93 commented Apr 19, 2024

@nnshah1 How is this possible?

Let's say a model has a static batch size of 8. There are two clients: client A has a request of batch size 4, and client B has a request of batch size 3.

Ideally, if A and B call the Triton server at the same time, the dynamic batcher makes a batch of size 7 and then pads it to 8.

But if we pad at the client, A pads 4 to 8 and B pads 3 to 8, so we need to run inference twice, which doubles the cost.

@nnshah1
Contributor

nnshah1 commented Apr 19, 2024

> @nnshah1 How is this possible?
>
> Let's say a model has a static batch size of 8. There are two clients: client A has a request of batch size 4, and client B has a request of batch size 3.
>
> Ideally, if A and B call the Triton server at the same time, the dynamic batcher makes a batch of size 7 and then pads it to 8.
>
> But if we pad at the client, A pads 4 to 8 and B pads 3 to 8, so we need to run inference twice, which doubles the cost.

No, I get your point. I mean padding in the TRT backend versus the core server piece, not padding at the client.

@nnshah1
Contributor

nnshah1 commented Apr 19, 2024

As an example, for our Stable Diffusion tutorial I ended up padding / splitting on the model side and letting the dynamic batcher provide batches independently of that. (This is just an example and would need to be implemented in the TRT engine or Triton core.)

https://github.com/triton-inference-server/tutorials/blob/cb2ca257000cd14d59642a7aa86b56d054535d73/Popular_Models_Guide/StableDiffusion/backend/diffusion/model.py#L178
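
A condensed sketch of that pad/split pattern (this is not the tutorial's actual code; `ENGINE_BATCH_SIZE` and the `infer` callable are placeholders): split whatever batch arrives into engine-sized chunks, pad the last chunk, and stitch the real outputs back together.

```python
import numpy as np

ENGINE_BATCH_SIZE = 8  # placeholder: the static batch size the engine expects


def run_in_fixed_chunks(batch: np.ndarray, infer) -> np.ndarray:
    """Run an arbitrary-size batch through a fixed-batch-size engine by
    chunking, padding the final chunk, and dropping the padded rows."""
    outputs = []
    for start in range(0, batch.shape[0], ENGINE_BATCH_SIZE):
        chunk = batch[start:start + ENGINE_BATCH_SIZE]
        real = chunk.shape[0]
        if real < ENGINE_BATCH_SIZE:
            filler = np.zeros((ENGINE_BATCH_SIZE - real,) + chunk.shape[1:], dtype=chunk.dtype)
            chunk = np.concatenate([chunk, filler], axis=0)
        outputs.append(infer(chunk)[:real])  # keep only the real rows of each chunk
    return np.concatenate(outputs, axis=0)
```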

@ShuaiShao93
Author

@nnshah1 Ah, gotcha. Thanks! Either should work, but it sounds better to make this a general feature and expose it as a flag in the model config, in case other backends also want a static batch size.
