I have a model ensemble in which some models produce outputs with different shapes. I have multiple clients, but each one sends a single frame per inference request. For instance, I have a bytetracker that produces variable-shaped outputs. However, it looks like Triton is not batching: it sends a single input/output per inference request and keeps caching.
My questions are:
1. Should I remove dynamic batching, since each client sends a single frame per request, or keep both dynamic_batching and ragged_batching?
2. The batch_input config should only apply to model inputs, right? For instance, if a model produces variable shapes, I shouldn't add something like batch_input (or a batch_output) for it? If so, should I keep a -1 in that output's dims?
3. Should each model output a batch of 1 so Triton can concatenate? For instance, should the bytetracker output (batch_size, num_det, 56), or should it output (num_det, 56) and let Triton handle the batching? (See the client sketch below for what each request currently looks like.)
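For context, here is a minimal client sketch showing how each client sends a single frame per request. The model name ("ensemble_pipeline") and tensor names ("IMAGE", "TRACKS") are placeholders, not the actual names in my pipeline:

```python
# Hypothetical client sketch: one frame per request, with a leading batch
# dimension of 1 that the server-side dynamic batcher could merge across clients.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Single frame, batch of 1 (placeholder input size).
frame = np.random.rand(1, 3, 640, 640).astype(np.float32)

infer_input = grpcclient.InferInput("IMAGE", list(frame.shape), "FP32")
infer_input.set_data_from_numpy(frame)

result = client.infer(model_name="ensemble_pipeline", inputs=[infer_input])
tracks = result.as_numpy("TRACKS")  # e.g. (1, num_det, 56) from the bytetracker stage
print(tracks.shape)
```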
Triton version: FROM nvcr.io/nvidia/tritonserver:24.01-py3
This is a sample from the bytetracker, for instance:
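A minimal sketch of what such a config could look like; the backend, tensor names, batch size, and dims below are assumptions for illustration, not the actual config:

```
# Illustrative config.pbtxt for a bytetracker-style model (names/dims assumed)
name: "bytetracker"
backend: "python"
max_batch_size: 8

input [
  {
    name: "DETECTIONS"
    data_type: TYPE_FP32
    dims: [ -1, 56 ]            # variable number of detections per frame
    allow_ragged_batch: true    # lets the dynamic batcher combine mismatched shapes
  }
]

batch_input [
  {
    kind: BATCH_ELEMENT_COUNT
    target_name: "DETECTIONS_ELEMENT_COUNT"
    data_type: TYPE_INT32
    source_input: "DETECTIONS"
  }
]

output [
  {
    name: "TRACKS"
    data_type: TYPE_FP32
    dims: [ -1, 56 ]            # variable number of tracks per frame
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 5000
}
```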
This is a sample from the post-processing model, where the outputs do not use ragged_batching:
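Again, only an illustrative sketch with assumed names and dims rather than the real config:

```
# Illustrative config.pbtxt for a post-processing model (names/dims assumed)
name: "postprocessing"
backend: "python"
max_batch_size: 8

input [
  {
    name: "TRACKS"
    data_type: TYPE_FP32
    dims: [ -1, 56 ]    # dims exclude the batch dimension when max_batch_size > 0
  }
]

output [
  {
    name: "RESULTS"
    data_type: TYPE_FP32
    dims: [ -1 ]        # variable-length output, no ragged batching
  }
]

dynamic_batching { }
```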