Ragged Batching is not working properly #7206

Closed
AWallyAllah opened this issue May 10, 2024 · 0 comments

AWallyAllah commented May 10, 2024

I have a model ensemble in which some models produce outputs of different shapes. I have multiple clients, and each one sends a single frame per inference request. For instance, I have a ByteTracker model that produces outputs of varying shapes. However, it looks like Triton is not batching: it sends a single input/output per inference request and keeps caching.

My questions are:

  1. Should I remove dynamic batching, since each client sends a single frame per request, or keep both dynamic_batching and ragged batching? (A single-frame client is sketched after this list.)
  2. The batch_input config should only apply to model inputs, right? For instance, if a model produces outputs of different shapes, I should not add something like batch_input (or a batch_output) for them? If so, should I keep a -1 in that output's dims?
  3. Should each model output a batch of 1 so Triton can concatenate? For instance, should a ByteTracker output (batch_size, num_det, 56), or (num_det, 56) with Triton doing the batching?
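
For reference, this is a minimal sketch of how each client could send a single frame per request (input names are taken from the ByteTracker config below; shapes and values are just illustrative), using the standard tritonclient gRPC API:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One frame per request; the number of rows (num_det) varies per frame.
ppe = np.random.rand(1, 3, 57).astype(np.float16)   # [batch=1, num_det, 57]
det = np.random.rand(1, 5, 6).astype(np.float16)    # [batch=1, num_det, 6]
cam = np.array([[7.0]], dtype=np.float16)           # [batch=1, 1]

inputs = []
for name, arr in [("ppe_bytetracker_input", ppe),
                  ("detection_bytetracker_input", det),
                  ("detection_with_ppe_camera_id", cam)]:
    tensor = grpcclient.InferInput(name, list(arr.shape), "FP16")
    tensor.set_data_from_numpy(arr)
    inputs.append(tensor)

result = client.infer("detection_with_ppe_bytetracker", inputs)
out = result.as_numpy("detection_with_ppe_bytetracker_output")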

Triton version: FROM nvcr.io/nvidia/tritonserver:24.01-py3

Here is a sample config from the ByteTracker model:

name: "detection_with_ppe_bytetracker"
backend: "python"
max_batch_size: 4

input [
{
    name: "ppe_bytetracker_input"
    data_type: TYPE_FP16
    dims: [ -1, 57 ]
    allow_ragged_batch: true
},
{
    name: "detection_bytetracker_input"
    data_type: TYPE_FP16
    dims: [ -1, 6 ]
    allow_ragged_batch: true
},
{
    name: "detection_with_ppe_camera_id"
    data_type: TYPE_FP16
    dims: [ 1 ]
}
]
# each batch_input needs a unique target_name ("INDEX" was duplicated)
batch_input [
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "PPE_INDEX"
    data_type: TYPE_FP32
    source_input: "ppe_bytetracker_input"
  },
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "DETECTION_INDEX"
    data_type: TYPE_FP32
    source_input: "detection_bytetracker_input"
  }
]

output [
{
    name: "detection_with_ppe_bytetracker_output"
    data_type: TYPE_FP16
    dims: [ -1, 58 ]
}
]

dynamic_batching {}

instance_group [
    {
        count: 1
        kind: KIND_CPU
    }
]

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [{
      name : "openvino"
    }]
  }
}
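
As I understand it, BATCH_ACCUMULATED_ELEMENT_COUNT hands the model, for each request in the batch, the cumulative element count of the ragged input up to and including that request. A minimal numpy sketch of splitting the concatenated ragged tensor (e.g. PPE_INDEX above) back into per-request pieces; the numbers are illustrative:

import numpy as np

# Concatenated ragged input: three requests with 2, 1 and 2 rows of width 57.
flat = np.random.rand(2 + 1 + 2, 57).astype(np.float16)

# BATCH_ACCUMULATED_ELEMENT_COUNT (TYPE_FP32): cumulative element counts,
# here 2*57, (2+1)*57 and (2+1+2)*57.
index = np.array([114.0, 171.0, 285.0], dtype=np.float32)

# Convert cumulative element counts to row boundaries and split.
rows = (index / 57).astype(np.int64)   # [2, 3, 5]
pieces = np.split(flat, rows[:-1])     # one (num_det, 57) array per request
for p in pieces:
    print(p.shape)                     # (2, 57), (1, 57), (2, 57)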

Here is a sample config from the post-processing model, whose outputs are not ragged:

name: "ppe_yolo_postprocessing"
backend: "python"
max_batch_size: 4

input [
{
    name: "ppe_yolo_postprocessing_input"
    data_type: TYPE_FP32
    dims: [56, 20160]
}
]

output [
{
    name: "ppe_yolo_postprocessing_output"
    data_type: TYPE_FP16
    dims: [ -1, 57 ]
}
]

dynamic_batching {}

instance_group [
    {
        count: 1
        kind: KIND_CPU
    }
]

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [{
      name : "openvino"
    }]
  }
}
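
Note that in the Python backend, each request grouped by the dynamic batcher still arrives as its own InferenceRequest in execute(), so the model can return a per-request output whose first variable dimension differs between responses. A rough sketch of how this postprocessing model could return such outputs (decode_frame is a hypothetical helper standing in for the actual YOLO decoding):

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # x: [batch, 56, 20160] raw YOLO head output
            x = pb_utils.get_input_tensor_by_name(
                request, "ppe_yolo_postprocessing_input").as_numpy()
            # decode_frame is a stand-in: it returns [batch, num_det, 57],
            # where num_det differs from request to request.
            dets = decode_frame(x).astype(np.float16)
            out = pb_utils.Tensor("ppe_yolo_postprocessing_output", dets)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses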