Casting a NumPy string array to pb_utils.Tensor disproportionately increases latency #7153

Open
LLautenbacher opened this issue Apr 24, 2024 · 2 comments
Labels
module: backends (Issues related to the backends), question (Further information is requested)

Comments

@LLautenbacher

LLautenbacher commented Apr 24, 2024

Description
Casting a NumPy string array to pb_utils.Tensor in the Python backend causes a disproportionate (~300x) increase in latency.

Triton Information
nvcr.io/nvidia/tritonserver:23.05-py3
This still occurs in 24.03 as well.

To Reproduce
When using the model and config below, I get a latency of 9873 usec with perf_analyzer. Uncommenting the line pb_utils.Tensor("annotation", arr_s) increases the latency to 2888440 usec. Creating the NumPy array doesn't seem to matter; only casting it to a tensor causes the slowdown.

model.py

import triton_python_backend_utils as pb_utils
import numpy as np
import json


class TritonPythonModel:
    def initialize(self, args):
        self.model_config = json.loads(args["model_config"])
        output0_config = pb_utils.get_output_config_by_name(
            self.model_config, "annotation"
        )
        self.output_dtype = pb_utils.triton_string_to_numpy(output0_config["data_type"])

    def execute(self, requests):
        responses = []
        for request in requests:
            batchsize = (
                pb_utils.get_input_tensor_by_name(request, "input0").as_numpy().shape[0]
            )
            arr_s = np.empty((batchsize, 256), dtype=np.dtype("S5"))
            arr_f = np.empty((batchsize, 256), dtype=np.dtype("float64"))
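            # Uncommenting the next line (building a tensor from the string array) triggers the ~300x slowdown: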
            # pb_utils.Tensor("annotation", arr_s)
            t = pb_utils.Tensor("annotation", arr_f)
            responses.append(pb_utils.InferenceResponse(output_tensors=[t]))
        return responses

    def finalize(self):
        pass

config.pbtxt

max_batch_size: 1000
input [
  {
    name: "input0"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
output [
  {
    name: "annotation"
    data_type: TYPE_FP64
    dims: [ 256 ]
  }
]
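
To confirm that the slowdown comes from the pb_utils.Tensor construction rather than from allocating the NumPy array, a rough timing sketch along these lines can be dropped into execute (time.perf_counter and the print are additions for illustration; everything else uses the names from model.py above):

import time

# inside execute(), within the per-request loop (batchsize, np, pb_utils as in model.py):
start = time.perf_counter()
arr_s = np.empty((batchsize, 256), dtype=np.dtype("S5"))
created = time.perf_counter()
t_s = pb_utils.Tensor("annotation", arr_s)  # the construction under suspicion
cast = time.perf_counter()
print(f"np.empty: {(created - start) * 1e6:.0f} usec, "
      f"pb_utils.Tensor: {(cast - created) * 1e6:.0f} usec", flush=True)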

Expected behavior
Returning a string array shouldn't take ~300x as long as returning a float array.

@rmccorm4
Collaborator

rmccorm4 commented May 1, 2024

Hi @LLautenbacher, thanks for raising this issue with such detail.

@Tabrizian @krishung5 may be able to chime in here.

Is it possible this commented line is causing an extra copy? Also, can you elaborate on this datatype, np.dtype("S5")? Is it required, and do you see different behavior if you use something like np.object_ instead?
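
For reference, the variant I'm thinking of would look roughly like this (shape and output name copied from your model.py; the placeholder contents are only illustrative):

arr_o = np.empty((batchsize, 256), dtype=np.object_)
arr_o.fill(b"abcde")  # placeholder bytes so the object array has concrete elements
t = pb_utils.Tensor("annotation", arr_o)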

rmccorm4 added the question and module: backends labels on May 1, 2024
@LLautenbacher
Author

Thank you for looking into this!

The specific string datatype is not relevant; U, S, and O dtypes all show this behaviour.
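
Concretely, the three dtype variants are roughly these (the widths are only illustrative):

arr_u = np.empty((batchsize, 256), dtype=np.dtype("U5"))  # unicode strings
arr_s = np.empty((batchsize, 256), dtype=np.dtype("S5"))  # byte strings
arr_o = np.empty((batchsize, 256), dtype=np.object_)      # Python objects
# passing any of these to pb_utils.Tensor("annotation", ...) shows the same slowdown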
