Description
Casting a NumPy string array to pb_utils.Tensor using the Python backend causes a disproportionate increase in latency (~300x).
Triton Information
nvcr.io/nvidia/tritonserver:23.05-py3
The issue still reproduces in 24.03.
To Reproduce
When using the model and config below, I get a latency of 9873 usec with perf_analyzer. Uncommenting the line pb_utils.Tensor("annotation", arr_s) increases the latency to 2888440 usec. Creating the NumPy array doesn't seem to matter; only casting it to a tensor causes the slowdown.
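For context on why the cast could be expensive: Triton represents BYTES/string tensors as length-prefixed elements rather than a fixed-stride buffer, so a string array cannot be handed over as a zero-copy view the way a float array can. A rough sketch of that layout in plain NumPy (serialize_bytes_tensor is illustrative, not Triton's actual code):

```python
import struct

import numpy as np

# Illustrative sketch: string/BYTES tensor elements are serialized one by
# one as a 4-byte little-endian length followed by the raw bytes, unlike a
# float array, which is already a single contiguous buffer.
arr_s = np.array([b"abcde"] * 4, dtype="S5")
arr_f = np.zeros(4, dtype=np.float32)


def serialize_bytes_tensor(arr):
    # Per-element pack: len(item) as uint32, then the item's bytes.
    return b"".join(struct.pack("<I", len(x)) + bytes(x) for x in arr)


payload = serialize_bytes_tensor(arr_s)
# Each element costs 4 (length prefix) + 5 (content) bytes here.
assert len(payload) == 4 * (4 + 5)
# The float32 array needs no per-element work:
assert arr_f.tobytes() == b"\x00" * 16
```

This per-element packing is one plausible source of the overhead; whether an additional copy is involved is exactly the open question in this issue.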
Expected behavior
Returning a string array shouldn't take 300x as long as a float array.

Is it possible this commented line is causing an extra copy? Also, can you elaborate on the datatype np.dtype("S5")? Is it required, and do you see different behavior if you use something like np.object_ instead?
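On the np.dtype("S5") question above: a quick sketch of how the two dtypes differ, assuming nothing beyond stock NumPy:

```python
import numpy as np

# "S5" is a fixed-width byte-string dtype: every element occupies exactly
# 5 bytes inline in the array buffer, and longer values are truncated.
arr_s = np.array([b"hello", b"hi"], dtype="S5")
assert arr_s.itemsize == 5
assert arr_s[1] == b"hi"          # short values are stored as-is

truncated = np.array([b"toolong"], dtype="S5")
assert truncated[0] == b"toolo"   # silently cut to 5 bytes

# np.object_ stores Python object references instead, so elements keep
# their full length but the array is no longer a contiguous byte buffer.
arr_o = np.array([b"hello", b"toolong"], dtype=np.object_)
assert arr_o[1] == b"toolong"
assert arr_o.itemsize in (4, 8)   # pointer-sized slots
```

Because the two dtypes have such different memory layouts, they may well take different paths inside pb_utils.Tensor, which is why comparing them could help isolate the slowdown.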