Support user-defined batch inference logic #123

Open
dgcnz opened this issue Apr 26, 2023 · 0 comments

dgcnz commented Apr 26, 2023

Describe the feature you'd like

Currently, TorchServe's batch inference is handled by looping through the requests and feeding them individually to the user-defined transform function (#108). However, this doesn't take full advantage of the GPU's parallelism and compute power, yielding slower endpoints with low resource utilization.
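
For illustration, here is a minimal sketch (simplified, not the toolkit's actual code) of that per-request dispatch; `handle_one_by_one` and the reduced `transform_fn` signature are assumptions made for the example:

```python
# Minimal sketch of the current behaviour: TorchServe hands the handler a
# list of requests, but each one is pushed through the user-defined
# transform function on its own, so the model only ever sees batch size 1.
from typing import Any, Callable, List


def handle_one_by_one(
    requests: List[dict],
    transform_fn: Callable[[Any], Any],
) -> List[Any]:
    responses = []
    for request in requests:
        payload = request.get("body")            # one payload at a time
        responses.append(transform_fn(payload))  # one forward pass per request
    return responses
```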

By contrast, TorchServe's documentation on batch inference shows an example where the developer handles this logic themselves and feeds the entire input batch to the model in a single call.
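
Roughly, such a handler stacks the whole batch and runs one forward pass. The sketch below only loosely follows that pattern; `decode_payload`, the response encoding, and the assumption that the decoded payloads are tensors of equal shape are all placeholders for this example:

```python
import torch


def handle_as_batch(requests, model, decode_payload):
    """Run one forward pass over the entire TorchServe batch."""
    # Decode each request body into a tensor (assumed to share one shape).
    tensors = [decode_payload(r.get("body")) for r in requests]
    batch = torch.stack(tensors)          # shape: (batch_size, ...)
    with torch.no_grad():
        outputs = model(batch)            # single batched forward pass
    # Return one response per request, matching TorchServe's contract.
    return [o.tolist() for o in outputs]
```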

For my use case, this is highly desirable because it increases the model's throughput.

How would this feature be used? Please describe.

Provide batch-level transform functions (e.g. a batch_transform_fn). If a user wants to customize the default batching logic, they can supply batch_input_fn, batch_predict_fn, and batch_output_fn, each of which receives the entire batch of requests as input; a sketch follows below.
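
A sketch of how these hooks could look in a user's inference script. The signatures are only a suggestion mirroring the existing input_fn/predict_fn/output_fn hooks, not an existing toolkit API, and the example assumes the request bodies have already been deserialized into lists of floats:

```python
import torch


def batch_input_fn(request_bodies, content_type):
    """Decode every request in the batch and stack into a single tensor."""
    return torch.stack([torch.tensor(body) for body in request_bodies])


def batch_predict_fn(batch, model):
    """One forward pass over the whole batch instead of per-request calls."""
    with torch.no_grad():
        return model(batch)


def batch_output_fn(predictions, accept):
    """Split the batched output back into one response per request."""
    return [prediction.tolist() for prediction in predictions]
```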

Describe alternatives you've considered

I haven't found a way to achieve this functionality with the sagemaker-pytorch-inference-toolkit, so I'm writing a custom Dockerfile that uses TorchServe directly.
