[RFE] HandleGenerate equivalent for sagemaker_server.cc #7151

Open
billcai opened this issue Apr 24, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments


billcai commented Apr 24, 2024

Is your feature request related to a problem? Please describe.
At present, text generation is supported only in http_server.cc, not in sagemaker_server.cc. This was verified using the vLLM backend with Triton server. http_server.cc supports it by implementing HandleGenerate, which allows the use of decoupled models (which vLLM backend models are).
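
For context, HandleGenerate in the HTTP frontend is reached through Triton's generate extension endpoints (`/v2/models/<model>/generate` and `/v2/models/<model>/generate_stream`). Below is a minimal sketch of how a client might build such a request. It assumes the `text_input` field and `stream` parameter used in the vLLM backend examples; the helper name `build_generate_request`, the model name `vllm_model`, and the base URL are hypothetical placeholders, not part of Triton.

```python
import json


def build_generate_request(base_url, model_name, prompt, stream=False):
    """Return (url, body) for a generate-extension request.

    Hypothetical helper: only the endpoint path and the payload field
    names are taken from Triton / vLLM backend documentation.
    """
    # The streaming variant uses a separate endpoint name.
    endpoint = "generate_stream" if stream else "generate"
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/{endpoint}"
    # Assumed payload shape for a vLLM backend model.
    payload = {"text_input": prompt, "parameters": {"stream": stream}}
    return url, json.dumps(payload)


url, body = build_generate_request(
    "http://localhost:8000", "vllm_model", "What is Triton?"
)
print(url)
print(body)
```

The sketch only constructs the request and sends nothing. No comparable route is available through sagemaker_server.cc, which is the gap this request describes.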

Describe the solution you'd like
Implement the equivalent of HandleGenerate for sagemaker_server.cc

Describe alternatives you've considered
Using alternative servers (like DJLServing) with vLLM/TensorRT-LLM or different stacks (e.g. HuggingFace TGI)

Elaborating on this further:
Certain backends (e.g. vLLM) currently run only with the decoupled model transaction policy. The inference function in sagemaker_server.cc checks for this policy and fails any call to a model that uses it.

http_server.cc, on the other hand, has several inference functions. HandleInfer performs the same check and fails if the model runs with the decoupled model transaction policy. HandleGenerate does not perform this check and is designed for text generation. Hence, I am seeking advice or assistance in implementing a HandleGenerate equivalent for sagemaker_server.cc.
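
The behavior described above can be modeled with a small sketch. This is a simplified Python illustration of the control flow as described in this issue, not Triton's actual C++ code. All names (`Model`, `handle_infer`, `handle_generate`, `sagemaker_invocations`, `UnsupportedPolicyError`) are hypothetical, and splitting the request into words stands in for a decoupled model producing several responses.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    decoupled: bool  # True if the model uses the decoupled transaction policy


class UnsupportedPolicyError(Exception):
    """Raised when a handler refuses a decoupled model."""


def handle_infer(model, request):
    # Models the check described for HandleInfer: decoupled models are rejected.
    if model.decoupled:
        raise UnsupportedPolicyError(
            f"{model.name}: decoupled transaction policy not supported"
        )
    return [f"{model.name}:{request}"]


def handle_generate(model, request):
    # Models HandleGenerate: no policy check, and a decoupled model may
    # yield several responses for a single request.
    if model.decoupled:
        return [f"{model.name}:{word}" for word in request.split()]
    return [f"{model.name}:{request}"]


def sagemaker_invocations(model, request):
    # Assumption: the SageMaker path applies the same check as HandleInfer,
    # as this issue describes.
    return handle_infer(model, request)


vllm = Model(name="vllm_model", decoupled=True)
print(handle_generate(vllm, "hello world"))
try:
    sagemaker_invocations(vllm, "hello world")
except UnsupportedPolicyError as err:
    print("rejected:", err)
```

In this model, the requested feature amounts to giving the SageMaker path a handler that behaves like `handle_generate` rather than `handle_infer`.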

nnshah1 self-assigned this Apr 26, 2024
nnshah1 added the enhancement label Apr 26, 2024
Collaborator

rmccorm4 commented May 2, 2024

Hi @billcai, thanks for raising this request! CC @nskool

rmccorm4 changed the title from "HandleGenerate equivalent for sagemaker_server.cc" to "[RFE] HandleGenerate equivalent for sagemaker_server.cc" May 2, 2024