Is your feature request related to a problem? Please describe.
At present, text generation is supported only in http_server.cc and not in sagemaker_server.cc. This was verified using the vLLM backend with Triton server. http_server.cc supports it by implementing HandleGenerate, which allows the use of decoupled models (which vLLM backend models are).
Describe the solution you'd like
Implement the equivalent of HandleGenerate for sagemaker_server.cc.
Describe alternatives you've considered
Using alternative servers (like DJLServing) with vLLM/TensorRT-LLM, or different stacks altogether (e.g. HuggingFace TGI).
Elaborating on this further:
Certain backends (e.g. vLLM) currently run only with the decoupled model transaction policy. The inference function in sagemaker_server.cc checks for this and fails any call to a model that runs with the decoupled transaction policy.
http_server.cc, on the other hand, has a few functions for inference. HandleInfer performs the same check and fails if the model runs with the decoupled transaction policy. HandleGenerate does not perform that check, and is designed for text generation purposes. Hence, seeking advice/assistance to implement a HandleGenerate equivalent for sagemaker_server.cc.