Can we include commonly used data pre-processing libraries in the Triton server Docker image? #7107

Open
HQ01 opened this issue Apr 12, 2024 · 2 comments
Labels
question Further information is requested

Comments


HQ01 commented Apr 12, 2024

Is your feature request related to a problem? Please describe.

I find that the current Docker image xx.yy-py3 doesn't include commonly used data preprocessing libraries, for example Hugging Face transformers for accessing a tokenizer. Missing this single package greatly limits our ability to use triton-inference-server with its ensemble model feature.

In our specific use case, running pip install at runtime or using conda-pack is highly discouraged for various reasons. This is somewhat similar to #6467, and I believe it may be common in many other industrial scenarios too.

Describe the solution you'd like

Given how commonly Triton Server is used for NLP-related workloads, I would suggest including the transformers library in the pre-built Docker image if possible.

Describe alternatives you've considered

There are other images, like 24.03-trtllm-python-py3, that do come with transformers pre-installed. However, we need to serve BERT-like models, and according to triton-inference-server/tensorrtllm_backend#368, there is no clear timeline for supporting them. So we have to rely on another backend (like ORT) to execute our model.

Additional context
Any thoughts / suggestions will be greatly appreciated!


MatthieuToulemont commented Apr 17, 2024

"In our specific use case, running pip install at runtime"

How about building your own image on top of xx.yy-py3?
That way you will not run pip at runtime or require conda-pack.

"Given how commonly Triton Server is used for NLP-related workloads"

In our case, we use Triton for computer vision models and don't need transformers installed.

FROM nvcr.io/nvidia/tritonserver:XX.YY-py3
RUN pip install transformers --no-cache-dir

This Dockerfile will do what you need without requiring everyone to have transformers installed by default. Maybe this could work?
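
For example, the custom image could be built and run along these lines (the image tag and model repository path here are just illustrative):

# build the custom image from the Dockerfile above
docker build -t tritonserver-transformers .
# run it, exposing Triton's standard HTTP, gRPC, and metrics ports
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  tritonserver-transformers tritonserver --model-repository=/models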

Tabrizian (Member) commented

Unfortunately, we cannot install these libraries, as they can increase the container size significantly, and there are many other customers asking for different libraries to be included. If we accommodated all these requests, the container would be much larger than it already is. Creating conda-pack environments or building custom images is our only recommendation at this point. Let us know if you have any other suggestions that might help with this issue.
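
For reference, a minimal sketch of the conda-pack route with the Python backend; the environment name, Python version, and archive path are illustrative, while EXECUTION_ENV_PATH and $$TRITON_MODEL_DIRECTORY are the Python backend's documented custom execution environment mechanism:

# build and pack an environment containing the preprocessing dependencies
conda create -y -n preprocess_env python=3.10
conda activate preprocess_env
pip install transformers conda-pack
conda-pack -o preprocess_env.tar.gz

# then point the model's config.pbtxt at the packed archive,
# placed alongside the model in the repository:
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: { string_value: "$$TRITON_MODEL_DIRECTORY/preprocess_env.tar.gz" }
}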

Tabrizian added the question label Apr 19, 2024