
Response caching GPU tensors #7140

Open
rahchuenmonroe opened this issue Apr 19, 2024 · 1 comment
Labels
question (Further information is requested)

Comments

@rahchuenmonroe

According to your docs, only input tensors located in CPU memory are hashable for accessing the cache, and only responses whose output tensors are all located in CPU memory are eligible for caching.

Does this mean that if a model runs on GPU, its responses cannot be cached, since their outputs are on GPU? If that's the case, I think it would be great if tensors located on GPU could also be cached, since many of the models running on Triton run on GPU.
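For reference, response caching is opt-in. A minimal sketch, assuming a local cache backend and a hypothetical model repository layout (names and sizes are placeholders, not from the docs quoted above):

```
# config.pbtxt (sketch): opt this model into response caching
response_cache {
  enable: true
}
```

```
# Allocate a 64 MB local response cache when starting the server
tritonserver --model-repository=/models --cache-config=local,size=67108864
```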

@rmccorm4
Collaborator

rmccorm4 commented Apr 30, 2024

Hi @rahchuenmonroe,

This applies to input/output tensors within Triton core, before and after the model execution in the backend. If you are communicating with Triton over the network (HTTP/GRPC), then all request and response tensors will be on CPU when going through Triton by default.

  • Using CUDA shared memory is a different story, but assumes client/server are co-located
  • Backends that execute the model on GPU will handle copying the data to/from CPU

So, long story short: if you're talking to Triton over the network without using shared memory (and therefore communicating tensors over CPU), you can likely cache the responses even if they come from a model running on GPU. This covers the large majority of use cases.
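To make that concrete, here is a minimal sketch of the common path, using the Python HTTP client (the model and tensor names are hypothetical). The tensors cross the network and sit in CPU memory inside Triton core, so the response stays cache-eligible even though the model itself executes on GPU:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server over HTTP; request/response tensors travel
# through CPU memory inside Triton core, so the response cache applies.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical model/tensor names -- substitute your own model config.
input0 = httpclient.InferInput("INPUT0", [1, 16], "FP32")
input0.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(model_name="my_gpu_model", inputs=[input0])
print(result.as_numpy("OUTPUT0"))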

If you are using Triton in-process or using CUDA shared memory and passing Triton tensors that are already on GPU, then caching of those tensors is not currently supported.
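For contrast, here is a sketch of the unsupported path: registering a CUDA shared-memory region for an input (names and sizes are hypothetical), which keeps the tensor on GPU all the way into Triton core and therefore bypasses the cache:

```python
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input tensor: 1x16 FP32 = 64 bytes.
input_data = np.random.rand(1, 16).astype(np.float32)
byte_size = input_data.nbytes

# Create a CUDA shared-memory region on GPU 0 and copy the input into it.
shm_handle = cudashm.create_shared_memory_region("input0_data", byte_size, 0)
cudashm.set_shared_memory_region(shm_handle, [input_data])

# Register the region with the server.
client.register_cuda_shared_memory(
    "input0_data", cudashm.get_raw_handle(shm_handle), 0, byte_size
)

# Point the input at the GPU region instead of sending bytes over the wire;
# the tensor enters Triton core on GPU, so it is not hashable for the cache.
input0 = httpclient.InferInput("INPUT0", [1, 16], "FP32")
input0.set_shared_memory("input0_data", byte_size)

result = client.infer(model_name="my_gpu_model", inputs=[input0])
```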

rmccorm4 added the question (Further information is requested) label on Apr 30, 2024