Model caching #3
The downside of a pull-through proxy would be that it requires loading a self-signed cert on the model servers. I guess we could inject that cert automatically through a ConfigMap mount, so it's not a big enough problem to rule the approach out. I will see if I can do a PoC for it.
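As a rough sketch of the cert-injection idea: the proxy's CA certificate could be published in a ConfigMap and mounted into each model server pod's trust store. All names here (the `proxy-ca` ConfigMap, the image, the mount path) are hypothetical placeholders, not part of lingo.

```yaml
# Hypothetical sketch: distribute a self-signed proxy CA cert via a ConfigMap
# and mount it into a model server pod so HTTPS to the proxy is trusted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-ca          # assumed name
data:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ...                   # the proxy's self-signed CA cert goes here
    -----END CERTIFICATE-----
---
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
    - name: server
      image: example/model-server:latest   # placeholder image
      volumeMounts:
        - name: proxy-ca
          mountPath: /etc/ssl/certs/proxy-ca.crt  # assumed trust-store path
          subPath: ca.crt
          readOnly: true
  volumes:
    - name: proxy-ca
      configMap:
        name: proxy-ca
```

In practice the mount path depends on the base image's TLS trust store, and some runtimes need an extra step (e.g. `update-ca-certificates`) to pick the cert up.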
IMHO this problem seems worth having its own solution rather than building it into lingo.
I built out an example in the vllm Helm chart that uses a GKE ReadOnlyMany PV so there is no need to download any model: https://github.com/substratusai/helm/tree/main/charts/vllm#mistral-7b-instruct-on-gke-autopilot-with-readmanyonly-pvc-to-store-model

I'm looking to build a solution that works without too much hassle on any K8s distro. I agree that it could be beneficial outside of Lingo as well, especially if we go the path of a caching HTTPS proxy. I would love to hear your feedback on a caching HTTPS proxy where the model servers are configured to use the proxy, and the proxy simply caches the models on e.g. local SSD. That feels like the most generic solution we could make work across any K8s provider.
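The caching-proxy setup described above could be wired in with the standard proxy environment variables, something like the following sketch. The service name, port, and image are assumptions for illustration only:

```yaml
# Hypothetical sketch: route a model server's downloads through a caching
# HTTPS proxy by setting the conventional HTTPS_PROXY / NO_PROXY env vars.
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
    - name: server
      image: example/model-server:latest   # placeholder image
      env:
        - name: HTTPS_PROXY
          # assumed in-cluster Service fronting the caching proxy
          value: "http://model-cache-proxy.default.svc.cluster.local:3128"
        - name: NO_PROXY
          # keep in-cluster traffic off the proxy
          value: "localhost,127.0.0.1,.svc,.cluster.local"
```

This relies on the HTTP client inside the model server honoring `HTTPS_PROXY`, which most common download tooling does, and it is where the self-signed-cert concern from the earlier comment comes in.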
I think a caching proxy is low-hanging fruit that can easily be added to any environment where public models are used. Private models or fine-tuned models that are not accessible via HTTP(S) do not benefit from this. Access control and retention are other topics to consider.
I think a private model registry would be a better fit for private models / fine-tuned models. There doesn't seem to be a good open source project that works well as a standalone ML model registry. I found the MLflow Model Registry, but it's all baked together with MLflow: https://mlflow.org/docs/latest/model-registry.html There is the Hugging Face model registry, but that's not open source either, afaik. An open source model registry might be something worth investing in.

The ReadOnlyMany PVC does work well as a cache for either public or private models. @alpe did you form any updated opinions about this? I'm on the fence about not doing the caching proxy and instead focusing time on a separate open source private model registry.
LLMs can be very large in size.
Possible caching implementations: