-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft][CI/Build] Optimize models tests #4874
base: main
Are you sure you want to change the base?
[Draft][CI/Build] Optimize models tests #4874
Conversation
Using eager mode doesn't seem to lead to significant improvement. It seems that the bottleneck is in downloading the models, so we should parallelize this process. |
b25b37f
to
4edfe6e
Compare
4edfe6e
to
6accb96
Compare
bf7edab
to
427fc60
Compare
Tbh it is probably better if we have a way to avoid re-downloading the models each time. Any thoughts? |
I'm not that experienced in Kubernetes but from my understanding, placing the HuggingFace cache inside a Volume should avoid the need to redownload the models when tests are run again in the same Pod. @rkooo567 is it possible on your end to force the CI to run on the same Pod so we can test whether the cache actually works in this way? |
Hmm I am not super familiar with how CI works actually (idk if we even use k8s under the hood). cc @simon-mo for thoughts.. |
f03800e
to
32c01b5
Compare
@khluu since you're involved with CI, can you help out with this? Particularly the part concerning Kubernetes. |
I believe we should download the model each time. @robertgshaw2-neuralmagic mentioned that putting them on NFS is a bit tricky because it might reaches rate limit. |
hostPath a possible workaround. |
Hmm I am against this. Imo we should test the default config for those tests (especially the test_model) |
The models tests keep getting interrupted (presumably due to running too long). This PR attempts to reduce the running time via:
hostPath
volume.hostPath
volumes have associated security risks. Is there another way foragent-stack-k8s
to use a persistent volume?This PR also adds
tqdm
to thecommon
dependencies. This is not actually a new dependency since it is currently used invllm/entrypoints/llm.py
; I just found that it is not in therequirements.txt
file when trying to usetqdm
for downloading the models.