[BUG] Exporting / downloading a model larger than available VRAM (trained with DeepSpeed) fails #670
Comments
No, this is happening with a "freshly" trained model; I haven't tried the "Use previous experiment weights" option yet. I'm pretty sure it is because it tries to load the whole model into VRAM. Causing this if statement to evaluate to True (it sets the device to CPU), either by forking the code or by running an experiment in the background, lets me download the model:

h2o-llmstudio/llm_studio/app_utils/sections/experiment.py, lines 1621 to 1630 in d65fffb
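The snippet at those lines isn't reproduced in the thread, so the following is only a rough sketch of the kind of guard being described: a device check that falls back to CPU when the checkpoint would not fit in free VRAM. The function name and size heuristic are hypothetical, not the actual H2O LLM Studio code.

```python
import torch

def choose_device_for_export(checkpoint_size_bytes: int) -> torch.device:
    """Hypothetical device guard: prefer the GPU, but fall back to CPU
    when the checkpoint is larger than the free memory on cuda:0."""
    if torch.cuda.is_available():
        free_bytes, _total_bytes = torch.cuda.mem_get_info(0)
        if checkpoint_size_bytes < free_bytes:
            return torch.device("cuda:0")
    # Too large for the GPU (or no GPU at all): load on CPU instead.
    return torch.device("cpu")
```

Forcing such a check to pick CPU (or hardcoding the fallback branch) is what the comment above describes as the way to make the download succeed.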
Just double-checked: it "crashes" before reaching the sharding.
Sorry, I can't fully follow. So are you using a local model or a model from Hugging Face to start your experiment? And what is the next step? You can load a sharded model into multiple GPUs using DeepSpeed. Training entirely on CPU is too slow, so we will not be supporting this in H2O LLM Studio.
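For context, loading a model sharded across multiple GPUs with DeepSpeed looks roughly like the sketch below. This is inference-time tensor parallelism, not the ZeRO sharding used during training; the model id is a placeholder, and the exact argument names vary across DeepSpeed versions.

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; substitute the experiment's actual base model.
model_name = "some-org/some-33b-model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Shard the weights across the GPUs of this launch, one rank per GPU,
# e.g. `deepspeed --num_gpus 2 shard_demo.py`. Newer DeepSpeed releases
# take tensor_parallel={"tp_size": 2} instead of mp_size.
model = deepspeed.init_inference(model, mp_size=2, dtype=torch.float16)
```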
We then also need to add a device option there. Need to see how easily that's doable. For now, the workaround is to hardcode it in the code.
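The hardcoded workaround would amount to something like the following (hypothetical; the real variable lives somewhere around the lines of experiment.py referenced earlier):

```python
# Hypothetical hardcode: force CPU for the export path so the weights
# never need to fit into a single GPU. Slower, but export runs only once.
device = "cpu"  # instead of the GPU that the size check would pick
```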
I trained a 33B model with DeepSpeed on 40 GB cards. Based on the traceback, the model seems to be too large to fit into one GPU. Is it possible to fall back on the CPU for cases like this?
The .pth file is ~67 GB, so obviously it won't fit in ~~CPU~~ GPU (edit: GPU, obviously).
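For what it's worth, pulling a checkpoint of that size onto the CPU is straightforward as long as there is enough host RAM. A minimal sketch with an assumed file name:

```python
import torch

# Load the ~67 GB checkpoint entirely into host RAM; map_location="cpu"
# keeps every tensor off the GPU, so available VRAM is irrelevant here.
state_dict = torch.load("checkpoint.pth", map_location="cpu")

# From here the weights can be re-saved, sharded, or converted without
# touching a CUDA device (needs roughly the checkpoint size in free RAM).
torch.save(state_dict, "exported_checkpoint.pth")
```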