[BUG] HuggingFace export does not preserve bfloat16 weights but converts to float16 silently when using CPU for upload #702
🐛 Bug

Native bfloat16 model fine-tuned with bfloat16 gets pushed to HuggingFace as float16.

To Reproduce
Comments
Could you please share a config to reproduce the issue on the default dataset? A known limitation is uploading from CPU: the weights are automatically converted to float16, as PyTorch bfloat16 isn't usually supported on CPU.
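To make the described behavior concrete, here is a hypothetical sketch of the down-cast (illustrative only, not the tool's actual export code; the device variable is an assumed config value):

```python
import torch

device = "cpu"  # hypothetical upload-device setting
weights = torch.randn(4, 4, dtype=torch.bfloat16)
if device == "cpu":
    weights = weights.to(torch.float16)  # the silent down-cast this issue is about
print(weights.dtype)  # torch.float16
```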
Ah, that's it exactly then. I've been using CPU to upload; will try using GPU.
Thanks, I'll change the title of the issue to reflect that the conversion is done silently.
Actually @pascal-pfeiffer, I've found that unfortunately I don't have enough memory on any single GPU of an 8×A100 80GB cluster to push Llama-3 70B to HF in bfloat16. I get the following OOM error. Any ideas for a workaround, or a way this could be done multi-GPU?

INFO: 127.0.0.1:56582 - "POST / HTTP/1.1" 200 OK
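(For scale, a back-of-the-envelope check of why a single 80 GB GPU cannot hold the model; the parameter count is approximate:)

```python
params = 70e9        # Llama-3 70B, approximate parameter count
bytes_per_param = 2  # bfloat16 and float16 both use 2 bytes per parameter
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~140 GB vs. 80 GB per A100
```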
Right, for very large models that don't fit on a single GPU, we added a workaround that first loads the full weights to CPU and then shards them across your GPUs before uploading. Can you try uploading the weights with [the CPU-shard option]?
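Outside the tool, the same load-then-shard idea can be sketched with transformers and accelerate; the checkpoint path and repo id below are placeholders, and device_map="auto" requires the accelerate package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the exported checkpoint with layers sharded across all visible GPUs,
# keeping the native bfloat16 weights instead of down-casting.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/exported_model",   # placeholder local export directory
    torch_dtype=torch.bfloat16,
    device_map="auto",          # shard across available GPUs (needs accelerate)
)
tokenizer = AutoTokenizer.from_pretrained("path/to/exported_model")

model.push_to_hub("your-org/llama-3-70b-finetuned")      # placeholder repo id
tokenizer.push_to_hub("your-org/llama-3-70b-finetuned")
```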
Ah, I didn't realize that's what [that option] does.
Yes, …
Confirmed I can export to HF in bfloat16 when using the [CPU-shard option].
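As a quick sanity check after export, the stored dtype can be read back from the hub config without downloading the weights (repo id is a placeholder):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("your-org/llama-3-70b-finetuned")  # placeholder repo id
print(config.torch_dtype)  # should report bfloat16 after a successful export
```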