Has anyone tried training with ZeRO-2 or ZeRO-3? DeepSpeed does not seem to work on my machine. I want to train on very high-resolution video, where even a single sample does not fit in 80 GB of GPU memory. I was hoping my 8 GPUs could be pooled so that together they train with a total batch size of 2 or 4.
I ran experiments with lower-resolution data. No matter how many GPUs I use, the total batch size cannot be reduced below the number of GPUs, and per-GPU memory usage stays the same under ZeRO-2 or ZeRO-3 regardless of the GPU count.
Thanks!
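
For reference, here is a minimal sketch of the kind of ZeRO-3 setup I have been testing (illustrative only, not my exact script; the tiny `nn.Sequential` model is a stand-in so the snippet runs, and the config values are just examples):

```python
import torch.nn as nn
import deepspeed

# Stand-in model so the sketch is self-contained; the real model is the
# high-resolution video model being trained.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

ds_config = {
    # DeepSpeed requires at least 1 sample per GPU per micro-step, so the
    # global batch size can never drop below the number of GPUs.
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Even with a config like this (launched via `deepspeed --num_gpus=8 ...`), per-GPU memory usage does not change with the number of GPUs, and the micro batch size cannot go below 1 per GPU.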