Local GPU can't run at full utilization in small-sample cases #263
Comments
Hi, @lucidrains. Could you help me see what the problem is?
I have a similar question: I tried with a 2080 Ti (12 GB) but it went OOM immediately. When I reduced to 10 images it at least started to train, but very slowly, and it did not use much CPU or GPU at all. Do we know what hardware, image counts, and batch sizes are needed to utilize the hardware properly?
I can't seem to run Unet1D on my local GPU either. Unet2D seems to pick up the GPU properly with Accelerate. Even though the device is set to "cuda:0", it only uses the CPU after a few seconds of GPU usage.
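A quick way to check where the model and batch actually live (a minimal sketch; `model` and `batch` here are hypothetical stand-ins for whatever objects your training script uses):

```python
import torch

# Hypothetical stand-ins for the actual Unet1D model and data batch
model = torch.nn.Linear(8, 8)
batch = torch.randn(4, 8)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
batch = batch.to(device)

# If either of these prints "cpu" while you expect "cuda:0",
# something in the pipeline moved the tensors back to the CPU.
print(next(model.parameters()).device)
print(batch.device)
```

If both report the expected device and utilization is still low, the GPU is probably just starved for work rather than unused.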
I found one reason for the slowness/idling: on Windows, configuring the DataLoader with parallelism does not work. It keeps forking new worker processes that live only for a short period. Windows seems to be the killer here. It would be nice if a warning were printed for Windows users.
Removing num_workers seems to fix it: denoising-diffusion-pytorch/denoising_diffusion_pytorch/denoising_diffusion_pytorch_1d.py Line 767 in 9c9e403
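A minimal sketch of the workaround, using a generic `TensorDataset` in place of the library's own dataset class (an assumption, not the repo's actual code). On Windows, DataLoader workers are spawned rather than forked, so guarding `num_workers` avoids the constant process churn:

```python
import platform

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset with the same shape as the one reported below (32 x 320)
dataset = TensorDataset(torch.randn(32, 320))

# Windows spawns (rather than forks) worker processes, which can lead to
# constant short-lived workers; fall back to single-process loading there.
num_workers = 0 if platform.system() == "Windows" else 4

loader = DataLoader(dataset, batch_size=8, num_workers=num_workers, shuffle=True)

for (batch,) in loader:
    pass  # training step would go here
```

On Linux the forked workers are cheap, so the guard only changes behavior on Windows.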
My data is a 32×320 matrix: 32 samples with 320 dimensions. Running locally on a 4090, each iteration takes 20 s, with CPU usage at 99% and GPU at 1%. When I increase the sample size to 1,000 or 10,000, I get 20 iterations per second, with CPU at 99% and GPU at 99%. When I ran the same example with n=32 and p=320 on a Kaggle P100, I got 3 iterations per second, again with CPU at 99% and GPU at 99%.
I don't know why the local GPU is so much slower than Kaggle at n=32.
Hopefully this can be fixed; here is my code.
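One way to tell whether the data pipeline, rather than the GPU, is the bottleneck at n=32 is to time the loader on its own (a rough sketch under the assumption that the dataset is a plain 32×320 tensor; the real training script may differ):

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Same shape as the reported dataset: 32 samples, 320 dimensions
data = torch.randn(32, 320)
loader = DataLoader(TensorDataset(data), batch_size=32, num_workers=0)

# Time only the data loading, with no model in the loop. If this alone
# accounts for most of the 20 s per iteration, the GPU is simply idle
# waiting on the CPU-side pipeline.
start = time.perf_counter()
for _ in range(100):
    for (batch,) in loader:
        pass
load_time = time.perf_counter() - start
print(f"data loading only: {load_time:.3f}s for 100 passes")
```

Comparing this number against the full training loop's per-iteration time should show whether the slowdown lives in the loader or in the model step.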