
Doesn't work with PyTorch online training when num_workers > 0 #37

Open
MartinMML opened this issue Jul 25, 2022 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@MartinMML

Hello,
I ran into a problem using gpuRIR with PyTorch online training when I set the dataloader's num_workers greater than 0.
The error is: GPUassert: initialization error gpuRIR_cuda.cu 793.
It works fine with num_workers=0.
Is this a known problem, and do you have any suggestions?
Thanks a lot.

@DavidDiazGuerra
Owner

Hello Martin,

I know gpuRIR doesn't work when you try to run it in parallel worker processes using num_workers > 0 in PyTorch dataloaders. I have never worked on that, but PyTorch generally recommends against doing CUDA work in parallel dataloader workers: https://pytorch.org/docs/stable/data.html

That said, the recommendations in the PyTorch documentation seem to be about returning GPU tensors, so they shouldn't affect gpuRIR, which performs its CUDA work but moves the result to the CPU before returning it. Maybe there is an issue with how gpuRIR initializes some CUDA state that makes it crash when used from multiple worker processes. However, I don't know much about this topic and don't have time to dig deeper into it right now, so I'm afraid I can't offer more help.
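The inheritance problem described above can be illustrated without CUDA at all. This is a hedged, CUDA-free sketch: with the default 'fork' start method on Linux, worker processes inherit whatever the parent has already initialized (here a plain dict stands in for a CUDA context), which is the same mechanism that leaves an inherited CUDA context unusable in forked dataloader workers.

```python
# CUDA-free illustration of the fork-inheritance mechanism. A CUDA context
# created in the parent cannot be safely reused after fork(); the dict below
# only stands in for such parent-initialized state.
import multiprocessing

state = {"initialized_in": "parent"}  # stand-in for a CUDA context

def report(_):
    # Runs inside the worker process.
    return state["initialized_in"]

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(1) as pool:
        # The forked child inherits the parent's live state as-is,
        # with no chance to re-initialize it.
        print(pool.map(report, [0]))  # prints ['parent']
```

For a plain dict this inheritance is harmless; for a CUDA context it is fatal, which is why the crash only appears once num_workers > 0.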

Let me know if you find something more about the topic. I'll leave the issue open just in case someone else can see it and offer some help.

Best regards,
David

@DavidDiazGuerra added the help wanted label Jul 25, 2022
@MartinMML
Author

Okay, I got it. Thanks a lot.

@acappemin

Adding the following line at the beginning of your code should help:
multiprocessing.set_start_method('forkserver')
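In context, the workaround looks like the sketch below. The gpuRIR call inside the worker is only indicated in a comment (a placeholder computation stands in for it); the point is where set_start_method must be called: once, at the top of the main script, before any worker processes (pools or dataloaders) are created.

```python
import multiprocessing

def simulate_one(i):
    # In a real pipeline this is where the gpuRIR simulation for one
    # training example would run; a placeholder computation stands in.
    return i * i

if __name__ == "__main__":
    # Must run exactly once, before any pools or DataLoaders exist:
    # 'forkserver' workers start clean instead of inheriting the
    # parent's CUDA state the way forked workers do.
    multiprocessing.set_start_method("forkserver")
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(simulate_one, range(4)))  # prints [0, 1, 4, 9]
```

PyTorch's DataLoader creates its workers through multiprocessing with the default start method (unless you pass a multiprocessing_context), so changing the global default this way also affects dataloader workers.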

@DavidDiazGuerra
Owner

Thanks for the tip! I'll try to look into this when I have some time.
