
Doesn't work with PyTorch online training when num_workers > 0 #37

Open
MartinMML opened this issue Jul 25, 2022 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@MartinMML

Hello,
I ran into a problem using gpuRIR with PyTorch online training when I set the dataloader's num_workers greater than 0.
The error is: GPUassert: initialization error gpuRIR_cuda.cu 793.
It works fine with num_workers=0.
Is this a known problem, and do you have any suggestions?
Thanks a lot.

@DavidDiazGuerra
Owner

Hello Martin,

I know gpuRIR doesn't work when you try to run it in parallel worker processes using num_workers > 0 in PyTorch dataloaders. I have never worked on that, but PyTorch generally recommends against doing CUDA work in parallel dataloader workers: https://pytorch.org/docs/stable/data.html

That said, the recommendations in the PyTorch documentation seem to be about returning GPU tensors, so they shouldn't affect gpuRIR, which performs its CUDA work but moves the result to the CPU before returning it. Maybe there is an issue with how gpuRIR initializes some CUDA state that makes it crash when used from multiple worker processes. However, I don't know much about this topic and don't have time to dig deeper into it right now, so I'm afraid I can't offer more help.
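The inheritance problem described above can be illustrated without CUDA at all. This is a hedged, CUDA-free sketch: with the default 'fork' start method on Linux, worker processes inherit whatever the parent has already initialized (here a plain dict stands in for a CUDA context), which is the same mechanism that leaves an inherited CUDA context unusable in forked dataloader workers.

```python
# CUDA-free illustration of the fork-inheritance mechanism. A CUDA context
# created in the parent cannot be safely reused after fork(); the dict below
# only stands in for such parent-initialized state.
import multiprocessing

state = {"initialized_in": "parent"}  # stand-in for a CUDA context

def report(_):
    # Runs inside the worker process.
    return state["initialized_in"]

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(1) as pool:
        # The forked child inherits the parent's live state as-is,
        # with no chance to re-initialize it.
        print(pool.map(report, [0]))  # prints ['parent']
```

For a plain dict this inheritance is harmless; for a CUDA context it is fatal, which is why the crash only appears once num_workers > 0.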

Let me know if you find something more about the topic. I'll leave the issue open just in case someone else can see it and offer some help.

Best regards,
David

@DavidDiazGuerra added the help wanted label Jul 25, 2022
@MartinMML
Author

Okay, I got it. Thanks a lot.

@acappemin

Adding the following line at the beginning of your code should help:
multiprocessing.set_start_method('forkserver')
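In context, the workaround looks like the sketch below. The gpuRIR call inside the worker is only indicated in a comment (a placeholder computation stands in for it); the point is where set_start_method must be called: once, at the top of the main script, before any worker processes (pools or dataloaders) are created.

```python
import multiprocessing

def simulate_one(i):
    # In a real pipeline this is where the gpuRIR simulation for one
    # training example would run; a placeholder computation stands in.
    return i * i

if __name__ == "__main__":
    # Must run exactly once, before any pools or DataLoaders exist:
    # 'forkserver' workers start clean instead of inheriting the
    # parent's CUDA state the way forked workers do.
    multiprocessing.set_start_method("forkserver")
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(simulate_one, range(4)))  # prints [0, 1, 4, 9]
```

PyTorch's DataLoader creates its workers through multiprocessing with the default start method (unless you pass a multiprocessing_context), so changing the global default this way also affects dataloader workers.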

@DavidDiazGuerra
Owner

Thanks for the tip! I'll try to look into this when I have some time.
