
When using no-shared = False, the process is blocked #37

Open
keithyin opened this issue Oct 21, 2017 · 10 comments

Comments

@keithyin

Hi, today I ran the code and found that when no-shared=False, the process blocks. Do you have any suggestions on how to fix that?

Thanks!

@ikostrikov
Owner

Blocking doesn't happen to me. What configuration are you using?

@keithyin
Author

Ubuntu 16.04
PyTorch 0.2
I just ran the downloaded source code without modifying anything, and it blocks. But with no-shared=True, the code runs.
It is weird.

@wnstlr

wnstlr commented Dec 4, 2017

Same here. Using Ubuntu 16.04, PyTorch 0.2, and Python 3.5. It works fine on OS X, though.

@ShaniGam

ShaniGam commented Dec 7, 2017

Anyone found a solution?

@ikostrikov
Owner

Please report more information.

I tested it on Ubuntu 16.04 with PyTorch 0.2 and 0.3 and Python 3.6, and it works for me on both Ubuntu and OS X.

@wnstlr

wnstlr commented Dec 7, 2017

Ubuntu 16.04, PyTorch 0.2, Python 3.5
When I exit with Ctrl-C, the traceback shows that the process is stuck in p.join():

^CTraceback (most recent call last):
  File "main.py", line 77, in <module>
    p.join()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
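When a run hangs like this, it can help to join the workers with a timeout and report which ones are still alive, instead of blocking in `p.join()` until Ctrl-C. A minimal sketch using only the standard-library `multiprocessing` API (`Process.join(timeout)` and `Process.is_alive()`); `worker` and `join_with_timeout` are hypothetical names, not part of this repository.

```python
import multiprocessing as mp
import time


def worker(seconds):
    # Hypothetical worker: sleeping stands in for a training loop that
    # either finishes or hangs.
    time.sleep(seconds)


def join_with_timeout(processes, timeout):
    """Wait at most `timeout` seconds in total for all processes, then
    return the indices of those still alive instead of blocking forever."""
    deadline = time.monotonic() + timeout
    for p in processes:
        # Never pass a negative timeout once the deadline has passed.
        p.join(max(0.0, deadline - time.monotonic()))
    return [i for i, p in enumerate(processes) if p.is_alive()]


if __name__ == '__main__':
    # daemon=True so multiprocessing terminates any still-running worker at exit.
    procs = [mp.Process(target=worker, args=(s,), daemon=True) for s in (0.1, 30)]
    for p in procs:
        p.start()
    stuck = join_with_timeout(procs, timeout=2.0)
    print("still running after 2 s:", stuck)
```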

@ShaniGam

ShaniGam commented Dec 8, 2017

It's exactly the same problem as in:
pytorch/pytorch#2496
It's stuck on the ConvNd call:
f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False, _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled)
return f(input, weight, bias)

@japan4415

I have the same problem with PyTorch 0.3.
The code works on macOS, but not on Ubuntu 16.04.

@japan4415

I found a way!
Call mp.set_start_method("spawn"), and change
F.softmax(logit)
to
F.softmax(logit, dim=1)
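The `dim=1` change makes the softmax axis explicit. As far as I know, since PyTorch 0.3 calling `F.softmax` without `dim` emits a deprecation warning, and for 2-D input the implicit choice is already dim 1, so for a logit of shape `(batch, num_actions)` this silences the warning rather than changing results; the `spawn` start method is presumably the part that addresses the hang. The pure-Python sketch below is a hypothetical stand-in for `F.softmax` (not PyTorch code) that shows why the axis matters for a batch of one: normalising along dim 1 gives a distribution over actions, while dim 0 turns every entry into 1.0.

```python
import math


def softmax_2d(rows, dim):
    """Softmax of a 2-D list of floats along `dim` (0 = columns, 1 = rows).

    Hypothetical pure-Python stand-in for F.softmax(logit, dim=...).
    """
    if dim == 1:
        out = []
        for row in rows:
            m = max(row)  # subtract the max for numerical stability
            exps = [math.exp(x - m) for x in row]
            total = sum(exps)
            out.append([e / total for e in exps])
        return out
    if dim == 0:
        # Transpose, normalise each column as a row, then transpose back.
        cols = [list(c) for c in zip(*rows)]
        return [list(r) for r in zip(*softmax_2d(cols, dim=1))]
    raise ValueError("dim must be 0 or 1 for 2-D input")


# One sample (batch of 1) with three action logits, like a policy head.
logit = [[1.0, 2.0, 3.0]]
print([round(p, 3) for p in softmax_2d(logit, dim=1)[0]])  # [0.09, 0.245, 0.665]
print(softmax_2d(logit, dim=0))  # [[1.0, 1.0, 1.0]]
```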

@mohamad-hasan-sohan-ajini

@japan4415 Thanks for sharing your solution. According to this issue on PyTorch, mp.set_start_method("spawn") should be called inside the if __name__ == '__main__': block. After that, everything works fine.
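A minimal sketch of that layout, assuming the standard-library `multiprocessing` module as a stand-in for `torch.multiprocessing` (which wraps it and exposes the same `set_start_method`, `Process`, and `Value`). `train` and `run_workers` are hypothetical names, not the repository's own. The start method is set once, inside the main guard, before any process or shared object is created; spawned children re-import the script, and the guard keeps them from re-running the launch code.

```python
import multiprocessing as mp


def train(rank, counter):
    # Hypothetical stand-in for a per-worker training loop: each worker
    # adds rank + 1 to a shared counter under the counter's built-in lock.
    with counter.get_lock():
        counter.value += rank + 1


def run_workers(num_processes=4):
    # Shared state is created only after the start method has been chosen.
    counter = mp.Value("i", 0)
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(rank, counter))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    return counter.value


if __name__ == '__main__':
    # Call once per program, before creating any Process, Queue, or Value.
    mp.set_start_method("spawn")
    print(run_workers(4))  # 1 + 2 + 3 + 4 = 10
```

With `spawn`, each child starts a fresh interpreter instead of inheriting a forked copy of the parent, so a lock or thread pool that was busy in the parent at fork time cannot leave the child deadlocked. The trade-off is slower start-up and the requirement that worker targets and their arguments be picklable.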

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants