This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

Why are workers blocked when I start 6 processes on three nodes, 2 processes per node? #143

Open
teki1981 opened this issue Mar 8, 2017 · 1 comment

teki1981 commented Mar 8, 2017

11 class TestMultiversoSharedVariable:
12     def _test_sharedvar(self, row, col):
13         W = sharedvar.mv_shared(
14             value=np.zeros(
15                 (row, col),
16                 dtype=theano.config.floatX
17             ),
18             name='W',
19             borrow=True
20         )
21         delta = np.array(range(1, row * col + 1),
22                          dtype=theano.config.floatX).reshape((row, col))
23         train_model = theano.function([], updates=[(W, W + delta)])
24         for i in xrange(10):
25             train_model()
26             train_model()
27             sharedvar.sync_all_mv_shared_vars()  # send the local update to the server
28             #mv.barrier()
29             # to get the newest value, we must sync again
30             mv.barrier()
31             sharedvar.sync_all_mv_shared_vars()
32             for j, actual in enumerate(W.get_value().reshape(-1)):
33                 print "[%d] %d %d %d" % (i, j, (j + 1) * (i + 1) * 2 * mv.workers_num(), actual)
34
35     def test_sharedvar(self):
36         self._test_sharedvar(10, 10)
37
38
39 if __name__ == '__main__':
40     mv.init()
41     test_shared = TestMultiversoSharedVariable()
42     test_shared.test_sharedvar()
43     mv.shutdown()

I ran this test and found that when I start one worker per node, it works,
but when I start two workers per node, all workers are blocked.
mpirun -hostfile alg_cluster.txt -npernode 1 python test_multi.py
mpirun -hostfile alg_cluster.txt -npernode 2 python test_multi.py

There are three IPs in my cluster.
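
For reference, the expected value printed by the test, (j + 1) * (i + 1) * 2 * mv.workers_num(), can be checked without a cluster. The sketch below is a toy single-process model, not Multiverso's implementation: the names simulate, server, local, and synced are hypothetical, and it assumes that a sync pushes the change made since the last sync and then pulls the server's current value. It only illustrates the arithmetic the test relies on and does not reproduce the blocking behavior.

```python
def simulate(num_workers, size, iterations):
    """Toy model of the test loop: every worker adds delta twice, then syncs."""
    delta = [float(j + 1) for j in range(size)]          # flattened delta: 1, 2, ..., size
    server = [0.0] * size                                # hypothetical server-side table
    local = [list(server) for _ in range(num_workers)]   # each worker's copy of W
    synced = [list(server) for _ in range(num_workers)]  # W as of each worker's last sync

    def sync(w):
        # Assumed sync semantics: push the local change, then pull the global value.
        for j in range(size):
            server[j] += local[w][j] - synced[w][j]
        local[w] = list(server)
        synced[w] = list(server)

    for i in range(iterations):
        for w in range(num_workers):
            for _ in range(2):                           # two train_model() calls
                local[w] = [v + d for v, d in zip(local[w], delta)]
            sync(w)                                      # first sync: send the update
        # The barrier sits here: every worker has pushed before anyone syncs again.
        for w in range(num_workers):
            sync(w)                                      # second sync: fetch the full sum
        for w in range(num_workers):
            for j in range(size):
                assert local[w][j] == (j + 1) * (i + 1) * 2 * num_workers
    return local[0]


if __name__ == "__main__":
    result = simulate(num_workers=6, size=100, iterations=10)
    print(result[0], result[99])
```

With 6 workers and 10 iterations, element j should end at 120 * (j + 1) under this model, which matches the formula in the print statement for the last iteration.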

teki1981 changed the title from "Why are workers blocked when I start 2 processes on one node?" to "Why are workers blocked when I start 6 processes on three nodes, 2 processes per node?" Mar 8, 2017

feiga (Contributor) commented Mar 8, 2017

This looks strange. Why do you call train_model and sharedvar.sync_all_mv_shared_vars twice in lines 25-31?
