One worker on a Node being removed while others are working properly #3090

Open
bamedro opened this issue Jan 16, 2018 · 0 comments
bamedro commented Jan 16, 2018

While deploying 19,840 nodes over 640 hosts (32 nodes per host), a single node failed to register, with the errors below.
The other nodes on the same host were working properly.

On Server side:

[2018-01-15 19:41:26,235 thread-290 WARN                o.o.p.r.c.RMCore] Cannot set node as available, the node is unknown: pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12
[2018-01-15 19:41:26,620 38/RM_NODE WARN    o.o.p.r.n.RMNodeConfigurator] Cannot properly configure the node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 because of an error during configuration phase
java.io.IOException: remote object pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 not found. Message method=getLocalNodeProperty, sender=null, sequenceNumber=0 cannot be processed
        at org.objectweb.proactive.extensions.pnp.PNPROMessageRequest.processMessage(PNPROMessageRequest.java:88)
        at org.objectweb.proactive.extensions.pnp.PNPServerHandler$RequestExecutor.run(PNPServerHandler.java:296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
[2018-01-15 19:41:26,709 38/RM_NODE INFO                o.o.p.r.c.RMCore] The node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 provided by "admin" (pnp://172.16.2.36:49558/HalfbodiesNode_748472782/HalfBody_pa.stub.org.ow2.proactive.resourcemanager.nodesource.dataspace._StubDataSpaceNodeConfigurationAgent#configureNode_52009) is down

On Node Side:

Jan 15 19:41:19 debian java[1411]: Adding node ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 to Resource Manager.
Jan 15 19:41:19 debian java[1411]: Adding node ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 to Resource Manager.
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 added.
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 added.
Jan 15 19:41:26 debian java[1411]: Connected to the resource manager at pnp://172.16.2.115:64738/
Jan 15 19:41:26 debian java[1411]: Connected to the resource manager at pnp://172.16.2.115:64738/
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 has been removed
Jan 15 19:41:26 debian java[1411]: Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 has been removed

Note that:
Node pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12 has been removed
appears after
Connected to the resource manager at pnp://172.16.2.115:64738/
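
To check whether other nodes hit the same sequence, the node-side logs could be scanned for nodes that are reported as added and later as removed. The following is only a rough sketch: the two regular expressions are assumptions derived from the log lines quoted above, and `removed_after_added` is a hypothetical helper, not part of ProActive.

```python
import re

# Assumed message formats, copied from the node-side log lines in this issue.
ADDED = re.compile(r"Node (pnp://\S+) added\.")
REMOVED = re.compile(r"Node (pnp://\S+) has been removed")


def removed_after_added(lines):
    """Return node URLs logged as added and, on a later line, as removed."""
    added = set()
    flagged = []
    for line in lines:
        match = ADDED.search(line)
        if match:
            added.add(match.group(1))
            continue
        match = REMOVED.search(line)
        if match and match.group(1) in added and match.group(1) not in flagged:
            flagged.append(match.group(1))
    return flagged


# Sample input: three of the node-side lines from this report.
NODE_URL = "pnp://172.16.2.36:49558/ns2033-8ea6dfd9-59c8-4eaf-a4ad-ec15634e4688_12"
sample = [
    "Jan 15 19:41:26 debian java[1411]: Node " + NODE_URL + " added.",
    "Jan 15 19:41:26 debian java[1411]: Connected to the resource manager at pnp://172.16.2.115:64738/",
    "Jan 15 19:41:26 debian java[1411]: Node " + NODE_URL + " has been removed",
]
print(removed_after_added(sample))
```

The check relies on line order rather than timestamps, because all the relevant node-side messages share the same second (19:41:26).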
