join announcement is delayed #20

BurntSushi · 2013-06-20T03:51:09Z

Upon joining a cluster, I noticed that the actual announcement is significantly delayed. In particular, the debug output shows that a node has joined a full 20 seconds before the OnNodeJoin or OnNewLeaves events are fired.

I tracked this down to line 646 of cluster.go. My understanding, from reading through the code and the debug output, is that the join process goes like this (please slap me if I've gone awry):

1. The new node sends a join message to the cluster.
2. The cluster accepts the join message and the node into the cluster (debug output says "Node ... joined!").
3. The cluster sends state tables back to the new node.
4. The new node doesn't know it has successfully joined yet, so it waits for 2 * NETWORK_TIMEOUT and then announces its presence.
5. After the announcement is sent, the node proclaims itself joined.
6. The OnNodeJoin and OnNewLeaves events are fired for all nodes in the cluster.

I'm guessing that there is a reason why there is a delay set to 2 * NETWORK_TIMEOUT, but I'm not sure what it is. (Truthfully, my networking skills are pretty poor, so I dare not hazard a guess.)

I would be very happy to work on a fix for this problem; I'm just not sure what the fix would look like yet. Therefore, I am seeking guidance. :-)

My inclination is to try to announce the node's presence immediately and, if it fails, try again after a longer timeout. I just don't know what "if it fails" means in this context.

Thanks!


paddycarver · 2013-06-23T00:30:15Z

This is not a bug, but is intended behaviour. It's not the optimal behaviour, I'll agree.

The reason the timeout is set to 2 * NETWORK_TIMEOUT is to ensure that Nodes have the chance to respond with a race condition warning. Waiting one NETWORK_TIMEOUT allows us to be sure that if a message is going to be sent to a Node, it has been sent. Waiting another NETWORK_TIMEOUT allows us to be sure that if a race condition is going to be detected, it has been detected. So waiting 2 * NETWORK_TIMEOUT allows us to make sure that nobody is going to throw a race condition warning after we say we've joined the cluster.

Here is why this is important. Suppose we have a cluster with Nodes whose IDs are 1, 4, and 5. Suppose Nodes 2 and 3 join at roughly the same time (i.e., they are both doing the join dance at the same time). Should 5 attempt to route a message with ID 3 while this is happening, it's possible that a race condition could lead Node 2 to believe it is the closest Node, when in fact 3 is. To counteract this, we wait until a Node has a full representation of its state tables to announce its presence, which is a signal that it's ready to begin handling messages.
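To make the example concrete, here is a minimal sketch of how an incomplete view of the cluster changes which Node looks closest. It is an illustration only, not the library's routing code: it assumes small integer IDs, plain numeric distance, and a lower-ID-wins tie-break, and the `closest` and `distance` helpers are hypothetical names.

```go
package main

// distance returns the absolute difference between two IDs.
// Assumption: plain integer distance stands in for the real ID metric.
func distance(a, b int) int {
	if a > b {
		return a - b
	}
	return b - a
}

// closest returns the ID in known that is nearest to key.
// Ties go to the lower ID, an arbitrary choice made for this sketch.
// The boolean is false when known is empty.
func closest(known []int, key int) (int, bool) {
	if len(known) == 0 {
		return 0, false
	}
	best := known[0]
	for _, id := range known[1:] {
		d, bestD := distance(id, key), distance(best, key)
		if d < bestD || (d == bestD && id < best) {
			best = id
		}
	}
	return best, true
}
```

With the view `[]int{1, 2, 4, 5}`, which is what Node 2 would hold if it had not yet heard about Node 3, `closest` picks 2 for key 3. Once the view is `[]int{1, 2, 3, 4, 5}`, it picks 3. That gap is the window the wait is meant to cover.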

For some clusters, this isn't that big an issue. Messages being mis-delivered for a few seconds could be a non-issue entirely. For some clusters, that consistency guarantee--that a Node will never consider a message delivered unless it's sure that it is the closest Node--is really important. To accommodate both situations, the timeout should probably be controllable via a standalone configuration value that just defaults to the stronger consistency guarantee.
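As a rough sketch of what such a configuration value could look like (the `JoinConfig` type and its methods are hypothetical, not an existing API of this library), the delay defaults to 2 * NETWORK_TIMEOUT unless the caller explicitly overrides it:

```go
package main

import "time"

// JoinConfig is a hypothetical settings struct for this sketch;
// it is not part of the library discussed in this thread.
type JoinConfig struct {
	// NetworkTimeout plays the role of NETWORK_TIMEOUT.
	NetworkTimeout time.Duration

	delay      time.Duration // explicit override, used only if overridden is true
	overridden bool
}

// WithAnnounceDelay returns a copy of c that waits d before announcing.
// Passing 0 trades the consistency guarantee for an immediate announcement.
func (c JoinConfig) WithAnnounceDelay(d time.Duration) JoinConfig {
	c.delay = d
	c.overridden = true
	return c
}

// AnnounceDelay returns how long a joining node waits before announcing.
// Without an override it is 2 * NetworkTimeout, the stronger guarantee.
func (c JoinConfig) AnnounceDelay() time.Duration {
	if c.overridden {
		return c.delay
	}
	return 2 * c.NetworkTimeout
}
```

A cluster that needs the guarantee would use `JoinConfig{NetworkTimeout: 10 * time.Second}` as-is and get a 20 second wait (the 10 second value is only inferred from the 20 second delay reported above). A cluster that can tolerate brief mis-delivery could call `WithAnnounceDelay(0)`.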

Any thoughts on this? I appreciate the feedback.
