This repository has been archived by the owner on Nov 30, 2019. It is now read-only.

join announcement is delayed #20

Open
BurntSushi opened this issue Jun 20, 2013 · 1 comment

Comments

@BurntSushi

Upon joining a cluster, I noticed that the actual announcement is significantly delayed. In particular, the debug output shows that a node has joined a full 20 seconds before the OnNodeJoin or OnNewLeaves events are fired.

I tracked this down to line 646 of cluster.go. My understanding, from reading through the code and the debug output, is that the join process goes like this (please slap me if I've gone awry):

  • send join message to cluster
  • cluster accepts join message and node into cluster (debug output says "Node ... joined!")
  • cluster sends state tables back to the new node
  • the new node doesn't yet know it has successfully joined, so it waits for 2 * NETWORK_TIMEOUT and then announces its presence
  • after announcement is sent, the node proclaims itself joined
  • the OnNodeJoin and OnNewLeaves events are fired for all nodes in the cluster

I'm guessing there is a reason the delay is set to 2 * NETWORK_TIMEOUT, but I'm not sure what it is. (Truthfully, my networking skills are pretty poor, so I dare not hazard a guess.)

I would be very happy to work on a fix for this problem; I'm just not sure what the fix would look like yet. Therefore, I am seeking guidance. :-)

My inclination is to try to announce the node's presence immediately and, if it fails, try again after a longer timeout. I just don't know what "if it fails" means in this context.

Thanks!

@paddycarver

This is not a bug, but is intended behaviour. It's not the optimal behaviour, I'll agree.

The reason the timeout is set to 2 * NETWORK_TIMEOUT is to ensure that Nodes have the chance to respond with a race condition warning. Waiting NETWORK_TIMEOUT allows us to be sure that if a message is going to be sent to a Node, it has been sent. Waiting another NETWORK_TIMEOUT allows us to be sure that if a race condition is going to be detected, it has been detected. So waiting 2 * NETWORK_TIMEOUT allows us to make sure that nobody is going to throw a race condition warning after we say we've joined the cluster.

This is important for the following reason. Suppose we have a cluster with Nodes whose IDs are 1, 4, and 5, and suppose Nodes 2 and 3 join at roughly the same time (i.e., they are both doing the join dance at the same time). Should 5 attempt to route a message with ID 3 while this is happening, it's possible that a race condition could lead Node 2 to believe it is the closest Node, when in fact 3 is. To counteract this, we wait until a Node has a full representation of its state tables before it announces its presence, which is a signal that it's ready to begin handling messages.
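To make the scenario concrete, here is a minimal sketch of how a partial view of the cluster can produce the wrong answer. It is not the library's routing code: IDs are shrunk to plain uint64 values, `closest` and `dist` are hypothetical helpers, distance is simple numeric difference, and ties going to the lower ID is an assumption made only for this example.

```go
package main

import "fmt"

// dist is the absolute numeric difference between two IDs.
// (Hypothetical helper; real IDs are much larger than a uint64.)
func dist(a, b uint64) uint64 {
	if a > b {
		return a - b
	}
	return b - a
}

// closest returns the ID in known that is numerically nearest to key.
// Ties go to the lower ID; that tie-break is an assumption of this sketch.
// known must be non-empty.
func closest(key uint64, known []uint64) uint64 {
	best := known[0]
	for _, id := range known[1:] {
		d, bd := dist(id, key), dist(best, key)
		if d < bd || (d == bd && id < best) {
			best = id
		}
	}
	return best
}

func main() {
	// Node 2 mid-join: it knows 1, 4, 5 and itself, but has not heard of 3.
	partial := []uint64{1, 2, 4, 5}
	// The same Node once its state tables are complete.
	full := []uint64{1, 2, 3, 4, 5}

	fmt.Println("partial view picks:", closest(3, partial))
	fmt.Println("full view picks:   ", closest(3, full))
}
```

With the partial view, Node 2 picks itself for the message with ID 3; with the full view, it picks Node 3. The wait before announcing is meant to close that window.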

For some clusters, this isn't that big an issue. Messages being mis-delivered for a few seconds could be a non-issue entirely. For others, the consistency guarantee (that a Node will never consider a message delivered unless it's sure that it is the closest Node) is really important. To accommodate both situations, the timeout should probably be controllable via a standalone configuration value that defaults to the stronger consistency guarantee.

Any thoughts on this? I appreciate the feedback.
