
Initial node connection should be influenced by node priority #25

Open

mariano opened this issue Mar 7, 2016 · 3 comments

Comments

mariano (Owner) commented Mar 7, 2016

The documentation of the HELLO command states that a priority is given for each node in the cluster, and that this priority is a number where lower is better, meaning the node is more available.

This can be used to influence the initial node connection.
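
For illustration only, a minimal sketch of how those priorities could drive the choice of node (the reply layout follows the Disque documentation; the function name is hypothetical and not part of this library):

// Hypothetical helper, not this library's API: pick the most available node
// from a HELLO reply, whose layout per the Disque docs is
// [version, myNodeId, [nodeId, host, port, priority], [nodeId, host, port, priority], ...]
function pickBestNode(array $hello)
{
    $nodes = array_slice($hello, 2);
    usort($nodes, function (array $a, array $b) {
        return $a[3] - $b[3]; // index 3 holds the node priority; lower is better
    });
    return $nodes[0];
}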

Revisor commented Mar 7, 2016

Just to recap, the first connection right now occurs as follows:

  1. Disque\Connection\Manager::connect() calls
  2. Manager::findAvailableConnection(), which shuffles all known nodes randomly and tries to connect to the first working one by calling
  3. Disque\Connection\Node\Node::connect(), which connects to the node, authenticates with a password, if any, and says HELLO in
  4. Node::sayHello()
  5. Back in the Manager::connect(), we call
  6. Manager::switchToNode(), which calls
  7. Manager::revealClusterFromHello(). This is the point where we know the cluster for the first time.

In order to switch to the best node at this moment, I would call Manager::switchNodeIfNeeded() at the end of Manager::connect(). If there is a better node according to the currently set node prioritizer strategy, the connection will switch to the best node.
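
Roughly like this, as a condensed sketch of the flow above plus the proposed call (details and argument shapes are guesses, not the actual Manager source):

// Condensed sketch only; not the actual Manager source.
public function connect()
{
    // Steps 1-4: shuffle the known nodes, connect to the first one that
    // answers, authenticate if needed, and say HELLO
    $node = $this->findAvailableConnection();

    // Steps 5-7: make that node current; switchToNode() reveals the
    // cluster from the node's HELLO response
    $this->switchToNode($node);

    // Proposed addition: if the prioritizer strategy now knows a better
    // node, switch to it right away
    $this->switchNodeIfNeeded();
}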

I see a few points that need consideration:

  • In a short-lived program, like a typical PHP request, this could force the request to wait for two connections to Disque if the node we happened to connect to first is not the best one, even though it might be quite fine. (Can there be a node that is "quite fine"? Disque uses only three priorities right now: 1 - OK, 10 - possible failure and 100 - failure.)
  • On the other hand, if the client connects to a failing node, that might also cause problems, e.g. with ADDJOB.
  • If we don't do the initial switch, the connection will switch automatically after the first GETJOB command anyway (more pertinent to long-running PHP programs that read many jobs).

In the light of this, I suggest the following:

  • Connect to the first working node, chosen at random
  • Reveal the cluster from the HELLO response
  • Only switch if the current node is failing (its priority is higher than 10).
  • If Disque implements priorities between 1 and 9 in the future, don't switch if the current node's priority is lower than 10, i.e. it is working but may just be slower.

This could be done by writing a new node prioritizer strategy, perhaps called HealthyNodePrioritizer, and using it for this first check.
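
A rough sketch of what such a strategy could look like (the interface name, the sort() signature and Node::getPriority() are assumptions used only to illustrate the idea, not the library's actual contracts):

// Illustrative only; interface and method names are assumptions.
class HealthyNodePrioritizer implements NodePrioritizerInterface
{
    const FAILING_PRIORITY = 10;

    /**
     * Keep the current node unless Disque reports it as failing
     * (priority above 10); only then prefer other, healthier nodes.
     */
    public function sort(array $nodes, $currentNodeId)
    {
        $current = $nodes[$currentNodeId];
        if ($current->getPriority() <= self::FAILING_PRIORITY) {
            // Current node is OK or at worst "possible failure": don't switch
            return [$currentNodeId => $current];
        }

        // Current node is failing: order the candidates by ascending priority
        uasort($nodes, function ($a, $b) {
            return $a->getPriority() - $b->getPriority();
        });
        return $nodes;
    }
}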

What do you think?

mariano (Owner, Author) commented Mar 7, 2016

I like your suggestion; it goes in line with: connect ASAP, but if Disque is telling us the node is no good (maybe it's leaving the cluster), then and only then switch away.

While we modify the connection code we might as well add support for #28: when a specific command tells us that the node we sent it to is actually leaving the cluster, issue a reconnect to another node (which in our case, if we implement priorities as you mention, would simply mean triggering a reconnect).
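
Something along these lines, as a hedged sketch only (the exception type and the exact hook point are placeholders, not the library's actual API):

// Placeholder names throughout; only illustrates "reconnect and retry once
// when the node answers -LEAVING", as discussed in #28.
try {
    $response = $this->execute($command);
} catch (\Exception $e) {
    if (strpos($e->getMessage(), 'LEAVING') === 0) {
        // The node we were talking to is leaving the cluster: pick another
        // node (honoring priorities) and retry the command once
        $this->switchNodeIfNeeded();
        $response = $this->execute($command);
    } else {
        throw $e;
    }
}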

Revisor commented Mar 7, 2016

I think it is not possible to connect to a leaving node. If the connection, authentication, or HELLO returns an error (and thus throws an exception), the node is skipped and we try to connect to the next one.

The initial connection logic is in Disque\Connection\Manager::findAvailableConnection()
The switch connection logic is in Disque\Connection\Manager::switchNodeIfNeeded()

Both use a similar pattern:

foreach ($nodes as $node) {
    try {
        $node->connect(); // connect, authenticate, say HELLO
    } catch (\Exception $e) {
        continue; // node unreachable or failing; try the next one
    }
    break; // connected successfully
}

However, we do need to support -LEAVING for other situations like you describe in #28, that's true.

mariano mentioned this issue Mar 7, 2017