
Use 'broadcast_address' instead of 'rpc_address' to contact local peer #123

Open
viossat opened this issue Nov 16, 2018 · 17 comments

@viossat

viossat commented Nov 16, 2018

I think broadcast_address is more suitable as rpc_address is usually set to 0.0.0.0 or a local sub-network IP.

@thibaultcha
Owner

thibaultcha commented Dec 21, 2018

We use rpc_address which is for client connections. In comparison, broadcast_address is used for inter-nodes communication. This driver aims at a similar behavior as other Datastax-maintained drivers, which use rpc_address as well.
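For reference, these are the relevant cassandra.yaml settings being discussed (the values below are illustrative, not from anyone's actual cluster):

```yaml
# Address the node binds for gossip (inter-node) traffic.
listen_address: 10.0.0.5
# Address advertised to other nodes; defaults to listen_address.
broadcast_address: 203.0.113.5
# Address the node binds for CQL client connections; 0.0.0.0 binds all interfaces.
rpc_address: 0.0.0.0
# Address advertised to clients; required when rpc_address is 0.0.0.0.
broadcast_rpc_address: 203.0.113.5
```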

@nmeylan

nmeylan commented Jan 2, 2019

Hello,
I have an issue that seems related to this one: since I upgraded to 0.14.1, Kong does not start. I get the following error: [cassandra error] could not retrieve current migrations: [cassandra error] all hosts tried for query failed. 172.17.0.4: host seems unhealthy, considering it down (timeout). I only have a single Cassandra node at the moment.

172.17.0.4 is the internal IP of my Cassandra container, so my Kong container can't reach Cassandra.

In addition, kong migrations up works, most likely because the Cassandra driver is used differently for migrations (maybe this will help you).

@thibaultcha
Owner

@nmeylan What is the value configured for rpc_address in your cluster's nodes?

@nmeylan

nmeylan commented Jan 3, 2019

rpc_address is set to 0.0.0.0.

I tried with Kong 1.0, and it "works": I now hit https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L501
However, I still don't understand why log level warn is used when I fall into this condition. It does not seem wrong to use 0.0.0.0 as rpc_address. Per the DataStax documentation:

rpc_address:

  • 0.0.0.0:
    Listens on all configured interfaces. You must set the broadcast_rpc_address to a value other than 0.0.0.0

BTW, thanks for your answer

@viossat
Author

viossat commented Jan 3, 2019

My case was about Cassandra in containers as well. rpc_address is set to 0.0.0.0. The Lua driver fetches the local Docker IP of the local peer (coordinator), which is a private and usually unreachable IP.
The JS driver doesn't follow this behavior.

@viossat
Author

viossat commented Jan 3, 2019

Please note that this changed in 1.3.3 (c176a9c).

@thibaultcha
Owner

The JS driver doesn't follow this behavior.

How so? The JS driver uses rpc_address unless it is set to the bind-all address (0.0.0.0), in which case it falls back to peer (listen_address) (source: https://github.com/datastax/nodejs-driver/blob/580312937e07368d328f0a9aae2029609b307cdd/lib/control-connection.js#L740-L753), just like we do in lua-cassandra (source: c176a9c#diff-745de875635774250552720721ac4fd5R524). The default address translator in the JS driver does not modify the received address: https://github.com/datastax/nodejs-driver/blob/707953f52709fc507f07edb18659f2e2d9f4d22e/lib/policies/address-resolution.js#L47.
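The fallback both drivers implement can be sketched roughly like this (a Python sketch of the logic only, not the actual driver code; the `row` dict mimics a system.local / system.peers row):

```python
def address_for_peer(row, fallback_address):
    """Pick the address a driver should use to contact a peer.

    `row` mimics a system.local / system.peers row; `fallback_address`
    is what the driver falls back to (the `peer` column for remote
    nodes, or the contact point itself for the local node).
    """
    rpc = row.get("rpc_address")
    if rpc is None or rpc in ("0.0.0.0", "::"):
        # "bind all" is not a routable address: fall back.
        return fallback_address
    return rpc

# A concrete rpc_address is used as-is...
assert address_for_peer({"rpc_address": "10.0.0.5"}, "10.0.0.9") == "10.0.0.5"
# ...but a bind-all rpc_address triggers the fallback.
assert address_for_peer({"rpc_address": "0.0.0.0"}, "10.0.0.9") == "10.0.0.9"
```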

@viossat How are you connecting your containers (Kong <-> Cassandra) together? Are you linking them, or using a Docker network?

@viossat
Author

viossat commented Jan 3, 2019

getAddressForPeerHost is used to get the addresses of the remote peers only, the JS driver uses the contact point as the local peer (coordinator) IP. It uses the system.peers table to find extra contact points.
https://github.com/datastax/nodejs-driver/blob/580312937e07368d328f0a9aae2029609b307cdd/lib/control-connection.js#L120-L154

My Cassandra instances are located in multiple datacenters and sub-networks, same for the clients.

@thibaultcha
Owner

@viossat So, if I understand you correctly, your complaint is that the Lua driver overrides the specified contact_point argument with said contact point's rpc_address, instead of sticking to the given contact_point argument? (source: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L494-L502). I was under the impression that your complaint was related to how the driver contacted both the local contact point and the peers.

Help me understand: why would only your contact point be reachable by the driver (or Kong, for that matter)? Since the driver load balances across all peers (including the contact point), it treats the local contact point and the peers the same way, and assumes that all of them can be contacted via their rpc_address (falling back to listen_address if the former is set to "bind all"). Can you clarify this point for me?

@viossat
Author

viossat commented Jan 4, 2019

Sorry, my issue was not clear and didn't take 1.3.3 changes into account.

The DataStax JS driver works as follows:

  1. resolve all contact point domains and add them (IPs only) to the initial peer list (without using any system.local.*_address)
  2. use system.peers to discover more peers

With the Lua driver:

  • domain contact points are not resolved manually, so extra A/AAAA DNS records are ignored
  • only the first responding contact point is added to the initial peer list
  • (before 1.3.3) system.local.listen_address is used as the IP of the first responding contact point; that was my original issue
  • (since 1.3.3) system.local.rpc_address is used as the IP of the first responding contact point; a warning is raised and it falls back to the contact point host if it equals 0.0.0.0
  • (since 1.3.3) when it falls back to the contact point host, that may be a domain with multiple A/AAAA records, which may lead to a different peer from time to time

When working with Cassandra in containers:

  • system.local.rpc_address equals 0.0.0.0 (so the warning is always raised and a domain may be used as host)
  • the listen_address of the Cassandra instances is usually unreachable from the client (no effect since 1.3.3)

(before 1.3.3) While most Cassandra clusters have more than one node, someone may not notice the issue because the first responding contact point returns the other peers before its (unreachable) listen_address is saved in the peer list.
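The local-peer selection since 1.3.3 can be sketched like this (a Python sketch of the behavior, not the actual Lua code):

```python
def local_peer_address(contact_point, local_rpc_address, warnings):
    """Sketch of lua-cassandra >= 1.3.3: prefer system.local.rpc_address
    for the first responding contact point, falling back to the contact
    point itself (possibly a hostname with several A/AAAA records) when
    rpc_address is the bind-all address."""
    if local_rpc_address in ("0.0.0.0", "::"):
        warnings.append("rpc_address is bind-all, falling back to contact point")
        return contact_point
    return local_rpc_address

warnings = []
# Containerized Cassandra commonly binds 0.0.0.0: the contact point wins,
# and a warning is always logged.
assert local_peer_address("cassandra.example.com", "0.0.0.0", warnings) \
    == "cassandra.example.com"
assert len(warnings) == 1
```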

@thibaultcha
Owner

Reopening this since the discussion is active.

@thibaultcha thibaultcha reopened this Jan 5, 2019
@thibaultcha
Owner

domain contact points are not resolved manually, so extra A/AAAA DNS records are ignored

This library does not embed a DNS resolver, so contact points must be specified as IP addresses. Kong uses a clever hack that makes it possible to pass contact points as hostnames.
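Such a hack amounts to resolving hostname contact points to IPs up front, before handing them to the driver. A minimal sketch of the idea (`resolve_contact_points` is a hypothetical helper, not Kong's actual code; IPv4-only for brevity):

```python
import socket

def resolve_contact_points(contact_points, port=9042):
    """Expand hostname contact points into their A records so the
    driver only ever sees IP addresses."""
    resolved = []
    for cp in contact_points:
        try:
            socket.inet_aton(cp)  # already an IPv4 address? keep it
            ips = [cp]
        except OSError:
            ips = [info[4][0] for info in
                   socket.getaddrinfo(cp, port, socket.AF_INET,
                                      proto=socket.IPPROTO_TCP)]
        for ip in ips:
            if ip not in resolved:  # de-duplicate
                resolved.append(ip)
    return resolved

# Literal IPs pass through untouched:
assert resolve_contact_points(["127.0.0.1"]) == ["127.0.0.1"]
```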

raises a warning and fallbacks to the contact point host if equals to 0.0.0.0

Similarly to the JS driver, iirc.

when it fallbacks to the contact point host, that may be a domain with multiple A/AAAA records which may lead to another peer from time to time

Since we only refer to contact points by IP address (from rpc_address or listen_address), as you pointed out, that shouldn't be the case? This is especially critical for the reasons that initially triggered these recent changes, described in great detail in this commit (which wrongly used listen_address, as fixed in a subsequent commit).

so the warning is always raised and a domain may be used as host

I have considered making this warning silent when opts.silent is enabled. A PR here is welcome.

listen_address of the Cassandra instances are usually unreachable from the client

Which property do you suggest we fallback to then?

@viossat
Author

viossat commented Jan 6, 2019

1.3.3 works in my case. IMO the saved IP of the local peer should always be the contact point, not rpc_address, even when the latter is available.
A DNS resolver would be awesome.

@rainest

rainest commented Feb 11, 2019

Absent DNS capabilities in the driver, broadcast_address is probably the best option. That appears to be what the DataStax drivers effectively do, as the listen address they refer to (the peer column from system.peers) corresponds to broadcast_address in system.local.

rpc_address in system.peers is actually broadcast_rpc_address as of Cassandra 2.1; it looks like the DataStax driver's fallback mechanism was designed for, and only applies to, older versions, as broadcast_rpc_address is never 0.0.0.0. While it's the value we actually want, it doesn't appear to be exposed anywhere for the local node (system.peers only includes other nodes' info). broadcast_address is the next best guess, even though it won't be correct if you use separate addresses for CQL and gossip listeners (hopefully uncommon).
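The proposal amounts to changing the fallback order for the local node (a sketch of the suggested selection order with a hypothetical helper, not an actual patch):

```python
def proposed_local_address(local_row, contact_point):
    """Proposed selection order for the local (coordinator) node:
    rpc_address unless it is bind-all, then broadcast_address, then
    the contact point the client originally used."""
    rpc = local_row.get("rpc_address")
    if rpc and rpc not in ("0.0.0.0", "::"):
        return rpc
    return local_row.get("broadcast_address") or contact_point

# Bind-all rpc_address: prefer broadcast_address over the hostname.
assert proposed_local_address(
    {"rpc_address": "0.0.0.0", "broadcast_address": "203.0.113.5"},
    "cassandra.example.com") == "203.0.113.5"
```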

https://github.com/thibaultcha/lua-cassandra/blob/1.3.4/lib/resty/cassandra/cluster.lua#L477 will retrieve the coordinator hostname, so it can wind up in the state described in 8b9fcd9#diff-745de875635774250552720721ac4fd5L496

@thibaultcha
Owner

@rainest Would you be willing to send a PR to this repository? In the same vein, we could also PR Kong to resolve its provided contact points before instantiating the cluster instance.

@thibaultcha
Owner

On second thought, I wonder if we shouldn't just follow my second suggestion, but keep this driver as-is for now (so as to stay aligned with the DataStax implementations)?

@thibaultcha
Owner

Here is the Kong PR: Kong/kong#4296

Note that this fix is only relevant in versions of Kong that create several resty.cassandra.cluster instances (as described in the aforementioned commit), so that includes Kong up to 0.14, and Kong Enterprise up to 0.34 (today's latest release).
