
Use 'broadcast_address' instead of 'rpc_address' to contact local peer #123

Open
viossat opened this issue Nov 16, 2018 · 17 comments

@viossat

viossat commented Nov 16, 2018

I think broadcast_address is more suitable as rpc_address is usually set to 0.0.0.0 or a local sub-network IP.

@thibaultcha
Owner

thibaultcha commented Dec 21, 2018

We use rpc_address which is for client connections. In comparison, broadcast_address is used for inter-nodes communication. This driver aims at a similar behavior as other Datastax-maintained drivers, which use rpc_address as well.
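For reference, these are the relevant cassandra.yaml settings being discussed (the values below are illustrative, not from anyone's actual cluster):

```yaml
# Address the node binds for gossip (inter-node) traffic.
listen_address: 10.0.0.5
# Address advertised to other nodes; defaults to listen_address.
broadcast_address: 203.0.113.5
# Address the node binds for CQL client connections; 0.0.0.0 binds all interfaces.
rpc_address: 0.0.0.0
# Address advertised to clients; required when rpc_address is 0.0.0.0.
broadcast_rpc_address: 203.0.113.5
```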

@nmeylan

nmeylan commented Jan 2, 2019

Hello,
I have an issue that seems related to this one: since I upgraded to 0.14.1, Kong does not start. I get the following error: [cassandra error] could not retrieve current migrations: [cassandra error] all hosts tried for query failed. 172.17.0.4: host seems unhealthy, considering it down (timeout). I only have a single Cassandra node at the moment.

172.17.0.4 is the internal IP of my Cassandra container, so my Kong container can't reach Cassandra.

In addition, kong migrations up works, most likely because the Cassandra driver is used differently for migrations (maybe this will help you).

@thibaultcha
Owner

@nmeylan What is the value configured for rpc_address in your cluster's nodes?

@nmeylan

nmeylan commented Jan 3, 2019

rpc_address is set to 0.0.0.0.

I tried with Kong 1.0, and it "works": I now hit https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L501
However, I still don't understand why log level warn is used when I fall into this condition. It does not seem wrong to use 0.0.0.0 as rpc_address. Per the DataStax documentation:

rpc_address:

  • 0.0.0.0:
    Listens on all configured interfaces. You must set the broadcast_rpc_address to a value other than 0.0.0.0

BTW, thanks for your answer

@viossat
Author

viossat commented Jan 3, 2019

My case was about Cassandra in containers as well. rpc_address is set to 0.0.0.0. The Lua driver fetches the local Docker IP of the local peer (coordinator), which is a private and usually unreachable IP.
The JS driver doesn't follow this behavior.

@viossat
Author

viossat commented Jan 3, 2019

Please note that this changed in 1.3.3 (c176a9c).

@thibaultcha
Owner

The JS driver doesn't follow this behavior.

How so? The JS driver uses rpc_address unless it is set to the bind-all address (0.0.0.0), in which case it falls back to peer (listen_address) (source: https://github.com/datastax/nodejs-driver/blob/580312937e07368d328f0a9aae2029609b307cdd/lib/control-connection.js#L740-L753), just like we do in lua-cassandra (source: c176a9c#diff-745de875635774250552720721ac4fd5R524). The default address translator in the JS driver does not modify the received address: https://github.com/datastax/nodejs-driver/blob/707953f52709fc507f07edb18659f2e2d9f4d22e/lib/policies/address-resolution.js#L47.
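The fallback both drivers implement can be sketched roughly like this (a Python sketch of the logic only, not the actual driver code; the `row` dict mimics a system.local / system.peers row):

```python
def address_for_peer(row, fallback_address):
    """Pick the address a driver should use to contact a peer.

    `row` mimics a system.local / system.peers row; `fallback_address`
    is what the driver falls back to (the `peer` column for remote
    nodes, or the contact point itself for the local node).
    """
    rpc = row.get("rpc_address")
    if rpc is None or rpc in ("0.0.0.0", "::"):
        # "bind all" is not a routable address: fall back.
        return fallback_address
    return rpc

# A concrete rpc_address is used as-is...
assert address_for_peer({"rpc_address": "10.0.0.5"}, "10.0.0.9") == "10.0.0.5"
# ...but a bind-all rpc_address triggers the fallback.
assert address_for_peer({"rpc_address": "0.0.0.0"}, "10.0.0.9") == "10.0.0.9"
```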

@viossat How are you connecting your containers (Kong <-> Cassandra) together? Are you linking them, or using a Docker network?

@viossat
Author

viossat commented Jan 3, 2019

getAddressForPeerHost is used to get the addresses of the remote peers only, the JS driver uses the contact point as the local peer (coordinator) IP. It uses the system.peers table to find extra contact points.
https://github.com/datastax/nodejs-driver/blob/580312937e07368d328f0a9aae2029609b307cdd/lib/control-connection.js#L120-L154

My Cassandra instances are located in multiple datacenters and sub-networks, same for the clients.

@thibaultcha
Owner

@viossat So, if I understand you correctly, your complaint is that the Lua driver overrides the specified contact_point argument with said contact point's rpc_address, instead of sticking to the given contact_point argument? (source: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L494-L502). I was under the impression that your complaint was related to how the driver contacted both the local contact point and the peers.

Help me understand: why would only your contact point be reachable by the driver (or Kong, for that matter)? Since the driver load balances across all peers (including the contact point), it treats the local contact point and the peers the same way, and assumes that all of them can be contacted via their rpc_address (falling back to listen_address if the former is set to "bind all"). Can you clarify this point for me?

@viossat
Author

viossat commented Jan 4, 2019

Sorry, my issue was not clear and didn't take 1.3.3 changes into account.

The DataStax JS driver works as follows:

  1. resolve all contact point domains and add them (IPs only) to the initial peer list (without using any system.local.*_address)
  2. use system.peers to discover more peers

With the Lua driver:

  • domain contact points are not resolved manually, so extra A/AAAA DNS records are ignored
  • only the first responding contact point is added to the initial peer list
  • (before 1.3.3) system.local.listen_address is used as the IP of the first responding contact point; that was my original issue
  • (since 1.3.3) system.local.rpc_address is used as the IP of the first responding contact point; a warning is raised and it falls back to the contact point host if it equals 0.0.0.0
  • (since 1.3.3) when it falls back to the contact point host, that may be a domain with multiple A/AAAA records, which may lead to a different peer from time to time

When working with Cassandra in containers:

  • system.local.rpc_address equals 0.0.0.0 (so the warning is always raised and a domain may be used as host)
  • the listen_address of the Cassandra instances is usually unreachable from the client (no effect since 1.3.3)

(before 1.3.3) While most Cassandra clusters have more than one node, someone may not notice the issue because the first responding contact point returns the other peers before its (unreachable) listen_address is saved in the peer list.
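The local-peer selection since 1.3.3 can be sketched like this (a Python sketch of the behavior, not the actual Lua code):

```python
def local_peer_address(contact_point, local_rpc_address, warnings):
    """Sketch of lua-cassandra >= 1.3.3: prefer system.local.rpc_address
    for the first responding contact point, falling back to the contact
    point itself (possibly a hostname with several A/AAAA records) when
    rpc_address is the bind-all address."""
    if local_rpc_address in ("0.0.0.0", "::"):
        warnings.append("rpc_address is bind-all, falling back to contact point")
        return contact_point
    return local_rpc_address

warnings = []
# Containerized Cassandra commonly binds 0.0.0.0: the contact point wins,
# and a warning is always logged.
assert local_peer_address("cassandra.example.com", "0.0.0.0", warnings) \
    == "cassandra.example.com"
assert len(warnings) == 1
```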

@thibaultcha
Owner

Reopening this since the discussion is active.

@thibaultcha thibaultcha reopened this Jan 5, 2019
@thibaultcha
Owner

domain contact points are not resolved manually, so extra A/AAAA DNS records are ignored

This library does not embed a DNS resolver, so contact points must be specified as IP addresses. Kong uses a clever hack that makes it possible to pass contact points as hostnames.
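Such a hack amounts to resolving hostname contact points to IPs up front, before handing them to the driver. A minimal sketch of the idea (`resolve_contact_points` is a hypothetical helper, not Kong's actual code; IPv4-only for brevity):

```python
import socket

def resolve_contact_points(contact_points, port=9042):
    """Expand hostname contact points into their A records so the
    driver only ever sees IP addresses."""
    resolved = []
    for cp in contact_points:
        try:
            socket.inet_aton(cp)  # already an IPv4 address? keep it
            ips = [cp]
        except OSError:
            ips = [info[4][0] for info in
                   socket.getaddrinfo(cp, port, socket.AF_INET,
                                      proto=socket.IPPROTO_TCP)]
        for ip in ips:
            if ip not in resolved:  # de-duplicate
                resolved.append(ip)
    return resolved

# Literal IPs pass through untouched:
assert resolve_contact_points(["127.0.0.1"]) == ["127.0.0.1"]
```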

raises a warning and fallbacks to the contact point host if equals to 0.0.0.0

Similarly to the JS driver, iirc.

when it fallbacks to the contact point host, that may be a domain with multiple A/AAAA records which may lead to another peer from time to time

Since we only refer to contact points by IP address (from rpc_address or listen_address), as you pointed out, that shouldn't be the case? This is especially critical for the reasons that initially triggered these recent changes, described in great detail in this commit (which wrongly used listen_address, as fixed in a subsequent commit).

so the warning is always raised and a domain may be used as host

I have considered making this warning silent when opts.silent is enabled. A PR here is welcome.

listen_address of the Cassandra instances are usually unreachable from the client

Which property do you suggest we fallback to then?

@viossat
Author

viossat commented Jan 6, 2019

1.3.3 works in my case. IMO the saved IP of the local peer should always be the contact point, not rpc_address, even when the latter is available.
A DNS resolver would be awesome.

@rainest

rainest commented Feb 11, 2019

Absent DNS capabilities in the driver, broadcast_address is probably the best option. That appears to be what the DataStax drivers effectively do, as the listen address they refer to (the peer column from system.peers) corresponds to broadcast_address in system.local.

rpc_address in system.peers is actually broadcast_rpc_address as of Cassandra 2.1; it looks like the DataStax driver's fallback mechanism was designed for, and only applies to, older versions, as broadcast_rpc_address is never 0.0.0.0. While it's the value we actually want, it doesn't appear to be exposed anywhere for the local node (system.peers only includes other nodes' info). broadcast_address is the next best guess, even though it won't be correct if you use separate addresses for CQL and gossip listeners (hopefully uncommon).
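The proposal amounts to changing the fallback order for the local node (a sketch of the suggested selection order with a hypothetical helper, not an actual patch):

```python
def proposed_local_address(local_row, contact_point):
    """Proposed selection order for the local (coordinator) node:
    rpc_address unless it is bind-all, then broadcast_address, then
    the contact point the client originally used."""
    rpc = local_row.get("rpc_address")
    if rpc and rpc not in ("0.0.0.0", "::"):
        return rpc
    return local_row.get("broadcast_address") or contact_point

# Bind-all rpc_address: prefer broadcast_address over the hostname.
assert proposed_local_address(
    {"rpc_address": "0.0.0.0", "broadcast_address": "203.0.113.5"},
    "cassandra.example.com") == "203.0.113.5"
```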

https://github.com/thibaultcha/lua-cassandra/blob/1.3.4/lib/resty/cassandra/cluster.lua#L477 will retrieve the coordinator hostname, so it can wind up in the state described in 8b9fcd9#diff-745de875635774250552720721ac4fd5L496

@thibaultcha
Owner

@rainest Would you be willing to send a PR to this repository? In the same vein, we could also PR Kong to resolve its provided contact points before instantiating the cluster instance.

@thibaultcha
Owner

On second thought, I wonder if we shouldn't just follow my second suggestion, but keep this driver as-is for now (so as to stay aligned with the DataStax implementations)?

@thibaultcha
Owner

Here is the Kong PR: Kong/kong#4296

Note that this fix is only relevant in versions of Kong that create several resty.cassandra.cluster instances (as described in the aforementioned commit), so that includes Kong up to 0.14, and Kong Enterprise up to 0.34 (today's latest release).
