New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some network TLC, including discarding multicast discovery on the "wrong" interface #1943
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few remarks and questions below.
src/core/ddsi/src/ddsi_endpoint.c
Outdated
for (const struct ddsi_networkpartition_address *a = np->uc_addresses; a; a = a->next) | ||
{ | ||
for (int i = 0; i < gv->n_interfaces; i++) | ||
if (memcmp (a->loc.address, gv->interfaces[i].loc.address, sizeof (a->loc.address)) == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check for equal kind
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not likely to ever cause a problem if we don't, but I think we should :) See b336a27
src/core/ddsi/src/ddsi_init.c
Outdated
GVTRACE ("some interfaces disallow spdp multicast: defaulting participant index to \"auto\"\n"); | ||
gv->config.participantIndex = DDSI_PARTICIPANT_INDEX_AUTO; | ||
} | ||
if (none_allow_spdp_mc && !gv->config.config_has_empty_peers_section) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trace log below includes "no peers configured": should the !
be removed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it a bit and decided that it would be better to have less implicit behaviour: 65a8843 adds an attribute to the "Peers" element, which I believe makes it both more flexible and clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this way it's clear what the setting does and what's the reason for adding localhost as peer.
@@ -1870,8 +1870,8 @@ static struct cfgelem discovery_cfgelems[] = { | |||
"<ul><li><i>auto</i>: which will attempt to automatically determine an " | |||
"available participant index " | |||
"(see also Discovery/MaxAutoParticipantIndex), or</li>\n" | |||
"<li>a non-negative integer less than 120, or</li>\n" | |||
"<li><i>none</i>:, which causes it to use arbitrary port numbers for " | |||
"<li>a non-negative integer, or</li>\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that the limitation in src/core/ddsi/src/ddsi_config.c:1572
would also be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in b336a27
{ | ||
struct ddsi_config_peer_listelem peer_local; | ||
char local_addr[DDSI_LOCSTRLEN]; | ||
ddsi_locator_to_string_no_port (local_addr, sizeof (local_addr), &gv->interfaces[0].loc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the exact reason for adding 'self' to the peers list? And is interfaces[0]
's address always the right one to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes in 65a8843 take care of the first part.
The second question: yes, I believe interfaces[0]
is ok. There is currently always at least one real network interface and by construction that one is in interfaces[0]
. It think it is a reasonable assumption that it can always be used to send data to itself. If that's not true I think we might be looking at essentially another version of the (new) check to see if loopback multicast works on linux.
src/core/ddsi/src/ddsi_udp.c
Outdated
// on which the packet was received. If it doesn't work, we don't mind: it simply | ||
// means there is slightly less information available for making sense of addresses | ||
// or deciding whether multicast SPDP packet really is to be processed | ||
(void) setsockopt_pktinfo (gv, sock, ipv6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the performance impact be of setting this option? And maybe only set it on the socket used for SPDP, and not on the data socket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit of checking using ddsperf -TOU
on my mac, an ancient windows 10 box and an RPi3 in 64-bit mode with a Linux 5.1. This yields:
- rpi: 74.24 → 71.57 — -3.6%
- mac: 3429.39 → 3350.57 — -2.3%
- windows: 1.34 → 1.36 — measurement noise
(Windows is based on the output of pub
rather than sub
and has a different scale.)
I'm tempted to say the benefits outweigh the cost, but I am considering adding an option to disable it altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, an option to disable it would be nice. The default can be 'enabled' because of the limited impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This extends `ddsrt_recvmsg` to not only return the packet's source address, but also its destination address and the interface on which it was received. If the operating system does not provide this information, they can be set to "unknown". The additional information is valuable because, at least on Linux, one sometimes receives multicast packets from an interface on which one did not join the multicast group for reasons I am yet to discover. This allows detecting such cases. It also makes it possible to make a more informed guess about which interface is the most suitable choice for communicating with a node X, and it also seems likely that receiving a multicast from X is a good indicator that sending a multicast to node X will work. This commit does not use the information other than for writing it in the trace. Current implementation is verified to work on macOS, Linux and Windows for IPv4 and IPv6. Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
Before this commit, the decision which unicast locators to include was to accept all distant locators if they were routable addresses and we had a network interface that looked like it could handle them. The "DontRoute" configuration option would change this and made it ignore all distant locators. The first turns out to be problematic in practice because quite often there are no suitable routes defined in the networking stack, but defaulting to not allowing any routing is also problematic. What this commit does is twofold: * Firstly, it uses the new received packet information to associate the address with the network interface over which it was received (if possible) instead of the first seemingly suitable one; * Secondly, it only uses distant addresses when it has been established that no nearby addresses are in the advertised set. This heuristic probably better fits the use cases. What it perhaps should also do is reject the address if we know what interface it arrived on and that interface is not enabled in the configuration. Signed-off-by: Erik Boasson <eb@ilities.com>
For IPv4 on Linux, the common case seems to be that the loopback interface is perfectly capable of handling multicast but the interface usually don't indicate this. This commit adds a run-time test by sending a small multicast packet to itself in this specific case. For IPv6 there is some other complication, the test says the network is unreachable. Perhaps correctly, perhaps there's a trick I haven't found yet. Either way, if errs on the safe side, so that's fine. Signed-off-by: Erik Boasson <eb@ilities.com>
There is no simple hard limit (other than 2^32-2). Signed-off-by: Erik Boasson <eb@ilities.com>
This adds an "allow_multicast" attribute to the network interface configuration and uses the old General/AllowMulticast as the default value. This way, both backwards compatibility and easy global configuration are supported. This incidentally addresses a slew of bugs in multi-network configurations, but it does make things rather more complicated. Signed-off-by: Erik Boasson <eb@ilities.com>
Ignore SPDP messages that: * Provably arrived over a network interface we didn't enable * Were multicast over a network interface on which did not allow multicast discovery. For an example why this is useful: on Linux, running two proceses one allowing multicast over loopback but not Ethernet, and one allowing it over Ethernet but not over loopback, both processes do receive the multicast message despite not having joined the multicast group on the interface it is sent on. With this commit, those SPDP messages are ignored and the two processes do not discover each other. The direct reason for this change is to make it possible to have discovery over Ethernet only with unicast messages, while using relying on multicast over loopback. This turns out to be a rather important set-up, and is even the new default in ROS 2. Signed-off-by: Erik Boasson <eb@ilities.com>
The direct reason for the changes in this PR is to make it possible to have discovery over Ethernet only with unicast messages, while using relying on multicast over loopback. This turns out to be a rather important set-up, and is even the new default in ROS 2.
Naturally this requires being able to configure multicast usage per-interface.
I have not been able to find a better way than to extract interface and destination address information from incoming packets. The good news is that I figured out how to get that information on all of Linux, macOS and Windows. I'm sure I'll find other uses for this information, too 🙂
It furthermore changes the locator selection logic to ignore addresses that do not map to local interfaces if some addresses do map to local interfaces. This should obviate the need for having to set
DontRoute
in many instances where multiple interfaces were used simultaneously.As a goody, it tries to work around Linux quirky behaviour with IPv4 multicast on the loopback interface: it (seemingly) always works, but the kernel doesn't advertise it. With this version, on startup Cyclone will quickly check whether it works so that one doesn't have to use weird options, make do with unicast or have root privileges to change the interface flags.