Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some network TLC, including discarding multicast discovery on the "wrong" interface #1943

Merged
merged 7 commits into from Mar 20, 2024

Conversation

eboasson
Copy link
Contributor

The direct reason for the changes in this PR is to make it possible to have discovery over Ethernet only with unicast messages, while using relying on multicast over loopback. This turns out to be a rather important set-up, and is even the new default in ROS 2.

Naturally this requires being able to configure multicast usage per-interface.

I have not been able to find a better way than to extract interface and destination address information from incoming packets. The good news is that I figured out how to get that information on all of Linux, macOS and Windows. I'm sure I'll find other uses for this information, too 🙂

It furthermore changes the locator selection logic to ignore addresses that do not map to local interfaces if some addresses do map to local interfaces. This should obviate the need for having to set DontRoute in many instances where multiple interfaces were used simultaneously.

As a goody, it tries to work around Linux quirky behaviour with IPv4 multicast on the loopback interface: it (seemingly) always works, but the kernel doesn't advertise it. With this version, on startup Cyclone will quickly check whether it works so that one doesn't have to use weird options, make do with unicast or have root privileges to change the interface flags.

Copy link
Contributor

@dpotman dpotman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few remarks and questions below.

for (const struct ddsi_networkpartition_address *a = np->uc_addresses; a; a = a->next)
{
for (int i = 0; i < gv->n_interfaces; i++)
if (memcmp (a->loc.address, gv->interfaces[i].loc.address, sizeof (a->loc.address)) == 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check for equal kind here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not likely to ever cause a problem if we don't, but I think we should :) See b336a27

GVTRACE ("some interfaces disallow spdp multicast: defaulting participant index to \"auto\"\n");
gv->config.participantIndex = DDSI_PARTICIPANT_INDEX_AUTO;
}
if (none_allow_spdp_mc && !gv->config.config_has_empty_peers_section)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace log below includes "no peers configured": should the ! be removed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it a bit and decided that it would be better to have less implicit behaviour: 65a8843 adds an attribute to the "Peers" element, which I believe makes it both more flexible and clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this way it's clear what the setting does and what's the reason for adding localhost as peer.

@@ -1870,8 +1870,8 @@ static struct cfgelem discovery_cfgelems[] = {
"<ul><li><i>auto</i>: which will attempt to automatically determine an "
"available participant index "
"(see also Discovery/MaxAutoParticipantIndex), or</li>\n"
"<li>a non-negative integer less than 120, or</li>\n"
"<li><i>none</i>:, which causes it to use arbitrary port numbers for "
"<li>a non-negative integer, or</li>\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that the limitation in src/core/ddsi/src/ddsi_config.c:1572 would also be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b336a27

{
struct ddsi_config_peer_listelem peer_local;
char local_addr[DDSI_LOCSTRLEN];
ddsi_locator_to_string_no_port (local_addr, sizeof (local_addr), &gv->interfaces[0].loc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the exact reason for adding 'self' to the peers list? And is interfaces[0]'s address always the right one to add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes in 65a8843 take care of the first part.

The second question: yes, I believe interfaces[0] is ok. There is currently always at least one real network interface and by construction that one is in interfaces[0]. It think it is a reasonable assumption that it can always be used to send data to itself. If that's not true I think we might be looking at essentially another version of the (new) check to see if loopback multicast works on linux.

// on which the packet was received. If it doesn't work, we don't mind: it simply
// means there is slightly less information available for making sense of addresses
// or deciding whether multicast SPDP packet really is to be processed
(void) setsockopt_pktinfo (gv, sock, ipv6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the performance impact be of setting this option? And maybe only set it on the socket used for SPDP, and not on the data socket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit of checking using ddsperf -TOU on my mac, an ancient windows 10 box and an RPi3 in 64-bit mode with a Linux 5.1. This yields:

  • rpi: 74.24 → 71.57 — -3.6%
  • mac: 3429.39 → 3350.57 — -2.3%
  • windows: 1.34 → 1.36 — measurement noise

(Windows is based on the output of pub rather than sub and has a different scale.)

I'm tempted to say the benefits outweigh the cost, but I am considering adding an option to disable it altogether.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, an option to disable it would be nice. The default can be 'enabled' because of the limited impact.

Copy link
Contributor

@dpotman dpotman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

This extends `ddsrt_recvmsg` to not only return the packet's source
address, but also its destination address and the interface on which it
was received.  If the operating system does not provide this
information, they can be set to "unknown".

The additional information is valuable because, at least on Linux, one
sometimes receives multicast packets from an interface on which one did
not join the multicast group for reasons I am yet to discover.  This
allows detecting such cases.  It also makes it possible to make a more
informed guess about which interface is the most suitable choice for
communicating with a node X, and it also seems likely that receiving a
multicast from X is a good indicator that sending a multicast to node X
will work.

This commit does not use the information other than for writing it in
the trace.  Current implementation is verified to work on macOS, Linux
and Windows for IPv4 and IPv6.

Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
Before this commit, the decision which unicast locators to include was
to accept all distant locators if they were routable addresses and we
had a network interface that looked like it could handle them.  The
"DontRoute" configuration option would change this and made it ignore
all distant locators.

The first turns out to be problematic in practice because quite often
there are no suitable routes defined in the networking stack, but
defaulting to not allowing any routing is also problematic.

What this commit does is twofold:

* Firstly, it uses the new received packet information to associate the
  address with the network interface over which it was received (if
  possible) instead of the first seemingly suitable one;

* Secondly, it only uses distant addresses when it has been established
  that no nearby addresses are in the advertised set.  This heuristic
  probably better fits the use cases.

What it perhaps should also do is reject the address if we know what
interface it arrived on and that interface is not enabled in the
configuration.

Signed-off-by: Erik Boasson <eb@ilities.com>
For IPv4 on Linux, the common case seems to be that the loopback
interface is perfectly capable of handling multicast but the interface
usually don't indicate this.  This commit adds a run-time test by
sending a small multicast packet to itself in this specific case.

For IPv6 there is some other complication, the test says the network is
unreachable.  Perhaps correctly, perhaps there's a trick I haven't found
yet.  Either way, if errs on the safe side, so that's fine.

Signed-off-by: Erik Boasson <eb@ilities.com>
There is no simple hard limit (other than 2^32-2).

Signed-off-by: Erik Boasson <eb@ilities.com>
This adds an "allow_multicast" attribute to the network interface
configuration and uses the old General/AllowMulticast as the default
value.  This way, both backwards compatibility and easy global
configuration are supported.

This incidentally addresses a slew of bugs in multi-network
configurations, but it does make things rather more complicated.

Signed-off-by: Erik Boasson <eb@ilities.com>
Ignore SPDP messages that:

* Provably arrived over a network interface we didn't enable

* Were multicast over a network interface on which did not allow
  multicast discovery.

For an example why this is useful: on Linux, running two proceses one allowing multicast
over loopback but not Ethernet, and one allowing it over Ethernet but not over loopback,
both processes do receive the multicast message despite not having joined the multicast
group on the interface it is sent on.  With this commit, those SPDP messages are ignored
and the two processes do not discover each other.

The direct reason for this change is to make it possible to have discovery over Ethernet
only with unicast messages, while using relying on multicast over loopback.  This turns
out to be a rather important set-up, and is even the new default in ROS 2.

Signed-off-by: Erik Boasson <eb@ilities.com>
@eboasson eboasson merged commit a4937a9 into eclipse-cyclonedds:master Mar 20, 2024
2 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants