publication/subscription cannot connect when using Onload and Hardware Multicast Loopback #1107

Open
puiuvlad opened this issue Dec 13, 2020 · 9 comments


@puiuvlad

Hi,

I would like to be able to publish on a multicast address such that processes both on the local machine and on remote machines are able to receive these messages.

I am using Solarflare cards and would like to use the Hardware Multicast Loopback feature available via Onload in order to be able to receive the multicast messages on the local machine.

My process contains both a publisher and a subscriber that use the same channel / stream. Without Onload, the publication and subscription connect to each other. With Onload, they do not connect. The simple program below reproduces this issue.

What am I missing?

Thanks,
Vladimir

Here are the details:

When I do not use Onload the below program produces this output:

$ java -cp ... aaa.ClientConnectivityTest
Connecting...
Connected...

When I use Onload the program does not connect:

$ java -cp ... aaa.ClientConnectivityTest
oo:java[6884]: Using OpenOnload 7.1.0.265-ON-12635 [5]
oo:java[6884]: Copyright 2019-2020 Xilinx, 2006-2019 Solarflare Communications, 2002-2005 Level 5 Networks
Connecting...

Per the Solarflare documentation, in order to use the Hardware Multicast Loopback one needs to:
(1) set the firmware-variant to full-feature
(2) set the following environment variables: EF_MCAST_RECV_HW_LOOP=1 EF_MCAST_SEND=2 EF_CTPIO_SWITCH_BYPASS=0
Also, the loopback will not work for UDP datagrams that exceed the MSS (payloads of 1473 bytes or more for an MTU of 1500), but I don't think this is the case here.
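
Since the media driver is the process that owns the UDP sockets, one quick sanity check is to confirm that the variables listed above are actually visible in the environment that process sees. The following is a minimal sketch; the class name `OnloadEnvCheck` and its method are hypothetical, and the expected values are simply the ones quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OnloadEnvCheck {
    // Expected values copied from the list above; nothing here is an Onload or Aeron API.
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("EF_MCAST_RECV_HW_LOOP", "1");
        REQUIRED.put("EF_MCAST_SEND", "2");
        REQUIRED.put("EF_CTPIO_SWITCH_BYPASS", "0");
    }

    // Returns name -> actual value ("<unset>" if missing) for every variable that differs.
    static Map<String, String> mismatches(Map<String, String> env) {
        Map<String, String> bad = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : REQUIRED.entrySet()) {
            String actual = env.get(e.getKey());
            if (!e.getValue().equals(actual)) {
                bad.put(e.getKey(), actual == null ? "<unset>" : actual);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, String> bad = mismatches(System.getenv());
        if (bad.isEmpty()) {
            System.out.println("All hardware multicast loopback variables are set as expected");
        } else {
            System.out.println("Unexpected values: " + bad);
        }
    }
}
```

Running this from the same shell, and under the same launcher, as the media driver shows what that process would inherit.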

I start a media driver in a shell where these variables are set, then I start the test program in the same or a different shell.

With the same settings, I am able to successfully run the Solarflare pingpong application as follows:

$ ./sfnt-pingpong
$ ./sfnt-pingpong --mcast 224.1.2.3 --mcastintf=sfc0 --maxmsg=1472 udp 10.10.10.161

The difference between these parameters and the aeron url is the name of the interface, in this case sfc0.
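
To rule out the interface selection as the difference, one could check which local interface has an address inside the subnet given in the Aeron URL. The sketch below uses only `java.net.NetworkInterface`. It is not Aeron's own resolution logic, and the class name `SubnetMatch` is hypothetical.

```java
import java.net.Inet4Address;
import java.net.InterfaceAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

final class SubnetMatch {
    // Converts a dotted-quad IPv4 string to a 32-bit integer.
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if address lies in the network base/prefixLength.
    static boolean inSubnet(String address, String base, int prefixLength) {
        // A shift by 32 is a no-op in Java, so prefix length 0 is handled separately.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toInt(address) & mask) == (toInt(base) & mask);
    }

    public static void main(String[] args) throws SocketException {
        // Subnet taken from the channel URL in this issue.
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InterfaceAddress ia : nic.getInterfaceAddresses()) {
                if (ia.getAddress() instanceof Inet4Address
                        && inSubnet(ia.getAddress().getHostAddress(), "10.10.10.0", 24)) {
                    System.out.println(nic.getName() + " -> " + ia.getAddress().getHostAddress());
                }
            }
        }
    }
}
```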

package aaa;

import io.aeron.Aeron;
import io.aeron.Publication;
import io.aeron.Subscription;

public class ClientConnectivityTest {

    private final Aeron _aeron = Aeron.connect(new Aeron.Context());

    private Publication _publication;
    private Subscription _subscription;

    private String _channel = "aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24";
    private int _stream = 10;

    public ClientConnectivityTest() {
        _publication = _aeron.addPublication(_channel, _stream);
        _subscription = _aeron.addSubscription(_channel, _stream);
    }

    public void waitUntilConnected() {
        System.out.println("Connecting...");

        while (!_publication.isConnected());
        while (!_subscription.isConnected());

        System.out.println("Connected...");
    }

    public static void main(String[] args) {
        ClientConnectivityTest cct = new ClientConnectivityTest();
        cct.waitUntilConnected();
    }
}
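
The two `while` loops in the program above spin forever when the connection never happens, which is exactly the failing case. A bounded wait makes the failure explicit. This is a minimal sketch: `ConnectWait` and `await` are hypothetical names, and the helper takes any `BooleanSupplier` so it does not depend on Aeron itself.

```java
import java.util.function.BooleanSupplier;

final class ConnectWait {
    // Polls the condition until it is true or timeoutMs has elapsed.
    // Returns true if the condition became true, false on timeout.
    static boolean await(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.yield();
        }
        return true;
    }
}
```

In the test program this could be called as `ConnectWait.await(_publication::isConnected, 5000)` and likewise for `_subscription`, printing which side failed to connect.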
@tmontgomery
Contributor

There could be a myriad of reasons. Loopback with Solarflare does not behave the same as loopback without it. You would be best off turning on logging and seeing what is being sent and received by the driver.

@puiuvlad
Author

You mean the Solarflare driver? Would you know how to turn logging on?

@tmontgomery
Contributor

@puiuvlad
Author

Here is the debug log for the case without Onload:

[1409.425039275] log started 2020-12-15 01:05:57.039+0000
[1457.6808504] DRIVER: CMD_IN_ADD_PUBLICATION [82/82]: 10 [0:1] aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24
[1457.68860392] DRIVER: SEND_CHANNEL_CREATION [98/98]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.0.1.1:40123, ttl: 0
[1457.713046341] DRIVER: CMD_OUT_COUNTER_READY [12/12]: 0 33
[1457.713119265] DRIVER: CMD_OUT_PUBLICATION_READY [80/80]: -874586031:10 29 26 [1 1] /dev/shm/aeron-puiu/publications/1.logbuffer
[1457.713512329] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -874586031:10:-1006307374 -1006307374 @0 16777216 MTU 1408 TTL 0
[1457.718966624] DRIVER: CMD_IN_ADD_SUBSCRIPTION [90/90]: 10 [-1][0:2] aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24
[1457.721284926] DRIVER: RECEIVE_CHANNEL_CREATION [98/98]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.0.1.1:40123, ttl: 0
[1457.722820131] DRIVER: CMD_OUT_SUBSCRIPTION_READY [12/12]: 2 34
[1457.814634465] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -874586031:10:-1006307374 -1006307374 @0 16777216 MTU 1408 TTL 0
[1457.814854007] DRIVER: FRAME_IN [52/52]: 10.10.10.161:40294 SETUP 00000000 len 40 -874586031:10:-1006307374 -1006307374 @0 16777216 MTU 1408 TTL 0
[1457.837049957] DRIVER: CMD_OUT_AVAILABLE_IMAGE [94/94]: -874586031:10 [36:2] [3] /dev/shm/aeron-puiu/images/3.logbuffer 10.10.10.161:40294
[1457.838053338] DRIVER: FRAME_OUT [48/48]: 224.0.1.2:40123 SM 00000000 len 36 -874586031:10:-1006307374 @0 131072 -5501331629031116081
[1457.839027384] DRIVER: FRAME_IN [48/48]: 10.10.10.161:40123 SM 00000000 len 36 -874586031:10:-1006307374 @0 131072 -5501331629031116081
[1457.914815898] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -874586031:10:-1006307374 -1006307374 @0 16777216 MTU 1408 TTL 0
[1457.915628237] DRIVER: FRAME_IN [52/52]: 10.10.10.161:40294 SETUP 00000000 len 40 -874586031:10:-1006307374 -1006307374 @0 16777216 MTU 1408 TTL 0

and here is the debug log for the case with Onload:

[1684.666193694] log started 2020-12-15 01:10:32.280+0000
[1691.080579715] DRIVER: CMD_IN_ADD_PUBLICATION [82/82]: 10 [0:1] aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24
[1691.088197466] DRIVER: SEND_CHANNEL_CREATION [98/98]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.0.1.1:40123, ttl: 0
[1691.11278781] DRIVER: CMD_OUT_COUNTER_READY [12/12]: 0 33
[1691.112859512] DRIVER: CMD_OUT_PUBLICATION_READY [80/80]: -527666993:10 29 26 [1 1] /dev/shm/aeron-puiu/publications/1.logbuffer
[1691.112992857] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -527666993:10:638473829 638473829 @0 16777216 MTU 1408 TTL 0
[1691.118719067] DRIVER: CMD_IN_ADD_SUBSCRIPTION [90/90]: 10 [-1][0:2] aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24
[1691.121199006] DRIVER: RECEIVE_CHANNEL_CREATION [98/98]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.0.1.1:40123, ttl: 0
[1691.122711512] DRIVER: CMD_OUT_SUBSCRIPTION_READY [12/12]: 2 34
[1691.213188117] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -527666993:10:638473829 638473829 @0 16777216 MTU 1408 TTL 0
[1691.313417096] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -527666993:10:638473829 638473829 @0 16777216 MTU 1408 TTL 0
[1691.41465506] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 -527666993:10:638473829 638473829 @0 16777216 MTU 1408 TTL 0

It seems that the driver is sending some frames but is not getting anything back.
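
To compare the two logs at a glance, the frames can be tallied by direction and type. The following is a rough sketch that assumes the log line layout shown above (`DRIVER: FRAME_OUT [52/52]: <address> <TYPE> ...`). The class name `FrameTally` is hypothetical and the regular expression is fitted to these samples only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FrameTally {
    // Captures the direction (FRAME_IN / FRAME_OUT) and the frame type (SETUP, SM, ...).
    private static final Pattern FRAME =
        Pattern.compile("DRIVER: (FRAME_IN|FRAME_OUT) \\[\\d+/\\d+\\]: \\S+ (\\w+)");

    // Returns counts keyed by "<direction> <type>", in order of first appearance.
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = FRAME.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + " " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Applied to the logs above, the Onload run yields only a `FRAME_OUT SETUP` entry, while the run without Onload also has `FRAME_IN SETUP`, `FRAME_OUT SM` and `FRAME_IN SM`.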

@puiuvlad
Author

From the Onload User Guide:

Setting the socket option MULTICAST_TTL=0 will disable the sending of traffic on the normal network path and prevent traffic being looped back. The value of the socket option IP_MULTICAST_LOOP has no effect on Hardware Multicast Loopback

Are you by any chance setting the MULTICAST_TTL multicast socket option to 0? In the log TTL seems to be 0. Maybe that's the cause?
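
For reference, this is how the two socket options mentioned in the quote are set on a plain Java NIO datagram channel. It is a minimal sketch using `java.net.StandardSocketOptions` and says nothing about what the Aeron media driver does internally. The class name `MulticastOptions` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.StandardSocketOptions;
import java.nio.channels.DatagramChannel;

final class MulticastOptions {
    // Opens an IPv4 datagram channel, applies the TTL and loopback options,
    // then reads them back so the effective values can be inspected.
    static String describe(int ttl, boolean loop) {
        try (DatagramChannel channel = DatagramChannel.open(StandardProtocolFamily.INET)) {
            channel.setOption(StandardSocketOptions.IP_MULTICAST_TTL, ttl);
            channel.setOption(StandardSocketOptions.IP_MULTICAST_LOOP, loop);
            return "ttl=" + channel.getOption(StandardSocketOptions.IP_MULTICAST_TTL)
                + " loop=" + channel.getOption(StandardSocketOptions.IP_MULTICAST_LOOP);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(64, true));
    }
}
```

A standalone sender and receiver built this way, run under Onload, would help separate a socket-option problem from anything specific to Aeron.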

@puiuvlad
Author

Nope, setting the TTL to a non-zero value does not fix the issue:

[4193.42240922] log started 2020-12-15 01:52:21.036+0000
[4212.534525264] DRIVER: CMD_IN_ADD_PUBLICATION [89/89]: 10 [0:1] aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24|ttl=64
[4212.54233816] DRIVER: SEND_CHANNEL_CREATION [99/99]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.0.1.1:40123, ttl: 64
[4212.566912889] DRIVER: CMD_OUT_COUNTER_READY [12/12]: 0 33
[4212.566985339] DRIVER: CMD_OUT_PUBLICATION_READY [80/80]: 1920377135:10 29 26 [1 1] /dev/shm/aeron-puiu/publications/1.logbuffer
[4212.567148991] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 1920377135:10:-1738037192 -1738037192 @0 16777216 MTU 1408 TTL 64
[4212.572846497] DRIVER: CMD_IN_ADD_SUBSCRIPTION [97/97]: 10 [-1][0:2] aeron:udp?endpoint=224.0.1.1:40123|interface=10.10.10.0/24|ttl=64
[4212.575277372] DRIVER: RECEIVE_CHANNEL_CREATION [99/99]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.0.1.1:40123, ttl: 64
[4212.576798084] DRIVER: CMD_OUT_SUBSCRIPTION_READY [12/12]: 2 34
[4212.668405568] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 1920377135:10:-1738037192 -1738037192 @0 16777216 MTU 1408 TTL 64
[4212.768601862] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 1920377135:10:-1738037192 -1738037192 @0 16777216 MTU 1408 TTL 64
[4212.868783043] DRIVER: FRAME_OUT [52/52]: 224.0.1.1:40123 SETUP 00000000 len 40 1920377135:10:-1738037192 -1738037192 @0 16777216 MTU 1408 TTL 64

@tmontgomery
Contributor

You tried 224.1.2.3 with the Solarflare ping/pong and 224.0.1.1 with Aeron. Have you tried 224.1.2.3 with Aeron?

@puiuvlad
Author

Yes, just tried...

[2490.267811401] log started 2020-12-15 02:58:14.362+0000
[2494.968209582] DRIVER: CMD_IN_ADD_PUBLICATION [89/89]: 10 [0:1] aeron:udp?endpoint=224.1.2.3:40123|interface=10.10.10.0/24|ttl=32
[2494.975821439] DRIVER: SEND_CHANNEL_CREATION [99/99]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.1.2.3:40123, ttl: 32
[2495.000619317] DRIVER: CMD_OUT_COUNTER_READY [12/12]: 0 33
[2495.000710516] DRIVER: CMD_OUT_PUBLICATION_READY [80/80]: 457147635:10 29 26 [1 1] /dev/shm/aeron-puiu/publications/1.logbuffer
[2495.000788522] DRIVER: FRAME_OUT [52/52]: 224.1.2.3:40123 SETUP 00000000 len 40 457147635:10:-252621430 -252621430 @0 16777216 MTU 1408 TTL 32
[2495.005518315] DRIVER: CMD_IN_ADD_SUBSCRIPTION [97/97]: 10 [-1][0:2] aeron:udp?endpoint=224.1.2.3:40123|interface=10.10.10.0/24|ttl=32
[2495.007992279] DRIVER: RECEIVE_CHANNEL_CREATION [99/99]: UdpChannel - interface: sfc0, localData: /10.10.10.161:0, remoteData: /224.1.2.3:40123, ttl: 32
[2495.009524315] DRIVER: CMD_OUT_SUBSCRIPTION_READY [12/12]: 2 34
[2495.068254386] DRIVER: FRAME_OUT [52/52]: 224.1.2.3:40123 SETUP 00000000 len 40 457147635:10:-252621430 -252621430 @0 16777216 MTU 1408 TTL 32
[2495.168443271] DRIVER: FRAME_OUT [52/52]: 224.1.2.3:40123 SETUP 00000000 len 40 457147635:10:-252621430 -252621430 @0 16777216 MTU 1408 TTL 32
[2495.268577219] DRIVER: FRAME_OUT [52/52]: 224.1.2.3:40123 SETUP 00000000 len 40 457147635:10:-252621430 -252621430 @0 16777216 MTU 1408 TTL 32

@puiuvlad
Author

puiuvlad commented Dec 15, 2020

The source code for sfnt-pingpong.c is here:

https://github.com/lilinj2000/sfnettest-1.5.0/blob/master/src/sfnt-pingpong.c

The function of interest is udp_bind_sock, specifically lines 634-648.

It sets the socket option IP_MULTICAST_LOOP, although the documentation quoted above says it has no effect on Hardware Multicast Loopback...
