-proto_dist inet6_tcp support? #4132

Open
andywhite37 opened this issue Sep 28, 2023 · 2 comments

andywhite37 commented Sep 28, 2023

MongooseIM version: 6.1.0
Installed from: Docker
Erlang/OTP version: version packaged with MongooseIM 6.1.0

I posted a previous issue #4127 about ipv6 support with mongooseimctl, but I'm feeling like the problem runs deeper. I have the servers starting up and connecting to an RDBMS correctly, and I have been able to exchange messages with the server using an XMPP client (Adium). I've tried exercising the XMPP port (5222), WebSockets, GraphQL, and all of that seems to be working fine.

I have been struggling mightily to get MongooseIM clustering in an ipv6-based network in Kubernetes, both with mnesia and the new cets support. I'm unfortunately not an Erlang developer, so I've been doing a lot of reading. My research has led me to adding -proto_dist inet6_tcp in the vm.args/vm.dist.args, but I haven't had much luck with this.

This is what I currently have in vm.dist.args (I actually have these lines duplicated in vm.args too, just in case there are contexts that only use one or other of the files):

-proto_dist inet6_tcp
-kernel inet_dist_listen_min 9100
-kernel inet_dist_listen_max 9110

When I inspect the tcp listeners on the containers, I see epmd listening on port 4369 on both the ipv4 and ipv6 interfaces. However, when the listener is started on port 9100, it's only on the ipv4 interface, and not ipv6.

root@mongooseim-1:/# ss -tnlp | sort
LISTEN 0      1024                [::1]:5551         [::]:*    users:(("beam.smp",pid=29,fd=36))
LISTEN 0      1024                [::1]:8088         [::]:*    users:(("beam.smp",pid=29,fd=34))
LISTEN 0      1024                    *:5222            *:*    users:(("beam.smp",pid=29,fd=31))
LISTEN 0      1024                    *:5269            *:*    users:(("beam.smp",pid=29,fd=39))
LISTEN 0      1024                    *:5280            *:*    users:(("beam.smp",pid=29,fd=32))
LISTEN 0      1024                    *:5285            *:*    users:(("beam.smp",pid=29,fd=33))
LISTEN 0      1024                    *:5541            *:*    users:(("beam.smp",pid=29,fd=37))
LISTEN 0      1024                    *:5561            *:*    users:(("beam.smp",pid=29,fd=38))
LISTEN 0      1024                    *:8089            *:*    users:(("beam.smp",pid=29,fd=35))
LISTEN 0      1024   [::ffff:127.0.0.1]:8888            *:*    users:(("beam.smp",pid=29,fd=40))
LISTEN 0      128               0.0.0.0:9100      0.0.0.0:*    users:(("beam.smp",pid=29,fd=17))
LISTEN 0      4096              0.0.0.0:4369      0.0.0.0:*    users:(("epmd",pid=58,fd=3))     
LISTEN 0      4096                 [::]:4369         [::]:*    users:(("epmd",pid=58,fd=4))

When it's running this way, running mongooseimctl gives me the nodedown error. I believe this is because my hostnames resolve to ipv6 addresses, so connections to ports 9100-9110 go to the ipv6 address, where nothing is listening, rather than to the ipv4 address.
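
One way to check this from outside the Erlang VM is to attempt a TCP connect to the distribution port over each address family separately. The following is a minimal sketch, not part of MongooseIM or OTP; the helper name `probe` and its return format are made up for illustration.

```python
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect to host:port separately over IPv4 and IPv6.

    Returns a dict mapping "inet" / "inet6" to a short status string.
    (Hypothetical helper for illustration only.)
    """
    results = {}
    for label, family in (("inet", socket.AF_INET), ("inet6", socket.AF_INET6)):
        # Resolve the name for this address family only.
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except OSError as exc:
            results[label] = f"no address ({exc})"
            continue
        sockaddr = infos[0][4]
        # Attempt the connection to the first resolved address.
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results[label] = f"connected to {sockaddr[0]}"
        except OSError as exc:
            results[label] = f"connect to {sockaddr[0]} failed ({exc})"
    return results
```

Calling `probe("mongooseim-0.mongooseim.qwick-chat.svc.cluster.local", 9100)` from the other pod should report a failure under `inet6` if the listener is bound only to 0.0.0.0, as the ss output above suggests.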

As an experiment, we tried running socat -dd TCP-LISTEN:9100,ipv6only,fork TCP4:127.0.0.1:9100 to set up an ipv6 listener to forward to the ipv4 address on the same port (we did this for all the ports 9100-9110), and that actually allows mongooseimctl to work and I can run commands, but it doesn't seem like this workaround works for mnesia and cets for clustering.

My suspicion is that -proto_dist inet6_tcp is not being respected somewhere (because whatever is starting the listener on 9100 is still just using ipv4), or some networking code is not using ipv6-compatible TCP options somewhere. I've looked through a lot of code in MongooseIM and cets for clues, but I don't have the background in erlang distribution/networking to know exactly where to look or what to look for.

  • I don't think my problem is DNS or hostname related - I'm able to ping6/telnet/nslookup each of the FQDNs from each other.
  • The MongooseIM servers are up and running, and I'm able to exchange XMPP messages with the server using a standard XMPP client (Adium).
  • The problem is that the servers are not clustering:
    • With mnesia, when I run mongooseimctl mnesia info, each node only lists itself in the running db nodes.
    • I've also tried cets using a newer docker image, and when I run mongooseimctl cets systemInfo, both of the nodes show up in discoveredNodes, but each node shows the other in unavailableNodes, which I believe means they are not able to ping each other. With my socat workaround in place, I can successfully ping each node using mongooseim ping:
root@mongooseim-1:/# hostname -f
mongooseim-1.mongooseim.qwick-chat.svc.cluster.local
root@mongooseim-1:/# mongooseim ping mongooseim@mongooseim-0.mongooseim.qwick-chat.svc.cluster.local
pong
root@mongooseim-1:/# mongooseim ping mongooseim@mongooseim-1.mongooseim.qwick-chat.svc.cluster.local
pong

mongooseimctl cets systemInfo output:

root@mongooseim-1:/# mongooseimctl cets systemInfo
{
  "data" : {
    "cets" : {
      "systemInfo" : {
        "unavailableNodes" : [
          "mongooseim@mongooseim-0.mongooseim.qwick-chat.svc.cluster.local"
        ],
        "remoteUnknownTables" : [
          
        ],
        "remoteNodesWithoutDisco" : [
          
        ],
        "remoteNodesWithUnknownTables" : [
          
        ],
        "remoteNodesWithMissingTables" : [
          
        ],
        "remoteMissingTables" : [
          
        ],
        "joinedNodes" : [
          "mongooseim@mongooseim-1.mongooseim.qwick-chat.svc.cluster.local"
        ],
        "discoveryWorks" : true,
        "discoveredNodes" : [
          "mongooseim@mongooseim-0.mongooseim.qwick-chat.svc.cluster.local",
          "mongooseim@mongooseim-1.mongooseim.qwick-chat.svc.cluster.local"
        ],
        "conflictTables" : [
          
        ],
        "conflictNodes" : [
          
        ],
        "availableNodes" : [
          "mongooseim@mongooseim-1.mongooseim.qwick-chat.svc.cluster.local"
        ]
      }
    }
  }
}
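
To make the output above easier to read at a glance, the node lists can be compared mechanically. This is a small sketch that assumes the JSON shape shown above. The key names come from that output, while the function name `summarize` and the derived labels are made up here, and the reading of `availableNodes` as "reachable" is an assumption.

```python
import json


def summarize(system_info_json):
    """Compare the node lists in `mongooseimctl cets systemInfo` output.

    Hypothetical helper: the derived categories are one interpretation
    of the fields, not an official CETS definition.
    """
    info = json.loads(system_info_json)["data"]["cets"]["systemInfo"]
    discovered = set(info["discoveredNodes"])
    available = set(info["availableNodes"])
    joined = set(info["joinedNodes"])
    return {
        # Found by discovery, but not listed as available.
        "discovered_but_unreachable": sorted(discovered - available),
        # Listed as available, but not part of the joined set.
        "reachable_but_not_joined": sorted(available - joined),
    }
```

For the output above, the first list would contain only the mongooseim-0 node and the second would be empty. That fits a connectivity problem between the nodes more than a CETS table conflict.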

Questions

  • Is -proto_dist inet6_tcp tested/known to work or not work with MongooseIM?
  • What else can I do to debug this?

arcusfelis commented Oct 1, 2023

Hi, a node being unavailable (i.e. listed in unavailableNodes) means it failed net_adm:ping.

i.e.

net_adm:ping('mongooseim@mongooseim-0.mongooseim.qwick-chat.svc.cluster.local').
pang

What should you check?

  • That it is possible to resolve the name mongooseim-0.mongooseim.qwick-chat.svc.cluster.local on mongooseim@mongooseim-1.mongooseim.qwick-chat.svc.cluster.local. ping6 works, but the Erlang inet module could use its own resolution logic :)
  • After that you would have to debug the Erlang/OTP distribution module :)

Oh, and there is resolver logic in Erlang too:

inet:gethostbyname('google.com', inet6).
{ok,{hostent,"google.com",[],inet6,16,
             [{10752,5200,16411,2062,0,0,0,8206}]}}
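
Erlang prints an IPv6 address as a tuple of eight 16-bit integers, which is hard to compare with what nslookup or ping6 show. A small sketch for converting such a tuple to the usual textual form (the function name `erlang_ip6_to_str` is made up for illustration):

```python
import ipaddress


def erlang_ip6_to_str(groups):
    """Convert an Erlang-style IPv6 tuple (eight 16-bit ints) to text."""
    if len(groups) != 8 or not all(0 <= g <= 0xFFFF for g in groups):
        raise ValueError("expected eight integers in the range 0..65535")
    value = 0
    for group in groups:
        # Each element is one 16-bit group, most significant first.
        value = (value << 16) | group
    return str(ipaddress.IPv6Address(value))
```

For the tuple above, `erlang_ip6_to_str((10752, 5200, 16411, 2062, 0, 0, 0, 8206))` gives `2a00:1450:401b:80e::200e`.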

To debug deeper we would need to figure out how to configure docker desktop for k8s with ipv6 only. Or the same but on Circle CI ;)


chrzaszcz commented Oct 10, 2023

Hi @andywhite37. I can confirm that the inet6_tcp option is supported. You can check it with the following:

  • Clone the MongooseIM repo.
  • Edit rel/files/vm.dist.args, adding -proto_dist inet6_tcp.
  • Run cluster_commands_SUITE, which checks clustering with Mnesia. You can check CETS clustering as well (I checked it and it worked as well): ./tools/test-runner.sh --skip-small-tests --db redis pgsql --preset pgsql_mnesia --skip-cover --skip-stop-nodes -- cluster_commands (this command needs Docker to start postgres and redis containers). All tests in this suite passed for me locally. I could connect the nodes manually, and with TLS as well.

The difference from your setup seems to lie in the DNS resolution, as @arcusfelis suggested.


I think I'd ask you to do some debugging on your side. Run mongooseimctl debug on one of your nodes. Then, in the Erlang shell, try to do the following:

inet:gethostname().
net_adm:names().

Please provide the results. Could you also tell me what hostname returns (without -f) and what it resolves to? My first guess would be that it's not possible to reach epmd.
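
If the Erlang shell is not convenient, epmd reachability can also be checked from outside the VM. The sketch below sends an epmd NAMES request over plain TCP. It relies on my understanding of the wire format in the Erlang distribution protocol documentation (a 2-byte length, the request byte 110, then a reply of a 4-byte port number followed by lines of the form "name X at port N"), so treat that as an assumption to verify. The function name `epmd_names` is hypothetical.

```python
import socket
import struct


def epmd_names(host, port=4369, family=socket.AF_UNSPEC, timeout=3.0):
    """Ask epmd on `host` which nodes it has registered.

    Returns (epmd_port, [(node_name, dist_port), ...]).
    Pass family=socket.AF_INET6 to force an IPv6 connection.
    """
    last_error = None
    for fam, kind, proto, _, addr in socket.getaddrinfo(
            host, port, family, socket.SOCK_STREAM):
        try:
            with socket.socket(fam, kind, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                # NAMES_REQ: 2-byte big-endian length (1), then byte 110 ('n').
                sock.sendall(struct.pack(">HB", 1, 110))
                chunks = []
                while True:
                    data = sock.recv(4096)
                    if not data:  # epmd closes the connection when done
                        break
                    chunks.append(data)
        except OSError as exc:
            last_error = exc
            continue
        reply = b"".join(chunks)
        if len(reply) < 4:
            raise ValueError("reply from epmd is too short")
        (epmd_port,) = struct.unpack(">I", reply[:4])
        nodes = []
        for line in reply[4:].decode("utf-8", "replace").splitlines():
            parts = line.split()
            # Expected line shape: name <node> at port <number>
            if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
                nodes.append((parts[1], int(parts[4])))
        return epmd_port, nodes
    raise ConnectionError(f"could not reach epmd on {host}:{port}: {last_error}")
```

Running it with `family=socket.AF_INET6` against the other pod's FQDN should show whether epmd answers over IPv6 and which distribution port it advertises for the mongooseim node.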
