Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rabbit_peer_discovery: Pass inetrc config file to temporary hidden node #10904

Merged
merged 5 commits into from Apr 16, 2024

Conversation

dumbbell
Copy link
Member

@dumbbell dumbbell commented Apr 2, 2024

Why

As shown in #10728, in an IPv6-only environment, kernel name resolution must be configured through an inetrc file.

The temporary hidden node must be configured exactly like the main node (and all nodes in the cluster in fact) to allow communication. Thus we must pass the same inetrc file to that temporary hidden node. This wasn’t the case before this patch.

How

We query the main node’s kernel to see if there is any inetrc file set and we use the same on the temporary hidden node’s command line.

While here, extract the handling of the proto_dist module from the TLS code. This parameter may be used outside of TLS like this IPv6-only environment.

V2: Accept inetrc filenames as atoms, not only lists. kernel seems to accept them. This makes a better user experience for users used to Bourne shell quoting not knowing that single quotes have a special meaning in Erlang.

Note: This is the same pull request as #10759 which was merged and backported when it was still a draft.

References #10840.
Fixes #10728.

@mergify mergify bot added the bazel label Apr 2, 2024
@dumbbell dumbbell self-assigned this Apr 2, 2024
@dumbbell dumbbell force-pushed the peer-disc-temporary-nodes-and-inetrc branch 2 times, most recently from bcf0da7 to 99fa593 Compare April 4, 2024 12:38
ct:pal("Peers: ~p", [Peers]),
Peers.

determine_hostname(PeerOptions) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than modifying the system /etc/hosts file, would it possibly be easier to use the inetrc file itself?

https://www.erlang.org/doc/apps/erts/inet_cfg#user-configuration-example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would interfere with the testscases actually testing the inetrc file option.

@mkuratczyk
Copy link
Contributor

I've played with this PR using kind in IPv6-only configuration and an example from our operator. Unfortunately it didn't form a cluster - all nodes started independently. Below you can find logs from:

  • 3.12.13 (works)
  • 3.13.1 (fails to start)
  • this PR (starts but not clustered)

3.12.13.log

3.13.1.log

pr.log

@dumbbell dumbbell force-pushed the peer-disc-temporary-nodes-and-inetrc branch from 99fa593 to f65b9b4 Compare April 8, 2024 12:53
@dumbbell
Copy link
Member Author

dumbbell commented Apr 8, 2024

I've played with this PR using kind in IPv6-only configuration and an example from our operator. Unfortunately it didn't form a cluster - all nodes started independently. (...)

There is a clause added in the first commit that addresses the following message from node 2:

pr10904-server-2 rabbitmq 2024-04-05 13:29:15.193565+00:00 [info] <0.254.0> Peer discovery: node 'rabbit@pr10904-server-2.pr10904-nodes.default' selected for auto-clustering but its DB layer is not ready; waiting before retrying...

This makes sure that a node that selects itself can continue to boot (otherwise we have a dead lock).

Are you sure you used the latest commit from this branch?

Edit: It was probably related to the fact the CI fails to build docker images, and thus the latest one for this branch is old.

@dumbbell dumbbell force-pushed the peer-disc-temporary-nodes-and-inetrc branch from f65b9b4 to 4e20ccb Compare April 8, 2024 16:06
@mkuratczyk
Copy link
Contributor

Indeed, I was using the wrong image. I've tested the latest and it works well for me

@dumbbell dumbbell force-pushed the peer-disc-temporary-nodes-and-inetrc branch 5 times, most recently from d153b48 to 9e95233 Compare April 15, 2024 15:19
@dumbbell dumbbell marked this pull request as ready for review April 15, 2024 15:34
… "DB ready" state

[Why]
When peer discovery runs on initial boot, the database layer is not
ready when peer discovery is executed. Thus if the node selects itself
as the node to "join", it shouldn't pay attention to the database
readyness otherwise it would wait forever.
[Why]
As shown in #10728, in an IPv6-only environment, `kernel` name
resolution must be configured through an inetrc file.

The temporary hidden node must be configured exactly like the main node
(and all nodes in the cluster in fact) to allow communication. Thus we
must pass the same inetrc file to that temporary hidden node. This
wasn’t the case before this patch.

[How]
We query the main node’s kernel to see if there is any inetrc file set
and we use the same on the temporary hidden node’s command line.

While here, extract the handling of the `proto_dist` module from the TLS
code. This parameter may be used outside of TLS like this IPv6-only
environment.

V2: Accept `inetrc` filenames as atoms, not only lists. `kernel` seems
    to accept them. This makes a better user experience for users used
    to Bourne shell quoting not knowing that single quotes have a
    special meaning in Erlang.

Fixes #10728.
…idden node

[Why]
We want to make sure that the queried remote node doesn't know about the
querying node. That's why we use a temporary hidden node in the first
place.

In fact, it is possible to start the temporary hidden node and
communicate with it using an alternative connection instead of the
regular Erlang distribution. This further ensures that the two RabbitMQ
nodes don't connect to each other while properties are queried.

[How]
We use the `standard_io` alternative connection. We need to use
`peer:call/4` instead of `erpc:call/4` to run code on the temporary
hidden node.

While here, we add assertions to verify that the three node didn't form
a full mesh network at the end of the query code.
@dumbbell dumbbell force-pushed the peer-disc-temporary-nodes-and-inetrc branch from 9e95233 to dbf9dfb Compare April 16, 2024 08:19
@dumbbell dumbbell merged commit ce9c33a into main Apr 16, 2024
23 checks passed
@dumbbell dumbbell deleted the peer-disc-temporary-nodes-and-inetrc branch April 16, 2024 13:00
dumbbell added a commit that referenced this pull request Apr 25, 2024
rabbit_peer_discovery: Pass inetrc config file to temporary hidden node (backport #10904)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RabbitMQ 3.13.0 nodes with peer discovery enabled fails to start in an IPv6-only environment
3 participants