Dial queue timeout for multiple multiaddresses incorrectly used for each connection separately #2368

akim-bow · 2024-01-19T18:19:04Z

Version:
libp2p@1.2.0

Severity:

Medium

Description:

If a remote peer advertises multiple multiaddresses, e.g. [/dns/nox-1, /ip4/10.50.10.10, /ip4/127.0.0.1] (only the last one is reachable by default), libp2p tries to connect to them one by one. As you can see in the source code, a single shared signal is created and used for every attempt to dial these multiaddresses. As a result, the first address can consume the entire timeout until the signal aborts, and the remaining addresses are dropped without any attempt to connect to them.
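To make the failure mode concrete, here is a minimal TypeScript simulation. It is not libp2p's dial queue code: `fakeDial` and `dialSequentialShared` are hypothetical names. An unreachable address is modelled as never answering (like a black-holed IP), so the only thing that ends its attempt is the shared timeout signal.

```typescript
// Hypothetical stand-in for a transport dial: reachable addresses connect
// after `latencyMs`; unreachable ones never answer, so the attempt only
// ends when `signal` aborts.
function fakeDial (addr: string, reachable: Set<string>, latencyMs: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    let timer: ReturnType<typeof setTimeout> | undefined
    const onAbort = (): void => {
      clearTimeout(timer)
      reject(new Error(`dial to ${addr} aborted`))
    }
    if (signal.aborted) { onAbort(); return }
    signal.addEventListener('abort', onAbort, { once: true })
    if (reachable.has(addr)) {
      timer = setTimeout(() => {
        signal.removeEventListener('abort', onAbort)
        resolve(addr)
      }, latencyMs)
    }
  })
}

// The pattern described in the issue: one timeout signal shared by every
// attempt, with addresses dialled one after another.
async function dialSequentialShared (addrs: string[], reachable: Set<string>, timeoutMs: number): Promise<{ connected?: string, attempted: string[] }> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  const attempted: string[] = []
  try {
    for (const addr of addrs) {
      // Once the shared budget is spent, every remaining dial would be
      // rejected immediately, so none of them gets a real attempt.
      if (controller.signal.aborted) break
      attempted.push(addr)
      try {
        return { connected: await fakeDial(addr, reachable, 10, controller.signal), attempted }
      } catch {
        // fall through to the next address
      }
    }
    return { attempted }
  } finally {
    clearTimeout(timer)
  }
}
```

With [/dns/nox-1, /ip4/10.50.10.10, /ip4/127.0.0.1], only /ip4/127.0.0.1 reachable and a 50 ms budget, `dialSequentialShared` returns no connection and records only /dns/nox-1 as attempted.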

I'm proposing to use two separate signals: the first for the whole batch of multiaddresses, the second for each multiaddress individually. This would solve the case where a peer provides multiple addresses and only some of them are actually valid. If the description isn't clear enough, I can try to build a simple reproduction example.
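A minimal sketch of the two-signal idea, again as a standalone TypeScript simulation rather than a patch to libp2p. `timeoutSignal`, `anySignal`, `fakeDial` and `dialWithTwoSignals` are all hypothetical names; `anySignal` plays the role that `AbortSignal.any` plays in newer runtimes, written out by hand so the sketch does not depend on it.

```typescript
// Hypothetical helper: an AbortSignal that fires after `ms`, plus a way to
// cancel the timer once it is no longer needed.
function timeoutSignal (ms: number): { signal: AbortSignal, clear: () => void } {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  return { signal: controller.signal, clear: () => clearTimeout(timer) }
}

// Hypothetical helper: a signal that aborts as soon as any input signal aborts.
function anySignal (signals: AbortSignal[]): { signal: AbortSignal, clear: () => void } {
  const controller = new AbortController()
  const onAbort = (): void => controller.abort()
  for (const s of signals) {
    if (s.aborted) controller.abort()
    else s.addEventListener('abort', onAbort, { once: true })
  }
  return {
    signal: controller.signal,
    clear: () => { for (const s of signals) s.removeEventListener('abort', onAbort) }
  }
}

// Hypothetical stand-in for a transport dial: reachable addresses connect
// after `latencyMs`; unreachable ones never answer until `signal` aborts.
function fakeDial (addr: string, reachable: Set<string>, latencyMs: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    let timer: ReturnType<typeof setTimeout> | undefined
    const onAbort = (): void => {
      clearTimeout(timer)
      reject(new Error(`dial to ${addr} aborted`))
    }
    if (signal.aborted) { onAbort(); return }
    signal.addEventListener('abort', onAbort, { once: true })
    if (reachable.has(addr)) {
      timer = setTimeout(() => {
        signal.removeEventListener('abort', onAbort)
        resolve(addr)
      }, latencyMs)
    }
  })
}

// The proposal: one signal for the whole batch, plus a fresh per-address
// timeout so one unresponsive address cannot consume the whole budget.
async function dialWithTwoSignals (addrs: string[], reachable: Set<string>, batchTimeoutMs: number, perAddressTimeoutMs: number): Promise<{ connected?: string, attempted: string[] }> {
  const batch = timeoutSignal(batchTimeoutMs)
  const attempted: string[] = []
  try {
    for (const addr of addrs) {
      if (batch.signal.aborted) break
      attempted.push(addr)
      const perAddress = timeoutSignal(perAddressTimeoutMs)
      const combined = anySignal([batch.signal, perAddress.signal])
      try {
        return { connected: await fakeDial(addr, reachable, 10, combined.signal), attempted }
      } catch {
        // this address failed or timed out; try the next one
      } finally {
        perAddress.clear()
        combined.clear()
      }
    }
    return { attempted }
  } finally {
    batch.clear()
  }
}
```

With [/dns/nox-1, /ip4/10.50.10.10, /ip4/127.0.0.1], a 50 ms per-address timeout and a 500 ms batch timeout, the sketch reaches /ip4/127.0.0.1 after the two unreachable addresses time out. The batch signal still caps the total dial time at `batchTimeoutMs`; without it the worst case would grow to N × `perAddressTimeoutMs`.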


akim-bow · 2024-02-08T11:36:46Z

@achingbrain can you please look into it?

zeroxbt · 2024-03-05T01:40:19Z

@achingbrain could you take a look at this? Most auto-dials are failing because of it.

achingbrain · 2024-03-06T14:02:46Z

@akim-bow what kind of addresses are you having problems with?

The example you gave was [/dns/nox-1, /ip4/10.50.10.10, /ip4/127.0.0.1]. DNS addresses take time to resolve, so that can be bad, fair enough, but the other two are private local addresses, so assuming TCP they should fail fast?

If you know certain addresses are going to be unreliable or slow, you can also pass an addressSorter as part of the connectionManager config to ensure they are dialled last, or a connectionGater to ensure they are not dialled at all.
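As a rough illustration of the `addressSorter` suggestion, the TypeScript sketch below ranks 10.0.0.0/8 addresses last. It works on plain multiaddr strings so that it runs on its own; `isClassAPrivate` and `deprioritiseClassA` are hypothetical names, and using it in libp2p would mean adapting it to whatever type the `addressSorter` option actually receives in your version (not checked here for libp2p@1.2.0). The same predicate could also back a `connectionGater` that refuses such dials outright, for example via a `denyDialMultiaddr` hook, assuming your version exposes one.

```typescript
// Hypothetical helper: true for /ip4 multiaddrs in 10.0.0.0/8,
// the RFC 1918 "class A" private range.
function isClassAPrivate (ma: string): boolean {
  const m = /^\/ip4\/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?:\/|$)/.exec(ma)
  if (m === null) return false
  const octets = m.slice(1, 5).map(Number)
  return octets.every(o => o <= 255) && octets[0] === 10
}

// Comparator in the usual (a, b) => -1 | 0 | 1 form: 10.0.0.0/8 addresses
// sort after everything else. Array.prototype.sort is stable (ES2019+),
// so the original order is kept within each group.
function deprioritiseClassA (a: string, b: string): -1 | 0 | 1 {
  const pa = isClassAPrivate(a)
  const pb = isClassAPrivate(b)
  if (pa === pb) return 0
  return pa ? 1 : -1
}
```

Sorting [/ip4/10.2.0.2/tcp/9106, /ip4/127.0.0.1/tcp/9106, /ip4/172.20.5.94/tcp/9106] with this comparator moves /ip4/10.2.0.2/tcp/9106 to the end and leaves the other two in their original order.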

zeroxbt · 2024-03-06T22:57:51Z

@achingbrain I'm having the same issue when running nodes in local environment.

When dialing a peer, these are the addresses found in the peer store:

[/ip4/10.2.0.2/tcp/9106, /ip4/127.0.0.1/tcp/9106, /ip4/172.20.5.94/tcp/9106]

The first address is not reachable, while the other two are. The dialer fails after dialTimeout milliseconds because of the first address, without ever attempting to dial the other two. All auto dials are also failing for the same reason.

achingbrain · 2024-03-07T15:02:40Z

My concern with the proposed solution is that having separate timeouts for individual addresses could lead to very long dial times.

I wonder if we could be smarter about the dialling. For example, if the IP address of the target multiaddr is a class A private network address (10.0.0.0/8), we could deprioritise it, or skip dialling it entirely, unless the current node also has a class A private network address in the same logical network?
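One way to express that heuristic, sketched in TypeScript under an explicit assumption: "same logical network" is modelled as sharing an IPv4 prefix of `prefixLength` bits (a hypothetical parameter, defaulting to 8, which reduces to "the local node also has some 10.x address"). `ipv4ToInt`, `sameSubnet` and `shouldDial` are hypothetical names, not libp2p APIs.

```typescript
// Hypothetical helper: the IPv4 address of an /ip4 multiaddr as an unsigned
// 32-bit integer, or undefined for anything else (e.g. /dns).
function ipv4ToInt (ma: string): number | undefined {
  const m = /^\/ip4\/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?:\/|$)/.exec(ma)
  if (m === null) return undefined
  const octets = m.slice(1, 5).map(Number)
  if (octets.some(o => o > 255)) return undefined
  return ((octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]) >>> 0
}

// True when `a` and `b` share their first `prefixLength` bits.
function sameSubnet (a: number, b: number, prefixLength: number): boolean {
  const mask = prefixLength === 0 ? 0 : (~0 << (32 - prefixLength)) >>> 0
  return ((a & mask) >>> 0) === ((b & mask) >>> 0)
}

const TEN_SLASH_8 = ipv4ToInt('/ip4/10.0.0.0') as number

// Skip a 10.0.0.0/8 target unless one of our own addresses shares its
// prefix; every other target is dialled as usual.
function shouldDial (target: string, localAddrs: string[], prefixLength = 8): boolean {
  const t = ipv4ToInt(target)
  if (t === undefined || !sameSubnet(t, TEN_SLASH_8, 8)) return true
  return localAddrs.some(addr => {
    const local = ipv4ToInt(addr)
    return local !== undefined && sameSubnet(t, local, prefixLength)
  })
}
```

A node whose own addresses are /ip4/127.0.0.1/tcp/9106 and /ip4/172.20.5.94/tcp/9106 would skip /ip4/10.2.0.2/tcp/9106, while a node that also has a 10.x address would still dial it.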

zeroxbt · 2024-03-07T15:23:58Z

Although sorting and filtering addresses are good options, as a user I would still expect the dial function to attempt all of the provided addresses.
Why not give users a maxParallelDialsPerPeer option, with a default value of 1?
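To illustrate what a `maxParallelDialsPerPeer`-style limit could mean, here is a bounded-concurrency dial sketched in TypeScript. `DialFn` and `dialWithConcurrency` are hypothetical names, and the sketch assumes each dial honours its `AbortSignal`; it is not how libp2p implements dialling.

```typescript
// Hypothetical signature for a single-address dial that honours `signal`.
type DialFn = (addr: string, signal: AbortSignal) => Promise<string>

// Dial up to `maxParallel` addresses at a time; the first success wins and
// the dials still in flight are aborted. With maxParallel = 1 this is the
// one-by-one behaviour described in the issue.
async function dialWithConcurrency (addrs: string[], maxParallel: number, dial: DialFn): Promise<string> {
  if (!Number.isInteger(maxParallel) || maxParallel < 1) {
    throw new RangeError('maxParallel must be a positive integer')
  }
  const controller = new AbortController()
  const queue = [...addrs]
  let winner: string | undefined

  // Each worker keeps pulling addresses from the shared queue until one
  // dial succeeds or the queue is empty.
  const worker = async (): Promise<void> => {
    while (winner === undefined && queue.length > 0) {
      const addr = queue.shift() as string
      try {
        const result = await dial(addr, controller.signal)
        if (winner === undefined) {
          winner = result
          controller.abort() // cancel dials that are still in flight
        }
      } catch {
        // this address failed (or was aborted); move on to the next one
      }
    }
  }

  const workers = Math.min(maxParallel, addrs.length)
  await Promise.all(Array.from({ length: workers }, () => worker()))
  if (winner === undefined) throw new Error(`all ${addrs.length} dial attempts failed`)
  return winner
}
```

With a limit of 3 and [/ip4/10.2.0.2/tcp/9106, /ip4/127.0.0.1/tcp/9106, /ip4/172.20.5.94/tcp/9106], the unreachable first address no longer blocks the other two. The trade-off is extra simultaneous connection attempts per peer, which is presumably why a default of 1 is suggested.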

akim-bow added the need/triage Needs initial labeling and prioritization label Jan 19, 2024
