Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Replace libnatpmp with built-in PCP+NATPMP implementation #30043

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

laanwj
Copy link
Member

@laanwj laanwj commented May 5, 2024

Continues #30005. Closes #17012..

This PR adds PCP (Port Control Protocol) from RFC6887. This adds, in addition to the existing IPv4 port mapping (which now uses PCP, with fallback to NAT-PMP), support for IPv6 pinholing-that is, opening a port on the firewall to make it reachable.

PCP, like NAT-PMP is a simple UDP-based protocol, and the implementation is self-contained, so this gets rid of lthe libnatpnp dependency without adding a new one. It should otherwise be a drop-in replacement. NAT-PMP fallback is implemented so this will not make router support worse.

For now it is disabled by default, though in the future (not in this PR) we could consider enable it by default to increase the number of connectable nodes without adding significant attack surface.

To test:

bitcoind -regtest -natpmp=1 -debug=net

(most of the changes in this PR are, ironically, removing the libnatpmp dependency and associated build system and build docs)

TODO

@DrahtBot
Copy link
Contributor

DrahtBot commented May 5, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK Sjors, theuni, vasild

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29852 ([WIP] build: remove need to test for endianness by fanquake)
  • #29790 ([DO NOT MERGE] cmake: Migrate CI scripts to CMake-based build system -- WIP by hebasto)
  • #29015 (kernel: Streamline util library by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 5, 2024

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24606743714

@Sjors
Copy link
Member

Sjors commented May 6, 2024

Concept ACK

@theuni
Copy link
Member

theuni commented May 6, 2024

Whoa :)

(Concept ACK)

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC6887 appendix A describes how a router that supports NAP-PMP but not PCP, will return UNSUPP_VERSION. A log message could encourage users to try -upnp instead (if not already enabled). Or upgrade their ancient router firmware :-)

Got distracted during review, will continue later. I'll look into how we can preserve a previously selected NatPMP checkbox in the GUI.

src/init.cpp Outdated Show resolved Hide resolved
@theuni
Copy link
Member

theuni commented May 6, 2024

Regarding the custom lib/self-impl discussion in #30005:
Taking a quick look at the implementation here, I think it's simple enough for us to maintain ourselves. If a nice canonical lib ever emerges we could always jump to it, as there are only a few basic functions and presumably we could probably switch them out close to 1:1.

That said, it is a little rough to review as

  • It needs to be compared to the letter of the spec, with the assumption that @laanwj is evil or has mis-implemented (I doubt that :)
  • It needs to be very defensive, though the attack surface seems quite minimal
  • It needs to be aware of real-world violators/benders/extenders of the spec (if any? I have no idea.)

But it seems worth the effort to me.

@laanwj
Copy link
Member Author

laanwj commented May 7, 2024

Rebased for #29984

@laanwj
Copy link
Member Author

laanwj commented May 7, 2024

That said, it is a little rough to review as
It needs to be compared to the letter of the spec, with the assumption that @laanwj is evil or has mis-implemented (I doubt that :)

If you're more comfortable comparing it against another implementation there's:

It needs to be aware of real-world violators/benders/extenders of the spec (if any? I have no idea.)

Miniupnpd's is, likely, the most common server implementation in the wild. It's been tested against that. i'm hoping people will test this on various routers and conditions, there will be inevitable edge cases to iron out.

If a nice canonical lib ever emerges we could always jump to it

Maybe, but at some point there's not much difference between implementing a simple request/reply protocol and using a library. At least the RFC is unambigiously documented, which can't be said of many FOSS ABI's. Also, @fanquake (and @gmaxwell in the past) has been hoping for a solution that doesn't introduce a library dependency.

src/util/pcp.cpp Outdated Show resolved Hide resolved
@luke-jr
Copy link
Member

luke-jr commented May 8, 2024

Would prefer this in two steps (add PCP, then remove NAT-PMP)

@laanwj
Copy link
Member Author

laanwj commented May 9, 2024

Would prefer this in two steps (add PCP, then remove NAT-PMP)

i'm not planning to do this, the build system commits are already set up this way, but doing it throughout would involve adding another setting in Qt just to remove it later. Same for adding a third mechanism in portmap.cpp. Agree with @Sjors that having a forest of different port mapping settings is confusing to the user.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on Ubuntu 24.04 I'm getting in bound via IPv4 and IPv6.

When a GUI user previously had NAT-PMP selected there would be a "natpmp": true field in settings.json.

We need to rename this to "tcp" in order for the box to remain checked.

Maybe @ryanofsky has an idea how to do this elegantly?

A little hack that works is to put the following at the end of ReadSettings() in settings.cpp:

    // Migrate settings:
    if (values.contains("natpmp")) {
        auto el = values.extract("natpmp");
        el.key() = "pcp";
        values.insert(std::move(el));
    }

A bit nicer would be to have a MigrateSettingsFile() that's called between ReadSettingsFile and WriteSettingsFile in init.cpp.

(not to be confused with OptionsModel::checkAndMigrate() which is called before settings.json is read)


Maybe you can split src/util/netif.h off into a separate PR, along with b06edec.

src/qt/optionsdialog.cpp Outdated Show resolved Hide resolved
src/qt/forms/optionsdialog.ui Outdated Show resolved Hide resolved
@Sjors
Copy link
Member

Sjors commented May 9, 2024

In terms of which IPv6 address to use, my understanding is this:

  1. In the olden days the IPv6 address contained part of the MAC address. This ensured uniqueness, but was bad for mobile device privacy; no matter where you went, part of your IP address was constant. Such addresses always (?) contain 0xfffe. See https://datatracker.ietf.org/doc/html/rfc7707
  2. A new standard was introduced that uses a hash of both the MAC address and the prefix, ensuring it's stable if you stay on the same network, but changes when you move. See https://datatracker.ietf.org/doc/html/rfc7217#page-19
  3. There's also temporary addresses, which are rotated every couple of hours. See https://datatracker.ietf.org/doc/html/rfc8981

If we were able to tell which one one is which, then I think we should pick only one, in order of preference: (2), (3), (1). This reflects our desire to actually get inbound connections, even after a shutdown, while at the same time not doxxing ourselves when on the move (mainly for laptops, perhaps mobile in the future).

We can easily detect type (1) by looking for 0xfffe at the right position (and then least prefer it).

I believe you can detect (2) by looking for IFA_F_STABLE_PRIVACY in flags of the inet6_ifaddr struct. It seems getifaddrs doesn't have access to that. Neither does /proc/net/if_inet6 since https://patchwork.ozlabs.org/project/netdev/patch/1386680189-7852-1-git-send-email-jiri@resnulli.us/

Afaik there's also no guarantee about the order of results.

So maybe we should just pick one at random rather than announce all three. This also seems orthogonal to PCP.

@laanwj
Copy link
Member Author

laanwj commented May 24, 2024

1cbbcba: squashed all fixups to clean up the list of commits, no other changes

@laanwj
Copy link
Member Author

laanwj commented May 25, 2024

The first two commits of #26812 would make it possible to test and fuzz how this code interacts with a router.

So ive been thinking about this, do we have a mockable way to do std::optional<Sock> socket(int domain, int type, int protocol)? i was thinking of passing in a SocketFactory (please don't kill me, this could be just a functor 😅 ). It's slightly nicer than passing in a pre-made Sock because it allows testing the "create a socket of the right family" logic.
(also ping @vasild)

@Sjors
Copy link
Member

Sjors commented May 27, 2024

Do you know what's the minimum FreeBSD version it can be compiled on? Let's bump the version bound to that.

Just tried on 14.0 with its default clang 16 and I get the same error. So we should either find a workaround or disable FreeBSD for this feature and a TODO comment. I don't know how popular this is as a desktop distro?

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

Just tried on 14.0 with its default clang 16 and I get the same error. So we should either find a workaround or disable FreeBSD for this feature and a TODO comment. I don't know how popular this is as a desktop distro?

i would feel bad disabling FreeBSD support after @vasild contributed the code for that, but if this gets close to merge and FreeBSD is still broken i'll remove it (added a TODO).
i expect #define typeof __typeof__ would go a long way to work around this error.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished the rest of my code review and tested 97ae5d4 again on Intel macOS 14.5, Ubuntu 24.04 and Windows.

I noticed that the renewal for IPv6 fails (at least on Ubuntu and macOS) with NO_RESOURCES. IPv4 renewal works fine. Ten minutes later it tries again and the mapping succeeds.

IIUC OPNsense uses miniupnpd 2.3.1, which is two years behind but at first glance I don't see any recent fixes related to renewals, nor any open issues.s

One possible explanation could be that the router doesn't implement the protocol correctly, in terms of when it allows renewal. If that's the case, it may be worth implementing the recommended retry intervals. We should also log the approximate time remaining so it's a bit easier to debug the router behavior.


Windows decided to quarantine bitocin-qt.exe (built with guix). I haven't seen that before in earlier tests, so I wonder if it's related to this change.

The timing suggests it happened 10 minutes after the ports were intially opened, so I'm guessing a renewal attempt triggered it, but I forgot to turn on more verbose logging.

@@ -148,7 +148,7 @@ enum
LOCAL_NONE, // unknown
LOCAL_IF, // address a local interface listens on
LOCAL_BIND, // address explicit bound to
LOCAL_MAPPED, // address reported by UPnP or NAT-PMP
LOCAL_MAPPED, // address reported by UPnP or PCP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bfde830 nit: UPnP, PCP or NAT-PMP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, ok, really i'm just considering NAT-PMP to be PCPv0.

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

I noticed that the renewal for IPv6 fails (at least on Ubuntu and macOS) with NO_RESOURCES. IPv4 renewal works fine. Ten minutes later it tries again and the mapping succeeds.

Is this after re-launching bitcoind? Mind that restarting will roll a new mapping nonce, which means that from the point of your router it's a new application trying to get the port, which will prevent it from making the mapping until the old mapping expires. But that should go away after one cycle and not happen again the next one, as the nonce is stored and reused for renewals (as is specified in the RFC).

i doubt this has anything to do with specifiic intervals. If renewal is avoided while there is already a mapping at all then there will be holes in reachability, which is worse than some spurious errors.

Windows decided to quarantine bitocin-qt.exe (built with guix). I haven't seen that before in earlier tests, so I wonder if it's related to this change.

Possible, but that would be strange, given that we're not communicating to any different ports than before. Were you building with libnatpmp before? (oh, a guix build, yes it will have libnatpmp enabled).
Things like this make me so close to just closing this PR in frustration tbh.

@Sjors
Copy link
Member

Sjors commented May 27, 2024

Is this after re-launching bitcoind?

No, I hadn't run the node in a while. At startup it opens the port, only renewal fails. Leaving the node running, this seems to happen half the time, as you would expect: port is opened, renewal fails, mapping expires, so opening succeeds again, renewal fails, etc.

Were you building with libnatpmp before?

No, I had that disabled in the past. I'll see if I can trigger it again with more detailed logs. But you're right, it might be an existing issue.


I downloaded v27.0 and tried it on Windows, but NAT-PMP doesn't work at all there (natpmp: Port mapping failed). It tried turning the windows firewall off but that made no difference. It does on macOS ([mapport] natpmp: Port mapping successful. External address = ...).

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

No, I hadn't run the node in a while. At startup it opens the port, only renewal fails. Leaving the node running, this seems to happen half the time, as you would expect: port is opened, renewal fails, mapping expires, so opening succeeds again, renewal fails, etc.

This is the opposite from what's expected.

  • But it only happens for IPv6, not IPv4. i wonder if that's because IPv4 is the first mapping created? Maybe it doesn't like the same nonce being used for multiple different mappings existing at the same time?
    Maybe try putting the IPv6 block first and see if it reverses the issue? Or at least makes it succeed for the first IPv6 address.

  • Alternatively, maybe try generating a new nonce every call to PCPRequestPortMap. This violates the protocol but ... if it's implemented wrongly on the server this may trigger it to generate a fresh mapping. i dunno...

@Sjors
Copy link
Member

Sjors commented May 27, 2024

On macOS I recompiled with PORT_MAPPING_REANNOUNCE_PERIOD set to 5 minutes and IPv4 commented out. At startup it adds three mappings. After a few minutes at renewal it complains with NO_RESOURCES

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

After a few minutes at renewal it complains with NO_RESOURCES.

Does it complain for all three mappings?

@Sjors
Copy link
Member

Sjors commented May 27, 2024

Yes, for all three.

I then recompiled and set the renewal interval at 6/8 - 7/8. Started the node again (using a fresh port). Same pattern: three mappings succeeed, at renewal all three complain about NO_RESOURCES. (And then a few minutes later they succesfully get a mapping again).

@Sjors
Copy link
Member

Sjors commented May 27, 2024

Actually this might be an existing bug: miniupnp/miniupnp@7bd0877

I'll see if I can convince OPNsense to ship >= 2.3.6, tracking at opnsense/plugins#4003.

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

That does look like an issue that would affect renewal, but i'm not sure it would cause this specific issue, if i understand correctly it'd bump the renewal timestamp so far it would live forever. It wouln't make it complain about the port being used. Unless it's an old mapping that sticks around. But that wouldn'e explain why it only happens on renewal.

As for generating a new nonce every time for every mapping, this is what i meant:

diff --git a/src/mapport.cpp b/src/mapport.cpp
index e28ed52a58f393d463db91b596e10d5202856dbc..4084324baad97065f9bd290865a8bbefa34271ac 100644
--- a/src/mapport.cpp
+++ b/src/mapport.cpp
@@ -45,10 +45,6 @@ static constexpr auto PORT_MAPPING_RETRY_PERIOD{5min};

 static bool ProcessPCP()
 {
-    // The same nonce is used for all mappings, this is allowed by the spec, and simplifies keeping track of them.
-    PCPMappingNonce pcp_nonce;
-    GetRandBytes(pcp_nonce);
-
     bool ret = false;
     bool no_resources = false;
     const uint16_t private_port = GetListenPort();
@@ -86,6 +82,8 @@ static bool ProcessPCP()
             // Open a port mapping on whatever local address we have toward the gateway.
             struct in_addr inaddr_any;
             inaddr_any.s_addr = htonl(INADDR_ANY);
+            PCPMappingNonce pcp_nonce;
+            GetRandBytes(pcp_nonce);
             auto res = PCPRequestPortMap(pcp_nonce, *gateway4, CNetAddr(inaddr_any), private_port, requested_lifetime);
             MappingError* pcp_err = std::get_if<MappingError>(&res);
             if (pcp_err && *pcp_err == MappingError::UNSUPP_VERSION) {
@@ -105,6 +103,8 @@ static bool ProcessPCP()
             // Try to open pinholes for all routable local IPv6 addresses.
             for (const auto &addr: GetLocalAddresses()) {
                 if (!addr.IsRoutable() || !addr.IsIPv6()) continue;
+                PCPMappingNonce pcp_nonce;
+                GetRandBytes(pcp_nonce);
                 auto res = PCPRequestPortMap(pcp_nonce, *gateway6, addr, private_port, requested_lifetime);
                 handle_mapping(res);
             }

@Sjors
Copy link
Member

Sjors commented May 27, 2024

generating a new nonce every time for every mapping

That doesn't seem like a good idea. IIUC the nonce is used to recognise the requesting application, and to distinguish a renew request from some other app trying to use the same port.

@laanwj
Copy link
Member Author

laanwj commented May 27, 2024

That doesn't seem like a good idea. IIUC the nonce is used to recognise the requesting application, and to distinguish a renew request from some other app trying to use the same port.

It's not supposed to be a good idea, but i wonder if it works around the issue with your router, if they implemented the protocol wrongly. i honestly have no idea what could be the problem here i'm just guessing.

@Sjors
Copy link
Member

Sjors commented May 27, 2024

I thought perhaps the issue was that miniupnpd checks whether at least half the lease time went by and would refuse if not. So then if the lease time was too far in the future, it wouldn't allow renewal.

But looking through the source code it doesn't appear to care about that. There's a few places that can trigger a PCP_ERR_NO_RESOURCES reply, but so far those don't offer an obvious explanation.


The pinhole code is mostly seperate from port mappings. Depending on pcp_msg_info->is_fw it will call CreatePCPMap_FW for an IPv6 pinhole. My guess is this function doesn't recognise the renewal attempt as such, treats it as a new request and then fails. I need to figure out how to read the router log messages.


Figured out how to get verbose logging (opnsense/plugins#4004).

It does log updating pinhole to before the failure, so it does find the existing route and knows that it should update. The line after that log statement calls upnp_update_inboundpinhole and interprets any failure as PCP_ERR_NO_RESOURCES. There's only two ways that can fail:

  1. If #if defined(USE_PF) || defined(USE_NETFILTER) is false, because it'll return -42; /* not implemented */. But upnp_find_inboundpinhole has the same guard, so that can't be it.

  2. If update_pinhole fails, which can only fail if get_pinhole fails. That implies that the uid returned by find_pinhole is wrong.

I'm probably still missing something... I should test if deleting an IPv6 pinhole does work.


Deleting a pinhole succeeds. The log message is wrong (PCP: UDP port 8 mapping removed), which is an unrelated bug miniupnp/miniupnp#743.

I also tried opening only a single pinhole, to rule out some race condition. But that doesn't make a difference.


Anyway, I'm fairly sure it's not a bug in this PR.

src/util/pcp.cpp Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add IPv6 pinhole support using UPnP / NAT-PMP
9 participants