-
Notifications
You must be signed in to change notification settings - Fork 35.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: Replace libnatpmp with built-in PCP+NATPMP implementation #30043
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
e4da306
to
d4da95f
Compare
Concept ACK |
Whoa :) (Concept ACK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC6887 appendix A describes how a router that supports NAP-PMP but not PCP, will return UNSUPP_VERSION
. A log message could encourage users to try -upnp
instead (if not already enabled). Or upgrade their ancient router firmware :-)
Got distracted during review, will continue later. I'll look into how we can preserve a previously selected NatPMP checkbox in the GUI.
Regarding the custom lib/self-impl discussion in #30005: That said, it is a little rough to review as
But it seems worth the effort to me. |
Rebased for #29984 |
If you're more comfortable comparing it against another implementation there's:
Miniupnpd's is, likely, the most common server implementation in the wild. It's been tested against that. i'm hoping people will test this on various routers and conditions, there will be inevitable edge cases to iron out.
Maybe, but at some point there's not much difference between implementing a simple request/reply protocol and using a library. At least the RFC is unambigiously documented, which can't be said of many FOSS ABI's. Also, @fanquake (and @gmaxwell in the past) has been hoping for a solution that doesn't introduce a library dependency. |
Would prefer this in two steps (add PCP, then remove NAT-PMP) |
i'm not planning to do this, the build system commits are already set up this way, but doing it throughout would involve adding another setting in Qt just to remove it later. Same for adding a third mechanism in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on Ubuntu 24.04 I'm getting in bound via IPv4 and IPv6.
When a GUI user previously had NAT-PMP selected there would be a "natpmp": true
field in settings.json
.
We need to rename this to "tcp"
in order for the box to remain checked.
Maybe @ryanofsky has an idea how to do this elegantly?
A little hack that works is to put the following at the end of ReadSettings()
in settings.cpp
:
// Migrate settings:
if (values.contains("natpmp")) {
auto el = values.extract("natpmp");
el.key() = "pcp";
values.insert(std::move(el));
}
A bit nicer would be to have a MigrateSettingsFile()
that's called between ReadSettingsFile
and WriteSettingsFile
in init.cpp
.
(not to be confused with OptionsModel::checkAndMigrate()
which is called before settings.json is read)
Maybe you can split src/util/netif.h
off into a separate PR, along with b06edec.
In terms of which IPv6 address to use, my understanding is this:
If we were able to tell which one one is which, then I think we should pick only one, in order of preference: (2), (3), (1). This reflects our desire to actually get inbound connections, even after a shutdown, while at the same time not doxxing ourselves when on the move (mainly for laptops, perhaps mobile in the future). We can easily detect type (1) by looking for I believe you can detect (2) by looking for Afaik there's also no guarantee about the order of results. So maybe we should just pick one at random rather than announce all three. This also seems orthogonal to PCP. |
Add a RFC 6886 NATPMP and RFC 6887 Port Control Protocol (PCP) implementation, to replace libnatpmp.
Change option help, and remove conditionals.
1cbbcba: squashed all fixups to clean up the list of commits, no other changes |
So ive been thinking about this, do we have a mockable way to do |
Just tried on 14.0 with its default clang 16 and I get the same error. So we should either find a workaround or disable FreeBSD for this feature and a TODO comment. I don't know how popular this is as a desktop distro? |
i would feel bad disabling FreeBSD support after @vasild contributed the code for that, but if this gets close to merge and FreeBSD is still broken i'll remove it (added a TODO). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished the rest of my code review and tested 97ae5d4 again on Intel macOS 14.5, Ubuntu 24.04 and Windows.
I noticed that the renewal for IPv6 fails (at least on Ubuntu and macOS) with NO_RESOURCES
. IPv4 renewal works fine. Ten minutes later it tries again and the mapping succeeds.
IIUC OPNsense uses miniupnpd 2.3.1, which is two years behind but at first glance I don't see any recent fixes related to renewals, nor any open issues.s
One possible explanation could be that the router doesn't implement the protocol correctly, in terms of when it allows renewal. If that's the case, it may be worth implementing the recommended retry intervals. We should also log the approximate time remaining so it's a bit easier to debug the router behavior.
Windows decided to quarantine bitocin-qt.exe (built with guix). I haven't seen that before in earlier tests, so I wonder if it's related to this change.
The timing suggests it happened 10 minutes after the ports were intially opened, so I'm guessing a renewal attempt triggered it, but I forgot to turn on more verbose logging.
@@ -148,7 +148,7 @@ enum | |||
LOCAL_NONE, // unknown | |||
LOCAL_IF, // address a local interface listens on | |||
LOCAL_BIND, // address explicit bound to | |||
LOCAL_MAPPED, // address reported by UPnP or NAT-PMP | |||
LOCAL_MAPPED, // address reported by UPnP or PCP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bfde830 nit: UPnP, PCP or NAT-PMP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, ok, really i'm just considering NAT-PMP to be PCPv0.
Is this after re-launching bitcoind? Mind that restarting will roll a new mapping nonce, which means that from the point of your router it's a new application trying to get the port, which will prevent it from making the mapping until the old mapping expires. But that should go away after one cycle and not happen again the next one, as the nonce is stored and reused for renewals (as is specified in the RFC). i doubt this has anything to do with specifiic intervals. If renewal is avoided while there is already a mapping at all then there will be holes in reachability, which is worse than some spurious errors.
Possible, but that would be strange, given that we're not communicating to any different ports than before. Were you building with libnatpmp before? (oh, a guix build, yes it will have libnatpmp enabled). |
No, I hadn't run the node in a while. At startup it opens the port, only renewal fails. Leaving the node running, this seems to happen half the time, as you would expect: port is opened, renewal fails, mapping expires, so opening succeeds again, renewal fails, etc.
No, I had that disabled in the past. I'll see if I can trigger it again with more detailed logs. But you're right, it might be an existing issue. I downloaded v27.0 and tried it on Windows, but NAT-PMP doesn't work at all there ( |
This is the opposite from what's expected.
|
On macOS I recompiled with |
Does it complain for all three mappings? |
Yes, for all three. I then recompiled and set the renewal interval at 6/8 - 7/8. Started the node again (using a fresh port). Same pattern: three mappings succeeed, at renewal all three complain about NO_RESOURCES. (And then a few minutes later they succesfully get a mapping again). |
Actually this might be an existing bug: miniupnp/miniupnp@7bd0877 I'll see if I can convince OPNsense to ship >= 2.3.6, tracking at opnsense/plugins#4003. |
That does look like an issue that would affect renewal, but i'm not sure it would cause this specific issue, if i understand correctly it'd bump the renewal timestamp so far it would live forever. It wouln't make it complain about the port being used. Unless it's an old mapping that sticks around. But that wouldn'e explain why it only happens on renewal. As for generating a new nonce every time for every mapping, this is what i meant: diff --git a/src/mapport.cpp b/src/mapport.cpp
index e28ed52a58f393d463db91b596e10d5202856dbc..4084324baad97065f9bd290865a8bbefa34271ac 100644
--- a/src/mapport.cpp
+++ b/src/mapport.cpp
@@ -45,10 +45,6 @@ static constexpr auto PORT_MAPPING_RETRY_PERIOD{5min};
static bool ProcessPCP()
{
- // The same nonce is used for all mappings, this is allowed by the spec, and simplifies keeping track of them.
- PCPMappingNonce pcp_nonce;
- GetRandBytes(pcp_nonce);
-
bool ret = false;
bool no_resources = false;
const uint16_t private_port = GetListenPort();
@@ -86,6 +82,8 @@ static bool ProcessPCP()
// Open a port mapping on whatever local address we have toward the gateway.
struct in_addr inaddr_any;
inaddr_any.s_addr = htonl(INADDR_ANY);
+ PCPMappingNonce pcp_nonce;
+ GetRandBytes(pcp_nonce);
auto res = PCPRequestPortMap(pcp_nonce, *gateway4, CNetAddr(inaddr_any), private_port, requested_lifetime);
MappingError* pcp_err = std::get_if<MappingError>(&res);
if (pcp_err && *pcp_err == MappingError::UNSUPP_VERSION) {
@@ -105,6 +103,8 @@ static bool ProcessPCP()
// Try to open pinholes for all routable local IPv6 addresses.
for (const auto &addr: GetLocalAddresses()) {
if (!addr.IsRoutable() || !addr.IsIPv6()) continue;
+ PCPMappingNonce pcp_nonce;
+ GetRandBytes(pcp_nonce);
auto res = PCPRequestPortMap(pcp_nonce, *gateway6, addr, private_port, requested_lifetime);
handle_mapping(res);
} |
That doesn't seem like a good idea. IIUC the nonce is used to recognise the requesting application, and to distinguish a renew request from some other app trying to use the same port. |
It's not supposed to be a good idea, but i wonder if it works around the issue with your router, if they implemented the protocol wrongly. i honestly have no idea what could be the problem here i'm just guessing. |
I thought perhaps the issue was that miniupnpd checks whether at least half the lease time went by and would refuse if not. So then if the lease time was too far in the future, it wouldn't allow renewal. But looking through the source code it doesn't appear to care about that. There's a few places that can trigger a The pinhole code is mostly seperate from port mappings. Depending on Figured out how to get verbose logging (opnsense/plugins#4004). It does log
I'm probably still missing something... I should test if deleting an IPv6 pinhole does work. Deleting a pinhole succeeds. The log message is wrong ( I also tried opening only a single pinhole, to rule out some race condition. But that doesn't make a difference. Anyway, I'm fairly sure it's not a bug in this PR. |
Continues #30005. Closes #17012..
This PR adds PCP (Port Control Protocol) from RFC6887. This adds, in addition to the existing IPv4 port mapping (which now uses PCP, with fallback to NAT-PMP), support for IPv6 pinholing-that is, opening a port on the firewall to make it reachable.
PCP, like NAT-PMP is a simple UDP-based protocol, and the implementation is self-contained, so this gets rid of lthe libnatpnp dependency without adding a new one. It should otherwise be a drop-in replacement. NAT-PMP fallback is implemented so this will not make router support worse.
For now it is disabled by default, though in the future (not in this PR) we could consider enable it by default to increase the number of connectable nodes without adding significant attack surface.
To test:
(most of the changes in this PR are, ironically, removing the libnatpmp dependency and associated build system and build docs)
TODO